2004-09-14 21:23:24

by Neil Schemenauer

[permalink] [raw]
Subject: Re: Possible dcache BUG: debugging patch

Hi Marcelo,

One of our AMD K7 machines seems to be running into the dcache bug.
I'm attaching the oops trace. Is it possible to apply your BUG
patch to the 2.6.7 or 2.6.8.1 kernel or should I use 2.6.9-rc1-mm5?

Best regards,

Neil


======================================================================

Unable to handle kernel paging request at virtual address 705f6573
printing eip:
c01539c6
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: via_rhine crc32 binfmt_capwrap
CPU: 0
EIP: 0060:[prune_dcache+38/288] Not tainted
EFLAGS: 00010212 (2.6.7)
EIP is at prune_dcache+0x26/0x120
eax: c038edd4 ebx: c0601ec4 ecx: c8601e50 edx: 705f6573
esi: c05e8be0 edi: 00000061 ebp: f7ffea48 esp: f7d92f0c
ds: 007b es: 007b ss: 0068
Process kswapd0 (pid: 32, threadinfo=f7d92000 task=f7dc6050)
Stack: 00000080 00000000 f7d92000 c0153d92 c0131417 02c47800 00000000 00030260
000000eb 00000000 000000d0 000000c0 c038dca4 00000001 0000000a c038db80
c01324b3 00000000 f7d92f9c 000000a0 00000000 000000c0 000000c0 000000c0
Call Trace:
[shrink_dcache_memory+18/32] shrink_dcache_memory+0x12/0x20
[shrink_slab+311/368] shrink_slab+0x137/0x170
[balance_pgdat+419/496] balance_pgdat+0x1a3/0x1f0
[kswapd+182/192] kswapd+0xb6/0xc0
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
[kswapd+0/192] kswapd+0x0/0xc0
[kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18

Code: 89 02 89 49 04 89 09 a1 d8 ed 38 c0 0f 18 00 90 ff 0d e0 ed


$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(TM) XP 2400+
stepping : 1
cpu MHz : 2001.026
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3956.73


2004-09-14 21:58:45

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: Possible dcache BUG: debugging patch


Hi Neil,

IIRC Gene's problem was hardware misconfiguration.

And your trace doesnt seem to look similar to his (different location inside
prune_dcache from what I remember).

Anyway, how hard is for you to reproduce this?

Yes 2.6.9-rc1-mm5 contains the remove_inode_buffers BUGs to catch NULL
list pointers, you can try that, but it might be unrelated.

Ingo was also seeing this oopses he said, but he didnt replied. Ingo?

On Tue, Sep 14, 2004 at 05:13:01PM -0400, Neil Schemenauer wrote:
> Hi Marcelo,
>
> One of our AMD K7 machines seems to be running into the dcache bug.
> I'm attaching the oops trace. Is it possible to apply your BUG
> patch to the 2.6.7 or 2.6.8.1 kernel or should I use 2.6.9-rc1-mm5?
>
> Best regards,
>
> Neil
>
>
> ======================================================================
>
> Unable to handle kernel paging request at virtual address 705f6573
> printing eip:
> c01539c6
> *pde = 00000000
> Oops: 0002 [#1]
> Modules linked in: via_rhine crc32 binfmt_capwrap
> CPU: 0
> EIP: 0060:[prune_dcache+38/288] Not tainted
> EFLAGS: 00010212 (2.6.7)
> EIP is at prune_dcache+0x26/0x120
> eax: c038edd4 ebx: c0601ec4 ecx: c8601e50 edx: 705f6573
> esi: c05e8be0 edi: 00000061 ebp: f7ffea48 esp: f7d92f0c
> ds: 007b es: 007b ss: 0068
> Process kswapd0 (pid: 32, threadinfo=f7d92000 task=f7dc6050)
> Stack: 00000080 00000000 f7d92000 c0153d92 c0131417 02c47800 00000000 00030260
> 000000eb 00000000 000000d0 000000c0 c038dca4 00000001 0000000a c038db80
> c01324b3 00000000 f7d92f9c 000000a0 00000000 000000c0 000000c0 000000c0
> Call Trace:
> [shrink_dcache_memory+18/32] shrink_dcache_memory+0x12/0x20
> [shrink_slab+311/368] shrink_slab+0x137/0x170
> [balance_pgdat+419/496] balance_pgdat+0x1a3/0x1f0
> [kswapd+182/192] kswapd+0xb6/0xc0
> [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
> [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
> [kswapd+0/192] kswapd+0x0/0xc0
> [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18
>
> Code: 89 02 89 49 04 89 09 a1 d8 ed 38 c0 0f 18 00 90 ff 0d e0 ed

2004-09-14 22:25:04

by Neil Schemenauer

[permalink] [raw]
Subject: Re: Possible dcache BUG: debugging patch

On Tue, Sep 14, 2004 at 05:06:03PM -0300, Marcelo Tosatti wrote:
> And your trace doesnt seem to look similar to his (different location inside
> prune_dcache from what I remember).
>
> Anyway, how hard is for you to reproduce this?

It's only happened once so far. I'll upgrade to 2.6.8.1 and apply
your BUG patch on top. If it happens again then maybe we will get
more info.

Neil