2003-01-15 03:46:31

by Tupshin Harper

Subject: Unable to handle kernel NULL pointer kernel 2.4.21-pre3-ac4

Interesting bits:
KT400
LVM2 (hence the ac tree)
reiserfs
The oops happened during a lengthy MySQL operation.

Jan 14 19:39:17 fussbudget kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000004
Jan 14 19:39:17 fussbudget kernel: c0133c58
Jan 14 19:39:17 fussbudget kernel: *pde = 00000000
Jan 14 19:39:17 fussbudget kernel: Oops: 0002
Jan 14 19:39:17 fussbudget kernel: CPU: 0
Jan 14 19:39:17 fussbudget kernel: EIP:
0010:[__free_pages_ok+600/640] Tainted: PF
Jan 14 19:39:17 fussbudget kernel: EFLAGS: 00210246
Jan 14 19:39:17 fussbudget kernel: eax: 00000000 ebx: c10cfc00 ecx:
c5662000 edx: c566205c
Jan 14 19:39:17 fussbudget kernel: esi: 00000000 edi: 00003b37 ebp:
c0378350 esp: c5663e58
Jan 14 19:39:17 fussbudget kernel: ds: 0018 es: 0018 ss: 0018
Jan 14 19:39:17 fussbudget kernel: Process mysqld (pid: 2415,
stackpage=c5663000)
Jan 14 19:39:17 fussbudget kernel: Stack: 00000001 00200282 caa14d40
caa14d40 caa14d40 c10cfc00 c013f135 caa14d40
Jan 14 19:39:17 fussbudget kernel: dd53a428 c10cfc00 00003b37
c0378350 c0132e1f c10cfc00 000001d2 c5662000
Jan 14 19:39:17 fussbudget kernel: 00000200 000001d2 00000020
00000020 000001d2 00000020 00000006 c0133061
Jan 14 19:39:17 fussbudget kernel: Call Trace:
[try_to_free_buffers+133/240] [shrink_cache+527/768]
[shrink_caches+97/176] [try_to_free_pages_zone+54/96]
[balance_classzone+85/480]
Jan 14 19:39:17 fussbudget kernel: Code: 89 58 04 89 03 89 53 04 89 59
5c 89 73 0c ff 41 68 eb c1 0f
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c10cfc00 <_end+c944bc/205ca93c>
>>ecx; c5662000 <_end+52268bc/205ca93c>
>>edx; c566205c <_end+5226918/205ca93c>
>>ebp; c0378350 <contig_page_data+b0/340>
>>esp; c5663e58 <_end+5228714/205ca93c>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 89 58 04 mov %ebx,0x4(%eax)
Code; 00000003 Before first symbol
3: 89 03 mov %eax,(%ebx)
Code; 00000005 Before first symbol
5: 89 53 04 mov %edx,0x4(%ebx)
Code; 00000008 Before first symbol
8: 89 59 5c mov %ebx,0x5c(%ecx)
Code; 0000000b Before first symbol
b: 89 73 0c mov %esi,0xc(%ebx)
Code; 0000000e Before first symbol
e: ff 41 68 incl 0x68(%ecx)
Code; 00000011 Before first symbol
11: eb c1 jmp ffffffd4 <_EIP+0xffffffd4>
Code; 00000013 Before first symbol
13: 0f 00 00 sldtl (%eax)
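
A few numbers in the dump above can be cross-checked by hand. The sketch below is only an illustration and not part of the original report: it decodes the `Oops: 0002` error code using the usual i386 page-fault bit meanings, recomputes the faulting address from the first decoded instruction (`mov %ebx,0x4(%eax)` with eax = 0), and compares ecx with the base of the kernel stack. The helper names are made up for this example, and the 8 KiB stack size is an assumption about this kernel's i386 configuration.

```python
# Sketch only: helper names are hypothetical; values are copied from the oops above.

def decode_oops_error_code(code: int) -> dict:
    """Interpret the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }

def fault_address(base: int, displacement: int) -> int:
    """Effective address of a base+displacement operand, truncated to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF

def stack_base(esp: int, stack_size: int = 8192) -> int:
    """Bottom of the kernel stack, ASSUMING an 8 KiB aligned stack."""
    return esp & ~(stack_size - 1)

# Register values from the dump.
EAX = 0x00000000
ECX = 0xC5662000
EDX = 0xC566205C
ESP = 0xC5663E58

if __name__ == "__main__":
    print(decode_oops_error_code(0x0002))
    print(hex(fault_address(EAX, 0x4)))      # compare with "virtual address 00000004"
    print(hex(stack_base(ESP)), hex(ECX))    # compare stack base with ecx
    print(hex(ECX + 0x5C), hex(EDX))         # compare ecx+0x5c with edx
```

Under those assumptions the error code reads as a kernel-mode write to a non-present page, the computed address matches the reported `00000004`, ecx equals the stack base, and edx is ecx + 0x5c. That would be consistent with a list insertion on a list head embedded in the current task structure whose `next` pointer was NULL, but this reading is an inference from the dump, not something confirmed in the thread.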


2003-01-15 06:59:34

by Tupshin Harper

Subject: Re: Unable to handle kernel NULL pointer kernel 2.4.21-pre3-ac4

FYI: the output I showed previously was from a tainted kernel (vmware
modules), but after a fresh reboot on the identical, untainted
kernel, I got the same error while doing the same thing, namely
converting a MySQL MyISAM table to InnoDB. I did successfully convert
some tables without a problem, and the table that originally triggered
the problem did convert successfully after rebooting.

-Tupshin
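
For readers unfamiliar with the `Tainted: PF` marker in the first report, here is a minimal sketch of how those letters are usually read on 2.4 kernels: the first character is `P` if a proprietary module has been loaded (`G` otherwise), and `F` indicates a module was force-loaded. The function name is hypothetical, and the mapping reflects the 2.4-era oops-tracing documentation as I understand it rather than anything stated in this thread.

```python
# Sketch only: decode_taint_2_4 is a made-up helper, not a kernel or ksymoops API.

def decode_taint_2_4(flags: str) -> list:
    """Translate a 2.4-style taint string such as "PF" into plain-language notes."""
    notes = []
    first = flags[:1]
    if first == "P":
        notes.append("proprietary (non-GPL) module loaded")
    elif first == "G":
        notes.append("all loaded modules are GPL-compatible")
    if "F" in flags[1:]:
        notes.append("a module was force-loaded")
    return notes

if __name__ == "__main__":
    for note in decode_taint_2_4("PF"):
        print(note)
```

Both conditions are plausible for out-of-tree modules such as the vmware ones mentioned above, which is why reproducing the oops on an untainted kernel matters.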



2003-01-15 21:56:51

by Tupshin Harper

Subject: Re: Unable to handle kernel NULL pointer kernel 2.4.21-pre3-ac4

OK, this almost certainly (call it a 99.5% chance) is a problem that is
in 2.4.21-pre3-ac4 but not in 2.4.21-pre3 or in Linus' bk tree.
Also, the problem is not caused by the device-mapper patch, which is the
only reason I was trying the ac tree in the first place.

So, it's an ac-specific problem, separate from the device-mapper code.

Another (possibly related) hint is that at bootup I get many (hundreds
of) reports of "ide: no cache flush required". That message comes from
ide_cacheflush_p in drivers/ide/ide-disk.c, and is only present in the
ac tree.

Does this seem like a likely culprit?

Hello...is this thing on...can anyone hear me? ;-)

-Tupshin

2003-01-16 08:44:01

by Tupshin Harper

Subject: Re: Unable to handle kernel NULL pointer kernel 2.4.21-pre3-ac4

Bertrand VIEILLE [Bébert] wrote:

>Hello !!
>
>I have the same problem with -ac tree.
>
>Alan Cox said he suspected several things to induce this oops:
>
>* Guess #1 is reverting mm/shmem.c.
>* Guess #2 is reverting the buffer cache changes.
>* Guess #3 is new IDE + highmem
>
No highmem enabled here.

>* Guess #4 is quota related (are people seeing the problem with quota
>disabled?)
>
I do have quota enabled, so I'll try it without just to double check
your results.

>
>Personally, I answered him that I don't have quota enabled, so Guess #4
>is ruled out.
>
>
>
It's looking like shmem or buffer cache.

The call trace that I posted certainly looks more like the buffer cache,
but obviously doesn't eliminate shmem as the culprit. Is there an easy
way to back out one or the other of these changes? I'm happy to do some
testing.

-Tupshin