2002-10-16 16:49:45

by Andrea Arcangeli

Subject: 2.4.20pre11aa1

Srihari, I would like you to try to reproduce the problem with this new one
with CONFIG_SOUND=n. Thanks!

URL:

http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20pre11aa1.gz
http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20pre11aa1/

Only in 2.4.20pre10aa1: 00_extraversion-10
Only in 2.4.20pre11aa1: 00_extraversion-11
Only in 2.4.20pre10aa1: 00_max_bytes-5
Only in 2.4.20pre11aa1: 00_max_bytes-6
Only in 2.4.20pre10aa1: 60_pagecache-atomic-6
Only in 2.4.20pre11aa1: 60_pagecache-atomic-7
Only in 2.4.20pre10aa1: 70_intermezzo-junk-1
Only in 2.4.20pre11aa1: 70_intermezzo-junk-2

Rediffed.

Only in 2.4.20pre11aa1: 00_fcntl_getfl-largefile-1

Clear the implicit O_LARGEFILE from the F_GETFL result on 64bit archs.

Only in 2.4.20pre11aa1: 00_o_direct-read-overflow-write-locking-xfs-2

Fix XFS compilation (from Christoph).

Only in 2.4.20pre10aa1: 20_sched-o1-fixes-4
Only in 2.4.20pre11aa1: 20_sched-o1-fixes-5

Take the expired queue into account in sched_yield; sched_yield is
still a CPU-local operation, unlike in 2.4 mainline.

Fix idle rescheduling so we don't waste 80% of the CPU power on some
big iron.

Fixed a race that could explain some instability (in my tree only).

Only in 2.4.20pre10aa1: 86_x86_64-tsc-hpet-pit-1

Dropped temporarily.

Only in 2.4.20pre10aa1: 9900_aio-11.gz
Only in 2.4.20pre11aa1: 9900_aio-12.gz

Unplug the queue properly in the next_chunk passes too (from
Chris Mason).

Andrea


2002-10-17 12:04:42

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Andrea,

> Srihari, I would like you to try to reproduce the problem with this new one
> with CONFIG_SOUND=n. Thanks!

No worries!

I tried it without sound and unfortunately it crashed a few times. The good news
is that it is very stable without agpgart and radeon (module or not) support.

These are the three oopses with agpgart and radeon as modules:
------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-pre11aa1/ (default)
-m /boot/System.map-2.4.20-pre11aa1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 17 20:27:24 localhost kernel: Unable to handle kernel paging request at virtual address c68b8008
Oct 17 20:27:24 localhost kernel: c01180ae
Oct 17 20:27:24 localhost kernel: *pde = 068001e3
Oct 17 20:27:24 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #3 Thu Oct 17 20:18:58 EST 2002
Oct 17 20:27:24 localhost kernel: CPU: 0
Oct 17 20:27:24 localhost kernel: EIP: 0010:[<c01180ae>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 17 20:27:24 localhost kernel: EFLAGS: 00013206
Oct 17 20:27:24 localhost kernel: eax: bfffec7c ebx: c68b8000 ecx: c020de0c edx: 00000018
Oct 17 20:27:24 localhost kernel: esi: 00000100 edi: bfffec7c ebp: ffffffff esp: c58f5f78
Oct 17 20:27:24 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888, stackpage=c58f5000)
Oct 17 20:27:24 localhost kernel: Stack: dff82e04 00000000 00001000 00000000 00000000 ffffffea c020dda0 bfffec7c
Oct 17 20:27:24 localhost kernel: 080640e8 c01188a4 080640e8 00000100 bfffec7c 00000004 c58f4000 00000100
Oct 17 20:27:24 localhost kernel: bfffec7c bfffeca8 c01074ff 00000000 00000001 080640e8 00000100 bfffec7c
Oct 17 20:27:24 localhost kernel: Call Trace: [<c01188a4>] [<c01074ff>]
Oct 17 20:27:24 localhost kernel: Code: 8b 7b 08 89 e9 31 c0 f2 ae f7 d1 49 8d 79 01 39 f7 77 7f 8b


>>EIP; c01180ae <qm_modules+2e/140> <=====

>>eax; bfffec7c Before first symbol
>>ebx; c68b8000 <[agpgart].bss.end+2c031e5/1c0a3265>
>>ecx; c020de0c <modlist_lock+0/0>
>>edi; bfffec7c Before first symbol
>>ebp; ffffffff <END_OF_CODE+202a3a58/????>
>>esp; c58f5f78 <[agpgart].bss.end+1c4115d/1c0a3265>

Trace; c01188a4 <sys_query_module+d4/1b0>
Trace; c01074ff <system_call+33/38>

Code; c01180ae <qm_modules+2e/140>
00000000 <_EIP>:
Code; c01180ae <qm_modules+2e/140> <=====
0: 8b 7b 08 mov 0x8(%ebx),%edi <=====
Code; c01180b1 <qm_modules+31/140>
3: 89 e9 mov %ebp,%ecx
Code; c01180b3 <qm_modules+33/140>
5: 31 c0 xor %eax,%eax
Code; c01180b5 <qm_modules+35/140>
7: f2 ae repnz scas %es:(%edi),%al
Code; c01180b7 <qm_modules+37/140>
9: f7 d1 not %ecx
Code; c01180b9 <qm_modules+39/140>
b: 49 dec %ecx
Code; c01180ba <qm_modules+3a/140>
c: 8d 79 01 lea 0x1(%ecx),%edi
Code; c01180bd <qm_modules+3d/140>
f: 39 f7 cmp %esi,%edi
Code; c01180bf <qm_modules+3f/140>
11: 77 7f ja 92 <_EIP+0x92>
Code; c01180c1 <qm_modules+41/140>
13: 8b 00 mov (%eax),%eax

Oct 17 20:27:24 localhost kernel: <1>Unable to handle kernel paging request at virtual address c56ac098
Oct 17 20:27:24 localhost kernel: c0119dd0
Oct 17 20:27:24 localhost kernel: *pde = 054001e3
Oct 17 20:27:24 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #3 Thu Oct 17 20:18:58 EST 2002
Oct 17 20:27:24 localhost kernel: CPU: 0
Oct 17 20:27:24 localhost kernel: EIP: 0010:[<c0119dd0>] Not tainted
Oct 17 20:27:24 localhost kernel: EFLAGS: 00013206
Oct 17 20:27:24 localhost kernel: eax: 00000000 ebx: c56ac000 ecx: c4ad9000 edx: 00000000
Oct 17 20:27:24 localhost kernel: esi: c58f4000 edi: 000000b8 ebp: 0000000b esp: c58f5e2c
Oct 17 20:27:24 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888, stackpage=c58f5000)
Oct 17 20:27:24 localhost kernel: Stack: c1587bb8 c4ad9ac0 c58f4000 00000000 c58f4000 000000b8 0000000b c011a2c0
Oct 17 20:27:24 localhost kernel: c58f4000 c16f1880 c58f5f44 00000000 000000b8 c58f4000 c0107bef 0000000b
Oct 17 20:27:24 localhost kernel: c01f1e2a 00000000 00000000 c01125a4 c01f1e2a c58f5f44 00000000 dff82e00
Oct 17 20:27:24 localhost kernel: Call Trace: [<c011a2c0>] [<c0107bef>] [<c01125a4>] [<c0126aaa>] [<c01314e5>]
Oct 17 20:27:24 localhost kernel: [<c0126dde>] [<c011244a>] [<c01276dc>] [<c01122a0>] [<c01075f0>] [<c01180ae>]
Oct 17 20:27:24 localhost kernel: [<c01188a4>] [<c01074ff>]
Oct 17 20:27:24 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b 5b 50 81 fb 00 a0 21


>>EIP; c0119dd0 <exit_notify+20/300> <=====

>>ebx; c56ac000 <[agpgart].bss.end+19f71e5/1c0a3265>
>>ecx; c4ad9000 <[agpgart].bss.end+e241e5/1c0a3265>
>>esi; c58f4000 <[agpgart].bss.end+1c3f1e5/1c0a3265>
>>esp; c58f5e2c <[agpgart].bss.end+1c41011/1c0a3265>

Trace; c011a2c0 <do_exit+210/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c0126aaa <do_no_page+8a/1c0>
Trace; c01314e5 <lru_cache_add+65/70>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01180ae <qm_modules+2e/140>
Trace; c01188a4 <sys_query_module+d4/1b0>
Trace; c01074ff <system_call+33/38>

Code; c0119dd0 <exit_notify+20/300>
00000000 <_EIP>:
Code; c0119dd0 <exit_notify+20/300> <=====
0: 39 b3 98 00 00 00 cmp %esi,0x98(%ebx) <=====
Code; c0119dd6 <exit_notify+26/300>
6: 0f 84 85 02 00 00 je 291 <_EIP+0x291>
Code; c0119ddc <exit_notify+2c/300>
c: 8b 5b 50 mov 0x50(%ebx),%ebx
Code; c0119ddf <exit_notify+2f/300>
f: 81 fb 00 a0 21 00 cmp $0x21a000,%ebx

Oct 17 20:27:24 localhost kernel: <1>Unable to handle kernel paging request at virtual address c4db8098
Oct 17 20:27:24 localhost kernel: c0119dd0
Oct 17 20:27:24 localhost kernel: *pde = 04c001e3
Oct 17 20:27:24 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #3 Thu Oct 17 20:18:58 EST 2002
Oct 17 20:27:24 localhost kernel: CPU: 0
Oct 17 20:27:24 localhost kernel: EIP: 0010:[<c0119dd0>] Not tainted
Oct 17 20:27:24 localhost kernel: EFLAGS: 00013206
Oct 17 20:27:24 localhost kernel: eax: 00000000 ebx: c4db8000 ecx: 00000000 edx: 00000000
Oct 17 20:27:24 localhost kernel: esi: c58f4000 edi: 000002ac ebp: 0000000b esp: c58f5ce0
Oct 17 20:27:24 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888, stackpage=c58f5000)
Oct 17 20:27:24 localhost kernel: Stack: 00000020 00000400 c58f4000 00000000 c58f4000 000002ac 0000000b c011a2c0
Oct 17 20:27:24 localhost kernel: c58f4000 00000000 c58f5df8 00000000 000002ac c58f4000 c0107bef 0000000b
Oct 17 20:27:24 localhost kernel: c01f1e2a 00000000 00000000 c01125a4 c01f1e2a c58f5df8 00000000 33323130
Oct 17 20:27:24 localhost kernel: Call Trace: [<c011a2c0>] [<c0107bef>] [<c01125a4>] [<c0131577>] [<c01278e8>]
Oct 17 20:27:24 localhost kernel: [<c01122a0>] [<c01276dc>] [<c01122a0>] [<c01075f0>] [<c0119dd0>] [<c011a2c0>]
Oct 17 20:27:24 localhost kernel: [<c0107bef>] [<c01125a4>] [<c0126aaa>] [<c01314e5>] [<c0126dde>] [<c011244a>]
Oct 17 20:27:24 localhost kernel: [<c01276dc>] [<c01122a0>] [<c01075f0>] [<c01180ae>] [<c01188a4>] [<c01074ff>]
Oct 17 20:27:24 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b 5b 50 81 fb 00 a0 21


>>EIP; c0119dd0 <exit_notify+20/300> <=====

>>ebx; c4db8000 <[agpgart].bss.end+11031e5/1c0a3265>
>>esi; c58f4000 <[agpgart].bss.end+1c3f1e5/1c0a3265>
>>esp; c58f5ce0 <[agpgart].bss.end+1c40ec5/1c0a3265>

Trace; c011a2c0 <do_exit+210/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c0131577 <__lru_cache_del+87/90>
Trace; c01278e8 <zap_pte_range+f8/150>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c0119dd0 <exit_notify+20/300>
Trace; c011a2c0 <do_exit+210/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c0126aaa <do_no_page+8a/1c0>
Trace; c01314e5 <lru_cache_add+65/70>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01180ae <qm_modules+2e/140>
Trace; c01188a4 <sys_query_module+d4/1b0>
Trace; c01074ff <system_call+33/38>

Code; c0119dd0 <exit_notify+20/300>
00000000 <_EIP>:
Code; c0119dd0 <exit_notify+20/300> <=====
0: 39 b3 98 00 00 00 cmp %esi,0x98(%ebx) <=====
Code; c0119dd6 <exit_notify+26/300>
6: 0f 84 85 02 00 00 je 291 <_EIP+0x291>
Code; c0119ddc <exit_notify+2c/300>
c: 8b 5b 50 mov 0x50(%ebx),%ebx
Code; c0119ddf <exit_notify+2f/300>
f: 81 fb 00 a0 21 00 cmp $0x21a000,%ebx


1 warning issued. Results may not be reliable.
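The "Oops:" and "*pde =" header lines above can be decoded by hand. The sketch below assumes a 32-bit x86 kernel without PAE (consistent with CONFIG_NOHIGHMEM=y in the .config posted later in this thread) and uses the standard i386 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write, bit 2: user mode) and page-directory-entry bits (present, writable, user, accessed, 4MB page size). The function names decode_err and decode_pde are hypothetical helpers, not part of ksymoops.

```shell
#!/bin/sh
# Hypothetical helpers for reading i386 (non-PAE) oops header lines.

# decode_err CODE: decode the page-fault error code printed as "Oops: CODE".
decode_err() {
    err=$(( $1 ))
    if [ $(( err & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $(( err & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( err & 4 )) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    printf '%s %s, %s\n' "$mode" "$access" "$cause"
}

# decode_pde VALUE: decode the page-directory entry printed as "*pde = VALUE".
decode_pde() {
    pde=$(( $1 ))
    flags=""
    [ $(( pde & 0x001 )) -ne 0 ] && flags="$flags present"
    [ $(( pde & 0x002 )) -ne 0 ] && flags="$flags writable"
    [ $(( pde & 0x004 )) -ne 0 ] && flags="$flags user"
    [ $(( pde & 0x020 )) -ne 0 ] && flags="$flags accessed"
    if [ $(( pde & 0x080 )) -ne 0 ]; then
        # PS bit set: the entry maps a 4MB page directly; bits 31..22 are its frame.
        printf '4MB page at %08x:%s\n' $(( pde & 0xffc00000 )) "$flags"
    else
        # PS bit clear: bits 31..12 hold the physical address of a page table.
        printf 'page table at %08x:%s\n' $(( pde & 0xfffff000 )) "$flags"
    fi
}

# Header values from the three oopses above.
decode_err 0x0000      # kernel read, page not present
decode_pde 0x068001e3  # 4MB page at 06800000: present writable accessed
decode_pde 0x054001e3  # 4MB page at 05400000: present writable accessed
decode_pde 0x04c001e3  # 4MB page at 04c00000: present writable accessed
```

Bits 6 and 8 (dirty and global on 4MB entries) are deliberately left undecoded to keep the sketch short.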

These are the two oopses with agpgart and radeon built into the kernel:
------------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1-agpdrm. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-pre11aa1-agpdrm/ (default)
-m /boot/System.map-2.4.20-pre11aa1-agpdrm (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 17 21:22:29 localhost kernel: Unable to handle kernel paging request at virtual address c72b4034
Oct 17 21:22:29 localhost kernel: c0112b57
Oct 17 21:22:29 localhost kernel: *pde = 070001e3
Oct 17 21:22:29 localhost kernel: Oops: 0000 2.4.20-pre11aa1-agpdrm #6 Thu Oct 17 21:11:50 EST 2002
Oct 17 21:22:29 localhost kernel: CPU: 0
Oct 17 21:22:29 localhost kernel: EIP: 0010:[<c0112b57>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 17 21:22:29 localhost kernel: EFLAGS: 00013086
Oct 17 21:22:29 localhost kernel: eax: 00000000 ebx: c8aaa000 ecx: c72b4000 edx: c8aabe78
Oct 17 21:22:29 localhost kernel: esi: 00000002 edi: c01f5f22 ebp: 00003246 esp: c8aabd9c
Oct 17 21:22:29 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 17 21:22:29 localhost kernel: Process modprobe (pid: 1036, stackpage=c8aab000)
Oct 17 21:22:29 localhost kernel: Stack: c8aaa000 00000002 c6e3c000 c8aaa000 c01124c2 c01f5f22 c8aaa000 00000000
Oct 17 21:22:29 localhost kernel: c6270f8e c110eb5c c8aaa000 c8aabfc4 0001ff9d c022326f c6270000 c110eb5c
Oct 17 21:22:29 localhost kernel: c2d94000 00000000 c0223360 c8aabfc8 c0141c50 c8aabdfc c8aabf6c c8aabdfc
Oct 17 21:22:29 localhost kernel: Call Trace: [<c01124c2>] [<c01f5f22>] [<c0141c50>] [<c01122a0>] [<c01075f0>]
Oct 17 21:22:29 localhost kernel: [<c01f5f22>] [<c01269b2>] [<c0126dde>] [<c011244a>] [<c01286df>] [<c0128a37>]
Oct 17 21:22:29 localhost kernel: [<c0128ab4>] [<c01122a0>] [<c01075f0>]
Oct 17 21:22:29 localhost kernel: Code: 8b 51 34 85 d2 74 3f f7 41 14 41 00 00 00 74 36 8b 71 38 89


>>EIP; c0112b57 <search_exception_table+17/80> <=====

>>ebx; c8aaa000 <[sr_mod].bss.end+1da61a9/1902c229>
>>ecx; c72b4000 <[sr_mod].bss.end+5b01a9/1902c229>
>>edx; c8aabe78 <[sr_mod].bss.end+1da8021/1902c229>
>>edi; c01f5f22 <fast_clear_page+12/50>
>>ebp; 00003246 Before first symbol
>>esp; c8aabd9c <[sr_mod].bss.end+1da7f45/1902c229>

Trace; c01124c2 <do_page_fault+222/5a0>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c0141c50 <do_execve+180/220>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c01269b2 <do_anonymous_page+a2/110>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01286df <unmap_fixup+12f/140>
Trace; c0128a37 <do_munmap+297/2d0>
Trace; c0128ab4 <sys_munmap+44/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code; c0112b57 <search_exception_table+17/80>
00000000 <_EIP>:
Code; c0112b57 <search_exception_table+17/80> <=====
0: 8b 51 34 mov 0x34(%ecx),%edx <=====
Code; c0112b5a <search_exception_table+1a/80>
3: 85 d2 test %edx,%edx
Code; c0112b5c <search_exception_table+1c/80>
5: 74 3f je 46 <_EIP+0x46>
Code; c0112b5e <search_exception_table+1e/80>
7: f7 41 14 41 00 00 00 testl $0x41,0x14(%ecx)
Code; c0112b65 <search_exception_table+25/80>
e: 74 36 je 46 <_EIP+0x46>
Code; c0112b67 <search_exception_table+27/80>
10: 8b 71 38 mov 0x38(%ecx),%esi
Code; c0112b6a <search_exception_table+2a/80>
13: 89 00 mov %eax,(%eax)

Oct 17 21:22:29 localhost kernel: <1>Unable to handle kernel paging request at virtual address c77340c4
Oct 17 21:22:29 localhost kernel: c0139b5e
Oct 17 21:22:29 localhost kernel: *pde = 07769163
Oct 17 21:22:29 localhost kernel: Oops: 0003 2.4.20-pre11aa1-agpdrm #6 Thu Oct 17 21:11:50 EST 2002
Oct 17 21:22:29 localhost kernel: CPU: 0
Oct 17 21:22:29 localhost kernel: EIP: 0010:[<c0139b5e>] Not tainted
Oct 17 21:22:29 localhost kernel: EFLAGS: 00013246
Oct 17 21:22:29 localhost kernel: eax: c27e7340 ebx: c779cdc0 ecx: 00000000 edx: c77340c0
Oct 17 21:22:29 localhost kernel: esi: c158e380 edi: c1689dc0 ebp: c1ac8540 esp: c8aabc20
Oct 17 21:22:29 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 17 21:22:29 localhost kernel: Process modprobe (pid: 1036, stackpage=c8aab000)
Oct 17 21:22:29 localhost kernel: Stack: c1689dc0 c779cdc0 c1c338c0 00001000 dfe572c0 08060000 c0128e85 dfe572c0
Oct 17 21:22:29 localhost kernel: 08060000 00001000 c1c33940 dfe572c0 c8aaa000 000002b4 0000000b c0115076
Oct 17 21:22:29 localhost kernel: dfe572c0 00003202 dfe572c0 c011a137 dfe572c0 00000000 c8aabd68 00000000
Oct 17 21:22:29 localhost kernel: Call Trace: [<c0128e85>] [<c0115076>] [<c011a137>] [<c0107bef>] [<c01125a4>]
Oct 17 21:22:29 localhost kernel: [<c014322b>] [<c01122a0>] [<c01075f0>] [<c01f5f22>] [<c0112b57>] [<c01124c2>]
Oct 17 21:22:29 localhost kernel: [<c01f5f22>] [<c0141c50>] [<c01122a0>] [<c01075f0>] [<c01f5f22>] [<c01269b2>]
Oct 17 21:22:29 localhost kernel: [<c0126dde>] [<c011244a>] [<c01286df>] [<c0128a37>] [<c0128ab4>] [<c01122a0>]
Oct 17 21:22:29 localhost kernel: [<c01075f0>]
Oct 17 21:22:29 localhost kernel: Code: 89 42 04 c7 03 00 00 00 00 a1 b4 3e 22 c0 89 58 04 89 03 89


>>EIP; c0139b5e <fput+9e/120> <=====

>>eax; c27e7340 <[floppy].bss.end+599905/4ab2645>
>>ebx; c779cdc0 <[sr_mod].bss.end+a98f69/1902c229>
>>edx; c77340c0 <[sr_mod].bss.end+a30269/1902c229>
>>esi; c158e380 <_end+12f1d10/15aaa10>
>>edi; c1689dc0 <_end+13ed750/15aaa10>
>>ebp; c1ac8540 <[md].bss.end+25a861/3123a1>
>>esp; c8aabc20 <[sr_mod].bss.end+1da7dc9/1902c229>

Trace; c0128e85 <exit_mmap+125/140>
Trace; c0115076 <mmput+56/d0>
Trace; c011a137 <do_exit+87/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c014322b <cached_lookup+1b/70>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c0112b57 <search_exception_table+17/80>
Trace; c01124c2 <do_page_fault+222/5a0>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c0141c50 <do_execve+180/220>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c01269b2 <do_anonymous_page+a2/110>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01286df <unmap_fixup+12f/140>
Trace; c0128a37 <do_munmap+297/2d0>
Trace; c0128ab4 <sys_munmap+44/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code; c0139b5e <fput+9e/120>
00000000 <_EIP>:
Code; c0139b5e <fput+9e/120> <=====
0: 89 42 04 mov %eax,0x4(%edx) <=====
Code; c0139b61 <fput+a1/120>
3: c7 03 00 00 00 00 movl $0x0,(%ebx)
Code; c0139b67 <fput+a7/120>
9: a1 b4 3e 22 c0 mov 0xc0223eb4,%eax
Code; c0139b6c <fput+ac/120>
e: 89 58 04 mov %ebx,0x4(%eax)
Code; c0139b6f <fput+af/120>
11: 89 03 mov %eax,(%ebx)
Code; c0139b71 <fput+b1/120>
13: 89 00 mov %eax,(%eax)


1 warning issued. Results may not be reliable.
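Each reported fault address can be cross-checked against the register dump: for an x86 memory operand of the form DISP(%reg), the address touched is the register's value plus the displacement. A minimal sketch, with fault_addr as a hypothetical helper name; the register values and displacements are copied from the five oopses in this message.

```shell
#!/bin/sh
# fault_addr BASE DISP: effective address of an x86 "DISP(%reg)" memory
# operand whose base register held BASE, printed as 8 hex digits.
fault_addr() {
    printf '%08x\n' $(( $1 + $2 ))
}

# Base register value and displacement of each faulting instruction,
# taken from the register dumps and the "<=====" Code; lines above.
fault_addr 0xc68b8000 0x8    # qm_modules: mov 0x8(%ebx),%edi
fault_addr 0xc56ac000 0x98   # exit_notify: cmp %esi,0x98(%ebx)
fault_addr 0xc4db8000 0x98   # exit_notify: cmp %esi,0x98(%ebx)
fault_addr 0xc72b4000 0x34   # search_exception_table: mov 0x34(%ecx),%edx
fault_addr 0xc77340c0 0x4    # fput: mov %eax,0x4(%edx)
```

Each printed value matches the "virtual address" reported in the corresponding "Unable to handle kernel paging request" line.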

Mainline (2.4.20-pre11) is fine with agpgart and radeon as modules. I
haven't tested it with agpgart and radeon built into the kernel.

I am trying to find out whether any of my friends has a different Radeon card
(mine is a Radeon VE QY) or any other video card with DRM support in the official
kernel tree. If I find one, I will check whether -aa works fine with it.

Thanks for your help.
--
Hari

2002-10-17 12:55:22

by Keith Owens

Subject: Re: 2.4.20pre11aa1

On Thu, 17 Oct 2002 14:10:05 +0200,
Andrea Arcangeli wrote:
>Please try to find out which module this is: replace modprobe with a
>script that does:
>
>#!/bin/sh
># log the arguments, flush them to disk, then run the real modprobe
>echo "$@" >>/tmp/log
>sync
>exec modprobe.orig "$@"

You don't need that; just mkdir /var/log/ksymoops. modprobe/insmod
will then create a daily log file and snapshot lsmod and /proc/ksyms
for every module loaded or unloaded, all with sync in the right
places.

2002-10-17 12:54:13

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Thu, Oct 17, 2002 at 11:02:24PM +1000, Srihari Vijayaraghavan wrote:
> Sorry if it was not clear. The -aa kernel crashes _only_ when I have agpgart
> and radeon support (either as modules or built into the kernel). If agpgart
> and radeon support are not enabled, it does not crash.

OK. So the mystery is why it crashes only with my tree. There are no
changes to the graphics/GART drivers as far as I can tell. Now I even
wonder about a DMA collision with the sound driver or something weird
like that ;)

> > It doesn't make any sense that 2.4.20-pre11 works and my tree doesn't;
> > there are no changes to those sound and graphics drivers. Can you make
> > sure that modversions is enabled, and please send me your .config.
>
> Here is my current .config. This one doesn't have modversions enabled, but I
> have seen crashes even when it is enabled.

OK, but you can leave modversions enabled; I keep it enabled myself too ;)

Andrea

2002-10-17 12:06:08

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Thu, Oct 17, 2002 at 10:04:50PM +1000, Srihari Vijayaraghavan wrote:
> Hello Andrea,
>
> > Srihari, I would like you to try to reproduce the problem with this new one
> > with CONFIG_SOUND=n. Thanks!
>
> No worries!
>
> I tried it without sound and unfortunately it crashed a few times. The good news
> is that it is very stable without agpgart and radeon (module or not) support.

I've no idea what could be wrong with the graphics drivers, there are no
changes there.

> ffffffff esp: c58f5f78
> Oct 17 20:27:24 localhost kernel: ds: 0018 es: 0018 ss: 0018
> Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888,


Please try to find out which module this is: replace modprobe with a
script that does:

#!/bin/sh
# log the arguments, flush them to disk, then run the real modprobe
echo "$@" >>/tmp/log
sync
exec modprobe.orig "$@"

Then look at the log after the crash. You said in your last email that
the gart code wasn't the culprit. If it isn't the sound drivers, I've no
clue what it is. What do you mean by "without agpgart it is very
stable"? That it crashes less frequently? (I recall it crashed even
without those modules.)

It doesn't make any sense that 2.4.20-pre11 works and my tree doesn't;
there are no changes to those sound and graphics drivers. Can you make
sure that modversions is enabled, and please send me your .config.

Andrea
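Putting the modprobe wrapper suggested above in place means moving the real binary aside as modprobe.orig. A sketch, assuming the wrapper goes in /sbin (the 2.4 kmod code runs /sbin/modprobe by default) and that it runs as root; the function name install_modprobe_logger and its optional directory and log-file arguments are hypothetical, added so the sketch can be tried in a scratch directory first. Unlike the script above, the generated wrapper calls modprobe.orig by absolute path instead of relying on PATH.

```shell
#!/bin/sh
# Hypothetical installer for the logging modprobe wrapper.
# install_modprobe_logger [DIR] [LOG]: DIR defaults to /sbin, LOG to /tmp/log.
install_modprobe_logger() {
    dir=${1:-/sbin}
    log=${2:-/tmp/log}
    # Refuse to run twice: a second run would overwrite the real binary.
    if [ -e "$dir/modprobe.orig" ]; then
        echo "$dir/modprobe.orig already exists" >&2
        return 1
    fi
    mv "$dir/modprobe" "$dir/modprobe.orig" || return 1
    # $dir and $log are expanded now; \$@ is left for the wrapper itself.
    cat > "$dir/modprobe" <<EOF
#!/bin/sh
echo "\$@" >>"$log"
sync
exec "$dir/modprobe.orig" "\$@"
EOF
    chmod 755 "$dir/modprobe"
}
```

To restore the original afterwards: mv /sbin/modprobe.orig /sbin/modprobe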

2002-10-17 12:47:55

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello,

> Please try to find out which module this is: replace modprobe with a
> script that does:
>
> #!/bin/sh
> # log the arguments, flush them to disk, then run the real modprobe
> echo "$@" >>/tmp/log
> sync
> exec modprobe.orig "$@"

I will try that.

> Then look at the log after the crash. You said in your last email that
> the gart code wasn't the culprit. If it isn't the sound drivers, I've no
> clue what it is. What do you mean by "without agpgart it is very
> stable"? That it crashes less frequently? (I recall it crashed even
> without those modules.)

Sorry if it was not clear. The -aa kernel crashes _only_ when I have agpgart
and radeon support (either as modules or built into the kernel). If agpgart
and radeon support are not enabled, it does not crash.

> It doesn't make any sense that 2.4.20-pre11 works and my tree doesn't;
> there are no changes to those sound and graphics drivers. Can you make
> sure that modversions is enabled, and please send me your .config.

Here is my current .config. This one doesn't have modversions enabled, but I
have seen crashes even when it is enabled.

CONFIG_X86=y
CONFIG_UID16=y
CONFIG_MODULES=y
CONFIG_KMOD=y
CONFIG_MK7=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG8=y
CONFIG_X86_HAS_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_F00F_WORKS_OK=y
CONFIG_X86_MCE=y
CONFIG_NOHIGHMEM=y
CONFIG_1GB=y
CONFIG_MTRR=y
CONFIG_X86_TSC=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_BLK_DEV_FD=m
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_RAID0=m
CONFIG_PACKET=m
CONFIG_NETFILTER=y
CONFIG_UNIX=m
CONFIG_INET=y
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_STATE=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDESCSI=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_SCSI=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_SCSI_DEBUG_QUEUES=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_NETDEVICES=y
CONFIG_PPP=m
CONFIG_PPP_ASYNC=m
CONFIG_PPP_DEFLATE=m
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=m
CONFIG_SERIAL_EXTENDED=y
CONFIG_UNIX98_PTYS=y
CONFIG_MOUSE=m
CONFIG_PSMOUSE=y
CONFIG_RTC=m
CONFIG_AGP=y
CONFIG_AGP_AMD=y
CONFIG_DRM=y
CONFIG_DRM_NEW=y
CONFIG_DRM_RADEON=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_VGA_CONSOLE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m

Thanks
--
Hari

2002-10-17 15:12:08

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Keith,

> You don't need that; just mkdir /var/log/ksymoops. modprobe/insmod
> will then create a daily log file and snapshot lsmod and /proc/ksyms
> for every module loaded or unloaded, all with sync in the right
> places.

Thanks, and that works fine.

Hello Andrea,

1. To simplify things, and to prove that the crashes are associated with
agpgart and/or radeon, I have compiled a kernel with _only_ agpgart and
radeon as modules and nothing else.

$ cat /lib/modules/2.4.20-pre11aa1/modules.dep
/lib/modules/2.4.20-pre11aa1/kernel/drivers/char/agp/agpgart.o:

/lib/modules/2.4.20-pre11aa1/kernel/drivers/char/drm/radeon.o:

Here is some decoded oops output from the system logs:
------------------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-pre11aa1/ (default)
-m /boot/System.map-2.4.20-pre11aa1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 18 00:29:02 localhost kernel: Unable to handle kernel paging request at virtual address c73ae000
Oct 18 00:29:02 localhost kernel: c0210ee2
Oct 18 00:29:02 localhost kernel: *pde = 070001e3
Oct 18 00:29:02 localhost kernel: Oops: 0002 2.4.20-pre11aa1 #9 Fri Oct 18 00:06:42 EST 2002
Oct 18 00:29:02 localhost kernel: CPU: 0
Oct 18 00:29:02 localhost kernel: EIP: 0010:[<c0210ee2>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 18 00:29:02 localhost kernel: EFLAGS: 00013246
Oct 18 00:29:02 localhost kernel: eax: 0000003f ebx: c73ae000 ecx: c7c8c000 edx: 00000000
Oct 18 00:29:02 localhost kernel: esi: c2daffe0 edi: 00000fe0 ebp: c113e204 esp: c7c8deac
Oct 18 00:29:02 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 18 00:29:02 localhost kernel: Process modprobe (pid: 944, stackpage=c7c8d000)
Oct 18 00:29:03 localhost kernel: Stack: 00104025 c01269b2 c73ae000 c7534bfc bfff8e50 c2d9f480 c4e97a40 c0126dde
Oct 18 00:29:03 localhost kernel: c2d9f480 c4e97a40 c2daffe0 c7534bfc 00000001 bfff8e50 c7c8df24 c2d9f480
Oct 18 00:29:03 localhost kernel: c4e97a40 bfff8e50 c7c8c000 c011244a c2d9f480 c4e97a40 bfff8e50 00000001
Oct 18 00:29:03 localhost kernel: Call Trace: [<c01269b2>] [<c0126dde>] [<c011244a>] [<c0127bb6>] [<c0128cc7>]
Oct 18 00:29:03 localhost kernel: [<c0127ab1>] [<c01122a0>] [<c01075f0>]
Oct 18 00:29:03 localhost kernel: Code: 0f e7 03 0f e7 43 08 0f e7 43 10 0f e7 43 18 0f e7 43 20 0f


>>EIP; c0210ee2 <fast_clear_page+12/50> <=====

>>ebx; c73ae000 <END_OF_CODE+35e90a5/????>
>>ecx; c7c8c000 <END_OF_CODE+3ec70a5/????>
>>esi; c2daffe0 <_end+2ad7c48/3a47ce8>
>>edi; 00000fe0 Before first symbol
>>ebp; c113e204 <_end+e65e6c/3a47ce8>
>>esp; c7c8deac <END_OF_CODE+3ec8f51/????>

Trace; c01269b2 <do_anonymous_page+a2/110>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c0127bb6 <__vma_link+56/d0>
Trace; c0128cc7 <do_brk+1d7/210>
Trace; c0127ab1 <sys_brk+f1/130>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code; c0210ee2 <fast_clear_page+12/50>
00000000 <_EIP>:
Code; c0210ee2 <fast_clear_page+12/50> <=====
0: 0f e7 03 movntq %mm0,(%ebx) <=====
Code; c0210ee5 <fast_clear_page+15/50>
3: 0f e7 43 08 movntq %mm0,0x8(%ebx)
Code; c0210ee9 <fast_clear_page+19/50>
7: 0f e7 43 10 movntq %mm0,0x10(%ebx)
Code; c0210eed <fast_clear_page+1d/50>
b: 0f e7 43 18 movntq %mm0,0x18(%ebx)
Code; c0210ef1 <fast_clear_page+21/50>
f: 0f e7 43 20 movntq %mm0,0x20(%ebx)
Code; c0210ef5 <fast_clear_page+25/50>
13: 0f 00 00 sldtl (%eax)

Oct 18 00:29:03 localhost kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 00000044
Oct 18 00:29:03 localhost kernel: c014ca41
Oct 18 00:29:03 localhost kernel: *pde = 0752b067
Oct 18 00:29:03 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #9 Fri Oct 18 00:06:42 EST 2002
Oct 18 00:29:03 localhost kernel: CPU: 0
Oct 18 00:29:03 localhost kernel: EIP: 0010:[<c014ca41>] Not tainted
Oct 18 00:29:03 localhost kernel: EFLAGS: 00013217
Oct 18 00:29:03 localhost kernel: eax: dff32cf8 ebx: 00000010 ecx: 00000010 edx: dff00000
Oct 18 00:29:03 localhost kernel: esi: 00000000 edi: 00000000 ebp: 0003b0c1 esp: c64d9d74
Oct 18 00:29:03 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 18 00:29:03 localhost kernel: Process X (pid: 945, stackpage=c64d9000)
Oct 18 00:29:03 localhost kernel: Stack: 00000000 00000000 00000000 00000000 00000000 dff32cf8 dfe66005 00000002
Oct 18 00:29:03 localhost kernel: dfe66005 dfe66007 00000000 c64d9e14 c014322b c16d7540 c64d9dd4 dfe66005
Oct 18 00:29:03 localhost kernel: c0143854 c16d7540 c64d9dd4 00000000 00000009 00000000 c16c29c0 00000000
Oct 18 00:29:03 localhost kernel: Call Trace: [<c014322b>] [<c0143854>] [<c0143d37>] [<c0141187>] [<c0141af7>]
Oct 18 00:29:03 localhost kernel: [<c0132ecf>] [<c01314e5>] [<c0126510>] [<c0126e69>] [<c011244a>] [<c0142fd7>]
Oct 18 00:29:03 localhost kernel: [<c0105c90>] [<c01074ff>]
Oct 18 00:29:03 localhost kernel: Code: 39 6e 44 8b 1b 75 e8 8b 7c 24 34 39 7e 0c 75 df 8b 57 4c 85


>>EIP; c014ca41 <d_lookup+61/110> <=====

>>eax; dff32cf8 <END_OF_CODE+1c16dd9d/????>
>>edx; dff00000 <END_OF_CODE+1c13b0a5/????>
>>ebp; 0003b0c1 Before first symbol
>>esp; c64d9d74 <END_OF_CODE+2714e19/????>

Trace; c014322b <cached_lookup+1b/70>
Trace; c0143854 <link_path_walk+3c4/6f0>
Trace; c0143d37 <path_lookup+37/40>
Trace; c0141187 <open_exec+27/e0>
Trace; c0141af7 <do_execve+27/220>
Trace; c0132ecf <__alloc_pages+5f/280>
Trace; c01314e5 <lru_cache_add+65/70>
Trace; c0126510 <do_wp_page+140/1f0>
Trace; c0126e69 <handle_mm_fault+119/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c0142fd7 <getname+97/d0>
Trace; c0105c90 <sys_execve+50/80>
Trace; c01074ff <system_call+33/38>

Code; c014ca41 <d_lookup+61/110>
00000000 <_EIP>:
Code; c014ca41 <d_lookup+61/110> <=====
0: 39 6e 44 cmp %ebp,0x44(%esi) <=====
Code; c014ca44 <d_lookup+64/110>
3: 8b 1b mov (%ebx),%ebx
Code; c014ca46 <d_lookup+66/110>
5: 75 e8 jne ffffffef <_EIP+0xffffffef>
Code; c014ca48 <d_lookup+68/110>
7: 8b 7c 24 34 mov 0x34(%esp,1),%edi
Code; c014ca4c <d_lookup+6c/110>
b: 39 7e 0c cmp %edi,0xc(%esi)
Code; c014ca4f <d_lookup+6f/110>
e: 75 df jne ffffffef <_EIP+0xffffffef>
Code; c014ca51 <d_lookup+71/110>
10: 8b 57 4c mov 0x4c(%edi),%edx
Code; c014ca54 <d_lookup+74/110>
13: 85 00 test %eax,(%eax)

Oct 18 00:29:04 localhost kernel: <1>Unable to handle kernel paging request at virtual address c6b917c4
Oct 18 00:29:04 localhost kernel: c0139920
Oct 18 00:29:04 localhost kernel: *pde = 0748a163
Oct 18 00:29:04 localhost kernel: Oops: 0003 2.4.20-pre11aa1 #9 Fri Oct 18 00:06:42 EST 2002
Oct 18 00:29:04 localhost kernel: CPU: 0
Oct 18 00:29:04 localhost kernel: EIP: 0010:[<c0139920>] Not tainted
Oct 18 00:29:04 localhost kernel: EFLAGS: 00010216
Oct 18 00:29:04 localhost kernel: eax: c6b917c0 ebx: c4a132c0 ecx: 00000004 edx: c0251474
Oct 18 00:29:04 localhost kernel: esi: 00000000 edi: ffffffe9 ebp: c158e380 esp: c8bb7f44
Oct 18 00:29:04 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 18 00:29:04 localhost kernel: Process sh (pid: 950, stackpage=c8bb7000)
Oct 18 00:29:04 localhost kernel: Stack: c167e440 00000004 c57acbe4 00000000 c0137e29 00000004 c16d77c0 00000000
Oct 18 00:29:04 localhost kernel: c1be5000 4001edcd bfffeb68 c0137e07 c16d77c0 c158e380 00000000 c8bb7f84
Oct 18 00:29:04 localhost kernel: c16d77c0 c158e380 c1be5000 c2dbc61c 00000003 00000001 00000001 4001edcd
Oct 18 00:29:04 localhost kernel: Call Trace: [<c0137e29>] [<c0137e07>] [<c01381e3>] [<c01074ff>]
Oct 18 00:29:04 localhost kernel: Code: 89 50 04 89 02 c7 43 04 00 00 00 00 c7 03 00 00 00 00 ff 0d


>>EIP; c0139920 <get_empty_filp+20/130> <=====

>>eax; c6b917c0 <END_OF_CODE+2dcc865/????>
>>ebx; c4a132c0 <END_OF_CODE+c4e365/????>
>>edx; c0251474 <free_list+0/8>
>>edi; ffffffe9 <END_OF_CODE+3c23b08e/????>
>>ebp; c158e380 <_end+12b5fe8/3a47ce8>
>>esp; c8bb7f44 <END_OF_CODE+4df2fe9/????>

Trace; c0137e29 <dentry_open+19/210>
Trace; c0137e07 <filp_open+67/70>
Trace; c01381e3 <sys_open+53/a0>
Trace; c01074ff <system_call+33/38>

Code; c0139920 <get_empty_filp+20/130>
00000000 <_EIP>:
Code; c0139920 <get_empty_filp+20/130> <=====
0: 89 50 04 mov %edx,0x4(%eax) <=====
Code; c0139923 <get_empty_filp+23/130>
3: 89 02 mov %eax,(%edx)
Code; c0139925 <get_empty_filp+25/130>
5: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx)
Code; c013992c <get_empty_filp+2c/130>
c: c7 03 00 00 00 00 movl $0x0,(%ebx)
Code; c0139932 <get_empty_filp+32/130>
12: ff 0d 00 00 00 00 decl 0x0

Oct 18 00:29:10 localhost kernel: <1>Unable to handle kernel paging request
at virtual address c6895b44
Oct 18 00:29:10 localhost kernel: c0139920
Oct 18 00:29:10 localhost kernel: *pde = 0748a163
Oct 18 00:29:10 localhost kernel: Oops: 0003 2.4.20-pre11aa1 #9 Fri Oct 18
00:06:42 EST 2002
Oct 18 00:29:10 localhost kernel: CPU: 0
Oct 18 00:29:10 localhost kernel: EIP: 0010:[<c0139920>] Not tainted
Oct 18 00:29:10 localhost kernel: EFLAGS: 00010216
Warning (Oops_read): Code line not seen, dumping what data is available


>>EIP; c0139920 <get_empty_filp+20/130> <=====


2 warnings issued. Results may not be reliable.

2. Then I compiled the kernel with a single module, radeon, and nothing
else.
$ cat /lib/modules/2.4.20-pre11aa1/modules.dep
/lib/modules/2.4.20-pre11aa1/kernel/drivers/char/drm/radeon.o:

Here is the decoded output of the oops that appeared in the system logs:
----------------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-pre11aa1/ (default)
-m /boot/System.map-2.4.20-pre11aa1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 18 01:00:26 localhost kernel: Unable to handle kernel paging request at
virtual address c3d50000
Oct 18 01:00:26 localhost kernel: c021389a
Oct 18 01:00:26 localhost kernel: *pde = 03c001e3
Oct 18 01:00:26 localhost kernel: Oops: 0002 2.4.20-pre11aa1 #10 Fri Oct 18
00:39:27 EST 2002
Oct 18 01:00:26 localhost kernel: CPU: 0
Oct 18 01:00:26 localhost kernel: EIP: 0010:[<c021389a>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 18 01:00:26 localhost kernel: EFLAGS: 00013246
Oct 18 01:00:26 localhost kernel: eax: 0000003a ebx: c1730000 ecx:
c3df4000 edx: 00000000
Oct 18 01:00:26 localhost kernel: esi: c3d50000 edi: 01730025 ebp:
c10a89dc esp: c3df5e9c
Oct 18 01:00:26 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 18 01:00:26 localhost kernel: Process modprobe (pid: 712,
stackpage=c3df5000)
Oct 18 01:00:26 localhost kernel: Stack: c103fc5c c3fad498 c01264ce c3d50000
c1730000 dfe1ce00 c10a89dc c4c99420
Oct 18 01:00:26 localhost kernel: 42126000 dfe1ce00 c164ed40 c0126e69
dfe1ce00 c164ed40 42126000 c3fad498
Oct 18 01:00:26 localhost kernel: c4c99420 01730025 c164e5c0 dfe1ce00
c164ed40 42126000 c3df4000 c011244a
Oct 18 01:00:26 localhost kernel: Call Trace: [<c01264ce>] [<c0126e69>]
[<c011244a>] [<c01276dc>] [<c0139b8c>]
Oct 18 01:00:26 localhost kernel: [<c01286df>] [<c0128a37>] [<c0128ab4>]
[<c01122a0>] [<c01075f0>]
Oct 18 01:00:26 localhost kernel: Code: 0f e7 06 0f 6f 4b 08 0f e7 4e 08 0f 6f
53 10 0f e7 56 10 0f


>>EIP; c021389a <fast_copy_page+3a/e0> <=====

>>ebx; c1730000 <_end+1455ba8/3a85c28>
>>ecx; c3df4000 <END_OF_CODE+7ea89/????>
>>esi; c3d50000 <_end+3a75ba8/3a85c28>
>>edi; 01730025 Before first symbol
>>ebp; c10a89dc <_end+dce584/3a85c28>
>>esp; c3df5e9c <END_OF_CODE+80925/????>

Trace; c01264ce <do_wp_page+fe/1f0>
Trace; c0126e69 <handle_mm_fault+119/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c0139b8c <fput+cc/120>
Trace; c01286df <unmap_fixup+12f/140>
Trace; c0128a37 <do_munmap+297/2d0>
Trace; c0128ab4 <sys_munmap+44/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code; c021389a <fast_copy_page+3a/e0>
00000000 <_EIP>:
Code; c021389a <fast_copy_page+3a/e0> <=====
0: 0f e7 06 movntq %mm0,(%esi) <=====
Code; c021389d <fast_copy_page+3d/e0>
3: 0f 6f 4b 08 movq 0x8(%ebx),%mm1
Code; c02138a1 <fast_copy_page+41/e0>
7: 0f e7 4e 08 movntq %mm1,0x8(%esi)
Code; c02138a5 <fast_copy_page+45/e0>
b: 0f 6f 53 10 movq 0x10(%ebx),%mm2
Code; c02138a9 <fast_copy_page+49/e0>
f: 0f e7 56 10 movntq %mm2,0x10(%esi)
Code; c02138ad <fast_copy_page+4d/e0>
13: 0f 00 00 sldtl (%eax)


1 warning issued. Results may not be reliable.
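
The `Oops:` field in these dumps (0003 and 0002 above) is the page-fault error code that the i386 CPU pushes, printed in hex. Its three low bits tell whether the kernel hit an unmapped page or a protection violation, and whether it was reading or writing. A minimal sketch of decoding them follows; the helper names are hypothetical, and only the three architectural bits are handled.

```c
#include <assert.h>
#include <stdio.h>

/* Low bits of the i386 page-fault error code (architectural). */
#define PF_PROT  0x1UL  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2UL  /* 0: read access,      1: write access */
#define PF_USER  0x4UL  /* 0: kernel mode,      1: user mode */

struct pf_error {
    int protection;  /* nonzero if the page was present but access was denied */
    int write;       /* nonzero if the faulting access was a write */
    int user;        /* nonzero if the CPU was in user mode */
};

/* Hypothetical helper: split the error code into its three flags. */
struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e;

    e.protection = (code & PF_PROT) != 0;
    e.write = (code & PF_WRITE) != 0;
    e.user = (code & PF_USER) != 0;
    return e;
}

/* Hypothetical helper: print one line such as "Oops: 0003 -> ...". */
void print_pf_error(FILE *out, unsigned long code)
{
    struct pf_error e = decode_pf_error(code);

    assert(out != NULL);
    fprintf(out, "Oops: %04lx -> %s, %s access, %s mode\n", code,
            e.protection ? "protection violation" : "page not present",
            e.write ? "write" : "read",
            e.user ? "user" : "kernel");
}
```

Called as `print_pf_error(stdout, 0x3)`, it prints `Oops: 0003 -> protection violation, write access, kernel mode`; with `0x2` the description becomes `page not present, write access, kernel mode`.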

I can provide the .config on request, but it is basically the same as the
previous one except that I have deselected all the Netfilter options.

Thanks.
--
Hari

2002-10-17 16:21:46

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Fri, Oct 18, 2002 at 01:26:36AM +1000, Srihari Vijayaraghavan wrote:
> Hello Keith,
>
> > You don't need that, just mkdir /var/log/ksymoops. modprobe/insmod
> > will create a daily log file and snapshot a copy of lsmod and
> > /proc/ksyms for every module loaded or unloaded. All with sync in the
> > right places.
>
> Thanks, and that works fine.

If you enabled it before getting the new oopses, please send me a tarball
of /var/log/ksymoops, so I will be able to resolve the module addresses
too (please also send me your agpgart.o and radeon.o modules; everything
must come from the same kernel: the .o files, the ksymoops logs and the
oopses below).

thanks,

Andrea

2002-10-18 14:46:01

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Sat, Oct 19, 2002 at 12:14:19AM +1000, Srihari Vijayaraghavan wrote:
> Oct 18 23:40:42 localhost kernel: Process modprobe (pid: 957,

modprobe was running at 234042, and in the log I see:

20021018 234001 start /sbin/modprobe -s -k -- char-major-14 safemode=1
20021018 234001 probe ended
20021018 234004 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
20021018 234004 probe ended
20021018 234014 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
20021018 234014 probe ended
20021018 234021 start /sbin/modprobe -s -k -- char-major-14 safemode=1
20021018 234021 probe ended
20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
20021018 234022 probe ended
20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
20021018 234022 probe ended
20021018 234040 start /sbin/modprobe -s -k -- char-major-14 safemode=1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20021018 234040 probe ended
20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
20021018 234051 probe ended
20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
20021018 234051 probe ended

I don't see any modprobe in the logs at 234042, and the one started at
234040 already wrote "probe ended" at 234040. Maybe it was another
modprobe that crashed before it could write to the log? Or maybe the
underlined one crashed after writing "probe ended"? Either way, modprobe
looks innocent if it didn't log any newly loaded module. Do you agree,
Keith?
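
The check done by eye here, pairing every `start` record with the `probe ended` that follows it, is easy to automate once the log gets long. A minimal sketch, assuming only the two record types visible in the excerpt, each line being `YYYYMMDD HHMMSS` followed by the record text; the real modutils log may contain other records, which this sketch simply ignores. `find_unmatched_start` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Skip the "YYYYMMDD HHMMSS " prefix; NULL if the line is malformed. */
static const char *log_body(const char *line)
{
    const char *p = strchr(line, ' ');   /* end of the date field */

    if (p == NULL)
        return NULL;
    p = strchr(p + 1, ' ');              /* end of the time field */
    return p != NULL ? p + 1 : NULL;
}

/*
 * Return the index of the last "start" record that is not followed by
 * "probe ended" before the next "start" (or the end of the log), or -1
 * if every start was matched.
 */
long find_unmatched_start(const char *const *lines, size_t n)
{
    long open = -1;       /* start record still waiting for its end */
    long unmatched = -1;  /* latest start known to have no end */
    size_t i;

    assert(lines != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const char *body = log_body(lines[i]);

        if (body == NULL)
            continue;                    /* ignore malformed lines */
        if (strncmp(body, "start ", 6) == 0) {
            if (open >= 0)
                unmatched = open;        /* previous start never ended */
            open = (long)i;
        } else if (strcmp(body, "probe ended") == 0) {
            open = -1;
        }
    }
    return open >= 0 ? open : unmatched;
}
```

On the excerpt above it returns -1, since every start has its `probe ended`, which is the same conclusion reached by reading the log.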

If you still have the .config used to build the kernel, please send it
too, thanks!

I've no idea why radeon or agpgart could cause corruption in my tree
and not in mainline, and I can't reproduce it. The best would be if you
could do a binary search on all the applied patches (first apply all
the [012]* ones and see if you can reproduce, and so on).
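
Taken to its conclusion, this search is a bisection over prefixes of the patch list in `ls` order: each step builds a kernel with the first k patches and tries to trigger the oops, so about log2(N) test kernels pin down the first bad patch. A minimal sketch, assuming the crash reproduces reliably and, once introduced by some patch, stays present with every later patch; an intermittent memory corruption like this one can violate both assumptions, so a "good" verdict is only as trustworthy as the testing behind it. The callback type and `threshold_repro` are hypothetical stand-ins for "build and test".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test step: nonzero if a kernel with the first k patches
   (in `ls` order) reproduces the crash. */
typedef int (*repro_fn)(size_t k, void *ctx);

/*
 * Return the smallest k in [1, n] for which reproduces(k) is true,
 * i.e. patch number k (1-based) is the first bad one.  Assumes the
 * base kernel (k = 0) is good, the full tree (k = n) is bad, and the
 * outcome is monotonic in k.
 */
size_t bisect_patches(size_t n, repro_fn reproduces, void *ctx)
{
    size_t good = 0;  /* longest prefix known not to crash */
    size_t bad = n;   /* shortest prefix known to crash */

    assert(n > 0 && reproduces != NULL);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (reproduces(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Stand-in for real testing: ctx points to the number of the first bad
   patch, which is of course unknown in practice. */
int threshold_repro(size_t k, void *ctx)
{
    return k >= *(const size_t *)ctx;
}
```

For a hypothetical tree of 120 patches this needs at most 7 test kernels, since 2^7 = 128.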

Andrea

2002-10-18 15:06:29

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello,

On Saturday 19 October 2002 00:52, Andrea Arcangeli wrote:
> if you still have the .config used to build the kernel please send it
> too, thanks!

CONFIG_X86=y
CONFIG_UID16=y
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y
CONFIG_MK7=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG8=y
CONFIG_X86_HAS_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_F00F_WORKS_OK=y
CONFIG_X86_MCE=y
CONFIG_NOHIGHMEM=y
CONFIG_1GB=y
CONFIG_MTRR=y
CONFIG_X86_TSC=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_BLK_DEV_FD=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID0=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_NETDEVICES=y
CONFIG_PPP=y
CONFIG_PPP_ASYNC=y
CONFIG_PPP_DEFLATE=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_SERIAL_EXTENDED=y
CONFIG_UNIX98_PTYS=y
CONFIG_MOUSE=y
CONFIG_PSMOUSE=y
CONFIG_RTC=y
CONFIG_AGP=m
CONFIG_AGP_AMD=y
CONFIG_DRM=y
CONFIG_DRM_NEW=y
CONFIG_DRM_RADEON=m
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_VGA_CONSOLE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y

> I've no idea why radeon or agpgart could generate corruption in my tree
> and not in mainline and I can't reproduce. the best would be if you
> could do a binary search on all the patches applied (first applying all
> the [012]* and see if you can reproduce, and so on)

I will try that.

Thanks
--
Hari

2002-10-18 15:28:20

by Keith Owens

Subject: Re: 2.4.20pre11aa1

On Fri, 18 Oct 2002 16:52:04 +0200,
Andrea Arcangeli wrote:
>On Sat, Oct 19, 2002 at 12:14:19AM +1000, Srihari Vijayaraghavan wrote:
>> Oct 18 23:40:42 localhost kernel: Process modprobe (pid: 957,
>
>modprobe was running at 234042, now in the log I see:
>
>20021018 234001 start /sbin/modprobe -s -k -- char-major-14 safemode=1
>20021018 234001 probe ended
>20021018 234004 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
>20021018 234004 probe ended
>20021018 234014 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
>20021018 234014 probe ended
>20021018 234021 start /sbin/modprobe -s -k -- char-major-14 safemode=1
>20021018 234021 probe ended
>20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
>20021018 234022 probe ended
>20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
>20021018 234022 probe ended
>20021018 234040 start /sbin/modprobe -s -k -- char-major-14 safemode=1
>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>20021018 234040 probe ended
>20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
>20021018 234051 probe ended
>20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
>20021018 234051 probe ended
>
>I don't see any modprobe in the logs at 234042 and the one at 234040 is
>writing "probe ended" at 234040. maybe it was another modprobe that
>crashed before it could write into the logs? or maybe it was the
>underlined one that crashed after writing "probe ended"? But anyways it
>looks like modprobe is innocent if it didn't write into the log any new
>module loaded. Do you agree Keith?

modprobe appends to the log for all operations that might change the
module state. The data is flushed before changing module state, with

snap_shot_log()
fprintf(log, "\n");
fflush(log);
fdatasync(fileno(log));
fclose(log);

so the log should always be valid, even if modprobe then crashes.
There is nothing left for modprobe to do to the system after it writes
'probe ended', so a crash after writing 'probe ended' should not be possible.
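
The ordering in that snippet is what makes the log trustworthy: `fflush()` only moves stdio's buffer into the kernel, and it is `fdatasync()` that asks the kernel to push the file data to disk before modprobe goes on to touch module state. A minimal standalone sketch of the same append-then-sync pattern; `log_append_durable` is a hypothetical name, and unlike the quoted code it opens the file itself and reports errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Append one line to the log at path and force it to disk before
 * returning, following the fprintf/fflush/fdatasync/fclose order of
 * the quoted modprobe code.  Returns 0 on success, -1 on any error.
 */
int log_append_durable(const char *path, const char *line)
{
    FILE *log;
    int rc = 0;

    assert(path != NULL && line != NULL);
    log = fopen(path, "a");
    if (log == NULL)
        return -1;
    if (fprintf(log, "%s\n", line) < 0)
        rc = -1;
    /* Move stdio's user-space buffer into the kernel... */
    if (fflush(log) != 0)
        rc = -1;
    /* ...then ask the kernel to write the file data to the disk. */
    if (fdatasync(fileno(log)) != 0)
        rc = -1;
    if (fclose(log) != 0)
        rc = -1;
    return rc;
}
```

Only after this returns 0 is it safe to treat the record as written, which is why a crash later in modprobe cannot lose it.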

Three possibilities :-

(a) The modprobe at 234040 completed the load successfully, then the
oops occurred before the modprobe task was completely purged. IOW, the
module loaded, module_init() ran, modprobe returned to user space then
the module died handling some event.

(b) The failing modprobe at 234042 is real, but is performing an
operation that will not change module state. For example, if it is
running modprobe -n, it will not log but will still invoke some module
syscalls. The oops is then caused by corrupt module tables.

(c) modprobe is not being run as root, so it cannot log. Although it
cannot actually change module state, it will do part of the work in
extracting existing module symbols. Again, the oops is caused by
corrupt module tables.

2002-10-18 15:54:46

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Sat, Oct 19, 2002 at 01:34:06AM +1000, Keith Owens wrote:
> Three possibilities :-
>
> (a) The modprobe at 234040 completed the load successfully then the
> oops occurred before the modprobe task was completely purged. IOW, the
> module loaded, module_init() ran, modprobe returned to user space then
> the module died handling some event.
>
> (b) The failing modprobe at 234042 is real, but is performing an
> operation that will not change module state. For example, it is
> doing modprobe -n, this will not log but will still invoke some module
> syscalls. The oops is then caused by corrupt module tables.
>
> (c) modprobe is not being run as root so it cannot log. Although it
> cannot actually change module state, it will do part of the work in
> extracting existing module symbols. Again, the oops is caused by
> corrupt module tables.

thanks for the help.

The corrupted module tables ring a bell. I fixed wrong locking in the
module code that could corrupt these tables: it relied on the BKL, but
the BKL means nothing if you copy_user in the middle of the loop as the
module code does, because the BKL is released whenever the task sleeps.
I replaced the BKL with a semaphore, which should fix things, but I
wonder if I broke something else with these fixes.

Here's the patch I'm talking about. You may want to start the binary
search by backing this out and seeing if the problem goes away. If it
does, I clearly need to double-check it ;)

diff -urNp x-ref/kernel/module.c x/kernel/module.c
--- x-ref/kernel/module.c Tue Jan 22 18:56:00 2002
+++ x/kernel/module.c Thu Oct 10 23:47:20 2002
@@ -78,6 +78,8 @@ static int kmalloc_failed;

spinlock_t modlist_lock = SPIN_LOCK_UNLOCKED;

+static DECLARE_MUTEX(module_mutex);
+
/**
* inter_module_register - register a new set of inter module data.
* @im_name: an arbitrary string to identify the data, must be unique
@@ -298,7 +300,7 @@ sys_create_module(const char *name_user,

if (!capable(CAP_SYS_MODULE))
return -EPERM;
- lock_kernel();
+ down(&module_mutex);
if ((namelen = get_mod_name(name_user, &name)) < 0) {
error = namelen;
goto err0;
@@ -334,7 +336,7 @@ sys_create_module(const char *name_user,
err1:
put_mod_name(name);
err0:
- unlock_kernel();
+ up(&module_mutex);
return error;
}

@@ -353,7 +355,7 @@ sys_init_module(const char *name_user, s

if (!capable(CAP_SYS_MODULE))
return -EPERM;
- lock_kernel();
+ down(&module_mutex);
if ((namelen = get_mod_name(name_user, &name)) < 0) {
error = namelen;
goto err0;
@@ -549,13 +551,16 @@ sys_init_module(const char *name_user, s
/* Initialize the module. */
atomic_set(&mod->uc.usecount,1);
mod->flags |= MOD_INITIALIZING;
+ up(&module_mutex);
if (mod->init && (error = mod->init()) != 0) {
+ down(&module_mutex);
atomic_set(&mod->uc.usecount,0);
mod->flags &= ~MOD_INITIALIZING;
if (error > 0) /* Buggy module */
error = -EBUSY;
goto err0;
}
+ down(&module_mutex);
atomic_dec(&mod->uc.usecount);

/* And set it running. */
@@ -571,7 +576,7 @@ err2:
err1:
put_mod_name(name);
err0:
- unlock_kernel();
+ up(&module_mutex);
kfree(name_tmp);
return error;
}
@@ -602,7 +607,7 @@ sys_delete_module(const char *name_user)
if (!capable(CAP_SYS_MODULE))
return -EPERM;

- lock_kernel();
+ down(&module_mutex);
if (name_user) {
if ((error = get_mod_name(name_user, &name)) < 0)
goto out;
@@ -664,7 +669,7 @@ restart:

error = 0;
out:
- unlock_kernel();
+ up(&module_mutex);
return error;
}

@@ -887,7 +892,7 @@ sys_query_module(const char *name_user,
struct module *mod;
int err;

- lock_kernel();
+ down(&module_mutex);
if (name_user == NULL)
mod = &kernel_module;
else {
@@ -937,7 +942,7 @@ sys_query_module(const char *name_user,
atomic_dec(&mod->uc.usecount);

out:
- unlock_kernel();
+ up(&module_mutex);
return err;
}

@@ -956,7 +961,7 @@ sys_get_kernel_syms(struct kernel_sym *t
int i;
struct kernel_sym ksym;

- lock_kernel();
+ down(&module_mutex);
for (mod = module_list, i = 0; mod; mod = mod->next) {
/* include the count for the module name! */
i += mod->nsyms + 1;
@@ -999,7 +1004,7 @@ sys_get_kernel_syms(struct kernel_sym *t
}
}
out:
- unlock_kernel();
+ up(&module_mutex);
return i;
}

@@ -1037,8 +1042,11 @@ free_module(struct module *mod, int tag_

if (mod->flags & MOD_RUNNING)
{
- if(mod->cleanup)
+ if(mod->cleanup) {
+ up(&module_mutex);
mod->cleanup();
+ down(&module_mutex);
+ }
mod->flags &= ~MOD_RUNNING;
}

@@ -1082,6 +1090,7 @@ int get_module_list(char *p)
char tmpstr[64];
struct module_ref *ref;

+ down(&module_mutex);
for (mod = module_list; mod != &kernel_module; mod = mod->next) {
long len;
const char *q;
@@ -1150,6 +1159,7 @@ int get_module_list(char *p)
}

fini:
+ up(&module_mutex);
return PAGE_SIZE - left;
}

@@ -1172,7 +1182,7 @@ static void *s_start(struct seq_file *m,

if (!p)
return ERR_PTR(-ENOMEM);
- lock_kernel();
+ down(&module_mutex);
for (v = module_list, n = *pos; v; n -= v->nsyms, v = v->next) {
if (n < v->nsyms) {
p->mod = v;
@@ -1180,7 +1190,7 @@ static void *s_start(struct seq_file *m,
return p;
}
}
- unlock_kernel();
+ up(&module_mutex);
kfree(p);
return NULL;
}
@@ -1193,7 +1203,7 @@ static void *s_next(struct seq_file *m,
do {
v->mod = v->mod->next;
if (!v->mod) {
- unlock_kernel();
+ up(&module_mutex);
kfree(p);
return NULL;
}
@@ -1206,7 +1216,7 @@ static void *s_next(struct seq_file *m,
static void s_stop(struct seq_file *m, void *p)
{
if (p && !IS_ERR(p)) {
- unlock_kernel();
+ up(&module_mutex);
kfree(p);
}
}
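
The subtle part of the patch is that `module_mutex` is released around `mod->init()` and `mod->cleanup()`. A likely reason, not spelled out in the thread, is that an init routine may call `request_module()`, which runs modprobe and re-enters `sys_create_module()`/`sys_init_module()`; a semaphore held across that call would deadlock, whereas the BKL was silently released while the task slept. The price is a window in which other callers can see the half-initialized module, which the usecount of 1 and `MOD_INITIALIZING` are there to cover. A userspace analogue of the pattern with a pthread mutex, where every name is hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module and its flags. */
struct entry {
    struct entry *next;
    int initializing;             /* like MOD_INITIALIZING */
    int running;                  /* like MOD_RUNNING */
    int (*init)(struct entry *);  /* may call back into the registry */
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *registry_head;

/* Takes the lock, so it must never be called with the lock held. */
size_t registry_count(void)
{
    size_t n = 0;
    const struct entry *e;

    pthread_mutex_lock(&registry_lock);
    for (e = registry_head; e != NULL; e = e->next)
        n++;
    pthread_mutex_unlock(&registry_lock);
    return n;
}

/* Link e into the list, then run its init callback with the lock
   dropped, as sys_init_module() does around mod->init() in the patch. */
int registry_add(struct entry *e)
{
    int err = 0;

    assert(e != NULL);
    pthread_mutex_lock(&registry_lock);
    e->next = registry_head;
    registry_head = e;
    e->initializing = 1;          /* other callers should skip this entry */
    pthread_mutex_unlock(&registry_lock);   /* like up(&module_mutex) */

    if (e->init != NULL)
        err = e->init(e);         /* may safely re-enter the registry */

    pthread_mutex_lock(&registry_lock);     /* like down(&module_mutex) */
    e->initializing = 0;
    e->running = (err == 0);      /* a failed entry stays linked, not running */
    pthread_mutex_unlock(&registry_lock);
    return err;
}

/* Example callback that re-enters the registry; with the lock held
   across init() this call would self-deadlock. */
int self_check_init(struct entry *e)
{
    (void)e;
    return registry_count() > 0 ? 0 : -1;
}
```

The same shape appears in `free_module()`, where the mutex is dropped around `mod->cleanup()`.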


Andrea

2002-10-19 01:06:41

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Andrea,

On Saturday 19 October 2002 02:00, Andrea Arcangeli wrote:
> the corrupted module tables rings a bell. I fixed the wrong locking in
> the module code that could corrupt these tables (they were relying on
> the bkl but the bkl means nothing if you copy_user in the middle of the
> loop like the module code does, so I replaced the bkl with a semaphore
> and that should fix things), but I wonder if I broke something else
> with these fixes.
>
> Here's the patch that I'm talking about, you may want to start the
> binary search backing this out and see if the problem goes away. if it
> goes away I clearly need to double check it ;)

Unfortunately, backing that change out of kernel/module.c did not help.

I may be wrong, but since in my case the kernel crashes whether
agpgart/radeon are compiled as modules or built in, I suspect this issue
is larger than just the module subsystem.

Anyway, I will start applying the patches from your tree from 00* onwards
to see if I can reliably pinpoint where the problem is.

Thanks.
--
Hari

2002-10-19 01:19:14

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Sat, Oct 19, 2002 at 11:21:19AM +1000, Srihari Vijayaraghavan wrote:
> I may be wrong but considering in my case the kernel is crashing whether
> agpgart/radeon are compiled as modules or built-in, I suspect that this issue
> is larger than just modules sub-system.

Agreed. The oops in modprobe now looks more like a coincidence.

> Anyway I will start applying the patches from 00* on-wards from your tree to
> see if I can reliably prove where the problem is.

that will help a lot, thanks!

Andrea

2002-10-22 10:33:05

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Andrea,

On Saturday 19 October 2002 11:25, Andrea Arcangeli wrote:
> that will help a lot, thanks!

Is there a quick HOWTO on applying the individual patches?

Do I apply the 00*gz patches after applying the 00* patches?

When I tried that, a lot of hunks failed, and bzImage, agpgart.o etc.
did not compile.

Thanks
--
Hari

2002-10-22 14:50:05

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Tue, Oct 22, 2002 at 08:48:05PM +1000, Srihari Vijayaraghavan wrote:
> Hello Andrea,
>
> On Saturday 19 October 2002 11:25, Andrea Arcangeli wrote:
> > that will help a lot, thanks!
>
> Is there a quick HOWTO on how to apply the individual patches?
>
> Do I apply 00*gz patches after applying 00* patches?

The .gz suffix doesn't matter; the `ls` ordering is the only thing that
matters. You can gzip -d * and then apply [0123]* to see if it still breaks.

> When I tried the above procedure there were a lot of hunks and it did not
> compile bzImage and agpgart.o etc..

Something like this will apply cleanly; if every patch is self-contained,
as it should be, it will compile correctly too:

rm ../2.4.20pre11aa1/*.bz2
gzip -d ../2.4.20pre11aa1/*.gz
for i in ../2.4.20pre11aa1/[0123]*; do patch -p1 < "$i"; done

Andrea

2002-10-23 12:12:35

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Andrea,

On Wednesday 23 October 2002 00:55, Andrea Arcangeli wrote:
> something like this will apply cleanly, if every patch is self contained
> as it should, it will compile correctly too:
>
> rm ../2.4.20pre11aa1/*.bz2
> gzip -d ../2.4.20pre11aa1/*.gz
> for i in ../2.4.20pre11aa1/[0123]*; do patch -p1 < "$i"; done

Thanks, that is neat.

I was able to trigger a few oopses with only the [0123]* patches applied.

ksymoops 2.4.5 on i686 2.4.20-pre11aa1-0123. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-pre11aa1-0123/ (default)
-m /boot/System.map-2.4.20-pre11aa1-0123 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 23 21:23:22 localhost kernel: Unable to handle kernel paging request at
virtual address c463b440
Oct 23 21:23:22 localhost kernel: c01485d1
Oct 23 21:23:22 localhost kernel: *pde = 045fe163
Oct 23 21:23:22 localhost kernel: Oops: 0003
Oct 23 21:23:22 localhost kernel: CPU: 0
Oct 23 21:23:22 localhost kernel: EIP: 0010:[<c01485d1>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 23 21:23:22 localhost kernel: EFLAGS: 00010282
Oct 23 21:23:22 localhost kernel: eax: c463b440 ebx: c463b440 ecx:
c5938080 edx: 00000296
Oct 23 21:23:22 localhost kernel: esi: c6a6f6c8 edi: c6a6f680 ebp:
00000001 esp: c85d1f18
Oct 23 21:23:22 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 23 21:23:22 localhost kernel: Process bonobo-activati (pid: 795,
stackpage=c85d1000)
Oct 23 21:23:22 localhost kernel: Stack: c01a492d c158f2dc 00000000 c6a6f6c8
c01de68b c463b440 00000000 00000217
Oct 23 21:23:22 localhost kernel: c158e480 c463b440 c5706b50 c158e200
c5706a40 c6cf4140 c01de987 c6a6f680
Oct 23 21:23:22 localhost kernel: 00000000 c01a236c c5706b50 c5706a40
c01a2949 c5706b50 c641f3c0 00000000
Oct 23 21:23:22 localhost kernel: Call Trace: [<c01a492d>] [<c01de68b>]
[<c01de987>] [<c01a236c>] [<c01a2949>]
Oct 23 21:23:22 localhost kernel: [<c0136782>] [<c0134e6d>] [<c0134eee>]
[<c010737f>]
Oct 23 21:23:22 localhost kernel: Code: ff 0b 0f 94 c0 84 c0 0f 84 8f 00 00 00
8d 73 18 39 73 18 74


>>EIP; c01485d1 <dput+11/110> <=====

>>eax; c463b440 <END_OF_CODE+66e505/????>
>>ebx; c463b440 <END_OF_CODE+66e505/????>
>>ecx; c5938080 <END_OF_CODE+196b145/????>
>>esi; c6a6f6c8 <END_OF_CODE+2aa278d/????>
>>edi; c6a6f680 <END_OF_CODE+2aa2745/????>
>>esp; c85d1f18 <END_OF_CODE+4604fdd/????>

Trace; c01a492d <sk_free+2d/60>
Trace; c01de68b <unix_release_sock+11b/1d0>
Trace; c01de987 <unix_release+27/30>
Trace; c01a236c <sock_release+5c/60>
Trace; c01a2949 <sock_close+39/60>
Trace; c0136782 <fput+102/130>
Trace; c0134e6d <filp_close+4d/80>
Trace; c0134eee <sys_close+4e/60>
Trace; c010737f <system_call+33/38>

Code; c01485d1 <dput+11/110>
00000000 <_EIP>:
Code; c01485d1 <dput+11/110> <=====
0: ff 0b decl (%ebx) <=====
Code; c01485d3 <dput+13/110>
2: 0f 94 c0 sete %al
Code; c01485d6 <dput+16/110>
5: 84 c0 test %al,%al
Code; c01485d8 <dput+18/110>
7: 0f 84 8f 00 00 00 je 9c <_EIP+0x9c>
Code; c01485de <dput+1e/110>
d: 8d 73 18 lea 0x18(%ebx),%esi
Code; c01485e1 <dput+21/110>
10: 39 73 18 cmp %esi,0x18(%ebx)
Code; c01485e4 <dput+24/110>
13: 74 00 je 15 <_EIP+0x15>

Oct 23 21:23:22 localhost kernel: <1>Unable to handle kernel paging request
at virtual address c4c6a360
Oct 23 21:23:22 localhost kernel: c0137103
Oct 23 21:23:22 localhost kernel: *pde = 04c001e3
Oct 23 21:23:22 localhost kernel: Oops: 0002
Oct 23 21:23:22 localhost kernel: CPU: 0
Oct 23 21:23:22 localhost kernel: EIP: 0010:[<c0137103>] Not tainted
Oct 23 21:23:22 localhost kernel: EFLAGS: 00013286
Oct 23 21:23:22 localhost kernel: eax: c4c6a340 ebx: 00000000 ecx:
c916b940 edx: c025ec44
Oct 23 21:23:22 localhost kernel: esi: c916b940 edi: c1ee3930 ebp:
c1ee3cc0 esp: c1c11e54
Oct 23 21:23:22 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 23 21:23:22 localhost kernel: Process kjournald (pid: 136,
stackpage=c1c11000)
Oct 23 21:23:22 localhost kernel: Stack: 00000000 c01379e8 c916b940 00000000
c916b940 c1ee3450 c0169b7e c916b940
Oct 23 21:23:22 localhost kernel: 0000002d c1c11ea8 000002fa ffffffff
c1c10000 dffceaf4 00000000 00000000
Oct 23 21:23:22 localhost kernel: 00000000 00000000 c1ca2c40 c1b72540
000002fa c90e1640 c90e15c0 c8576a40
Oct 23 21:23:22 localhost kernel: Call Trace: [<c01379e8>] [<c0169b7e>]
[<c011350b>] [<c016bf5c>] [<c016be00>]
Oct 23 21:23:22 localhost kernel: [<c010576e>] [<c016be20>]
Oct 23 21:23:22 localhost kernel: Code: 89 48 20 8b 02 89 48 24 ff 04 9d 50 ec
25 c0 0f b7 41 08 01


>>EIP; c0137103 <__insert_into_lru_list+43/60> <=====

>>eax; c4c6a340 <END_OF_CODE+c9d405/????>
>>ecx; c916b940 <END_OF_CODE+519ea05/????>
>>edx; c025ec44 <lru_list+0/c>
>>esi; c916b940 <END_OF_CODE+519ea05/????>
>>edi; c1ee3930 <[md].bss.end+216dd1/2273521>
>>ebp; c1ee3cc0 <[md].bss.end+217161/2273521>
>>esp; c1c11e54 <_end+1997e04/1a32030>

Trace; c01379e8 <__refile_buffer+58/70>
Trace; c0169b7e <journal_commit_transaction+105e/11c0>
Trace; c011350b <schedule+15b/240>
Trace; c016bf5c <kjournald+13c/1d0>
Trace; c016be00 <commit_timeout+0/10>
Trace; c010576e <kernel_thread+2e/40>
Trace; c016be20 <kjournald+0/1d0>

Code; c0137103 <__insert_into_lru_list+43/60>
00000000 <_EIP>:
Code; c0137103 <__insert_into_lru_list+43/60> <=====
0: 89 48 20 mov %ecx,0x20(%eax) <=====
Code; c0137106 <__insert_into_lru_list+46/60>
3: 8b 02 mov (%edx),%eax
Code; c0137108 <__insert_into_lru_list+48/60>
5: 89 48 24 mov %ecx,0x24(%eax)
Code; c013710b <__insert_into_lru_list+4b/60>
8: ff 04 9d 50 ec 25 c0 incl 0xc025ec50(,%ebx,4)
Code; c0137112 <__insert_into_lru_list+52/60>
f: 0f b7 41 08 movzwl 0x8(%ecx),%eax
Code; c0137116 <__insert_into_lru_list+56/60>
13: 01 00 add %eax,(%eax)

Oct 23 21:23:22 localhost kernel: <1>Unable to handle kernel paging request
at virtual address c51c0098
Oct 23 21:23:22 localhost kernel: c0119a10
Oct 23 21:23:22 localhost kernel: *pde = 050001e3
Oct 23 21:23:22 localhost kernel: Oops: 0000
Oct 23 21:23:22 localhost kernel: CPU: 0
Oct 23 21:23:22 localhost kernel: EIP: 0010:[<c0119a10>] Not tainted
Oct 23 21:23:22 localhost kernel: EFLAGS: 00013206
Oct 23 21:23:22 localhost kernel: eax: 00000000 ebx: c51c0000 ecx:
c193f000 edx: 00000000
Oct 23 21:23:22 localhost kernel: esi: c1c10000 edi: 0000006a ebp:
0000000b esp: c1c11d08
Oct 23 21:23:22 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 23 21:23:22 localhost kernel: Process kjournald (pid: 136,
stackpage=c1c11000)
Oct 23 21:23:22 localhost kernel: Stack: c1587bb8 c193f040 c1c10000 00000000
c1c10000 0000006a 0000000b c0119f00
Oct 23 21:23:22 localhost kernel: c1c10000 00000002 c1c11e20 00000002
0000006a c1c10000 c01079f2 0000000b
Oct 23 21:23:22 localhost kernel: c01edc4a 00000002 4942412e c01123c4
c01edc4a c1c11e20 00000002 c0276784
Oct 23 21:23:22 localhost kernel: Call Trace: [<c0119f00>] [<c01079f2>]
[<c01123c4>] [<c019bc12>] [<c0137cab>]
Oct 23 21:23:22 localhost kernel: [<c018f4ec>] [<c018f8d5>] [<c018fac5>]
[<c01120b0>] [<c0107470>] [<c0137103>]
Oct 23 21:23:22 localhost kernel: [<c01379e8>] [<c0169b7e>] [<c011350b>]
[<c016bf5c>] [<c016be00>] [<c010576e>]
Oct 23 21:23:22 localhost kernel: [<c016be20>]
Oct 23 21:23:22 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b
5b 50 81 fb 00 80 21


>>EIP; c0119a10 <exit_notify+20/300> <=====

>>ebx; c51c0000 <END_OF_CODE+11f30c5/????>
>>ecx; c193f000 <_end+16c4fb0/1a32030>
>>esi; c1c10000 <_end+1995fb0/1a32030>
>>esp; c1c11d08 <_end+1997cb8/1a32030>

Trace; c0119f00 <do_exit+210/260>
Trace; c01079f2 <die+72/80>
Trace; c01123c4 <do_page_fault+314/5d0>
Trace; c019bc12 <do_rw_disk+4b2/5c0>
Trace; c0137cab <create_buffers+6b/e0>
Trace; c018f4ec <ide_wait_stat+bc/130>
Trace; c018f8d5 <start_request+1b5/250>
Trace; c018fac5 <ide_do_request+c5/1c0>
Trace; c01120b0 <do_page_fault+0/5d0>
Trace; c0107470 <error_code+34/3c>
Trace; c0137103 <__insert_into_lru_list+43/60>
Trace; c01379e8 <__refile_buffer+58/70>
Trace; c0169b7e <journal_commit_transaction+105e/11c0>
Trace; c011350b <schedule+15b/240>
Trace; c016bf5c <kjournald+13c/1d0>
Trace; c016be00 <commit_timeout+0/10>
Trace; c010576e <kernel_thread+2e/40>
Trace; c016be20 <kjournald+0/1d0>

Code; c0119a10 <exit_notify+20/300>
00000000 <_EIP>:
Code; c0119a10 <exit_notify+20/300> <=====
0: 39 b3 98 00 00 00 cmp %esi,0x98(%ebx) <=====
Code; c0119a16 <exit_notify+26/300>
6: 0f 84 85 02 00 00 je 291 <_EIP+0x291>
Code; c0119a1c <exit_notify+2c/300>
c: 8b 5b 50 mov 0x50(%ebx),%ebx
Code; c0119a1f <exit_notify+2f/300>
f: 81 fb 00 80 21 00 cmp $0x218000,%ebx

Oct 23 21:23:22 localhost kernel: <1>Unable to handle kernel paging request
at virtual address c54bc098
Oct 23 21:23:22 localhost kernel: c0119a10
Oct 23 21:23:22 localhost kernel: *pde = 054001e3
Oct 23 21:23:22 localhost kernel: Oops: 0000
Oct 23 21:23:22 localhost kernel: CPU: 0
Oct 23 21:23:22 localhost kernel: EIP: 0010:[<c0119a10>] Not tainted
Oct 23 21:23:23 localhost kernel: EFLAGS: 00013206
Oct 23 21:23:23 localhost kernel: eax: 00000000 ebx: c54bc000 ecx: 00000000 edx: 00000000
Oct 23 21:23:23 localhost kernel: esi: c1c10000 edi: 000001c0 ebp: 0000000b esp: c1c11bbc
Oct 23 21:23:23 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 23 21:23:23 localhost kernel: Process kjournald (pid: 136, stackpage=c1c11000)
Oct 23 21:23:23 localhost kernel: Stack: 00000020 00000400 c1c10000 00000000 c1c10000 000001c0 0000000b c0119f00
Oct 23 21:23:23 localhost kernel: c1c10000 00000000 c1c11cd4 00000000 000001c0 c1c10000 c01079f2 0000000b
Oct 23 21:23:23 localhost kernel: c01edc4a 00000000 24548924 c01123c4 c01edc4a c1c11cd4 00000000 33323130
Oct 23 21:23:23 localhost kernel: Call Trace: [<c0119f00>] [<c01079f2>] [<c01123c4>] [<c0185ba9>] [<c0185ba9>]
Oct 23 21:23:23 localhost kernel: [<c0185ba9>] [<c01167bf>] [<c0185ba9>] [<c0185ba9>] [<c01120b0>] [<c0107470>]
Oct 23 21:23:23 localhost kernel: [<c0119a10>] [<c0119f00>] [<c01079f2>] [<c01123c4>] [<c019bc12>] [<c0137cab>]
Oct 23 21:23:23 localhost kernel: [<c018f4ec>] [<c018f8d5>] [<c018fac5>] [<c01120b0>] [<c0107470>] [<c0137103>]
Oct 23 21:23:23 localhost kernel: [<c01379e8>] [<c0169b7e>] [<c011350b>] [<c016bf5c>] [<c016be00>] [<c010576e>]
Oct 23 21:23:23 localhost kernel: [<c016be20>]
Oct 23 21:23:23 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b 5b 50 81 fb 00 80 21


>>EIP; c0119a10 <exit_notify+20/300> <=====

>>ebx; c54bc000 <END_OF_CODE+14ef0c5/????>
>>esi; c1c10000 <_end+1995fb0/1a32030>
>>esp; c1c11bbc <_end+1997b6c/1a32030>

Trace; c0119f00 <do_exit+210/260>
Trace; c01079f2 <die+72/80>
Trace; c01123c4 <do_page_fault+314/5d0>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c01167bf <__call_console_drivers+5f/70>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c01120b0 <do_page_fault+0/5d0>
Trace; c0107470 <error_code+34/3c>
Trace; c0119a10 <exit_notify+20/300>
Trace; c0119f00 <do_exit+210/260>
Trace; c01079f2 <die+72/80>
Trace; c01123c4 <do_page_fault+314/5d0>
Trace; c019bc12 <do_rw_disk+4b2/5c0>
Trace; c0137cab <create_buffers+6b/e0>
Trace; c018f4ec <ide_wait_stat+bc/130>
Trace; c018f8d5 <start_request+1b5/250>
Trace; c018fac5 <ide_do_request+c5/1c0>
Trace; c01120b0 <do_page_fault+0/5d0>
Trace; c0107470 <error_code+34/3c>
Trace; c0137103 <__insert_into_lru_list+43/60>
Trace; c01379e8 <__refile_buffer+58/70>
Trace; c0169b7e <journal_commit_transaction+105e/11c0>
Trace; c011350b <schedule+15b/240>
Trace; c016bf5c <kjournald+13c/1d0>
Trace; c016be00 <commit_timeout+0/10>
Trace; c010576e <kernel_thread+2e/40>
Trace; c016be20 <kjournald+0/1d0>

Code; c0119a10 <exit_notify+20/300>
00000000 <_EIP>:
Code; c0119a10 <exit_notify+20/300> <=====
0: 39 b3 98 00 00 00 cmp %esi,0x98(%ebx) <=====
Code; c0119a16 <exit_notify+26/300>
6: 0f 84 85 02 00 00 je 291 <_EIP+0x291>
Code; c0119a1c <exit_notify+2c/300>
c: 8b 5b 50 mov 0x50(%ebx),%ebx
Code; c0119a1f <exit_notify+2f/300>
f: 81 fb 00 80 21 00 cmp $0x218000,%ebx


1 warning issued. Results may not be reliable.
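In both decoded oopses the faulting instruction is `cmp %esi,0x98(%ebx)`, which in AT&T syntax reads the 32-bit word at ebx + 0x98. For the second oops ebx is c54bc000, which gives exactly the address reported in its "Unable to handle kernel paging request at virtual address c54bc098" line, so the fault is the dereference of the pointer held in ebx. A quick check of that arithmetic (the helper name is purely illustrative):

```python
def effective_address(base: int, disp: int) -> int:
    """Address read by an AT&T-syntax memory operand disp(%base) on 32-bit x86."""
    return (base + disp) & 0xFFFFFFFF  # wrap to 32 bits, as the CPU does


ebx = 0xC54BC000                      # ebx from the second oops
fault = effective_address(ebx, 0x98)  # operand of "cmp %esi,0x98(%ebx)"
print(hex(fault))                     # 0xc54bc098
```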


When I tried to see if I could trigger the oops with only the 0* patches, I
couldn't compile the kernel. Here is the standard error stream of 'make dep
clean ; make bzImage':

module.c:7:28: linux/rcupdate.h: No such file or directory
module.c: In function `free_module':
module.c:1082: warning: implicit declaration of function `synchronize_kernel'
make[2]: *** [module.o] Error 1
make[1]: *** [first_rule] Error 2
make: *** [_dir_kernel] Error 2

BTW, I heard DaveM mention AMD-only bugs appearing during the 2.4.20-pre
series; I am not sure about the -aa series, though. I am thinking of testing
-aa/radeon/agpgart on my friend's computer, which is an Intel P-III with a VIA
chipset motherboard.

Thanks for your help.
--
Hari

2002-10-23 12:40:45

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Wed, Oct 23, 2002 at 10:27:47PM +1000, Srihari Vijayaraghavan wrote:
> module.c:7:28: linux/rcupdate.h: No such file or directory
> module.c: In function `free_module':
> module.c:1082: warning: implicit declaration of function `synchronize_kernel'
> make[2]: *** [module.o] Error 1
> make[1]: *** [first_rule] Error 2
> make: *** [_dir_kernel] Error 2

OK, please try to back out 2.4.20pre11aa1/00_reduce-module-races-1.
I just moved it into the 20 series; that should fix this bit.

Andrea

2002-10-23 14:11:15

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Andrea,

On Wednesday 23 October 2002 22:46, Andrea Arcangeli wrote:
> OK, please try to back out 2.4.20pre11aa1/00_reduce-module-races-1.
> I just moved it into the 20 series; that should fix this bit.

Yes, I did that: I renamed it to _00_reduce-module-races-1 and redid the
patching.
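Renaming the patch with a leading underscore takes it out of a `0*` selection, assuming the patches are picked up by a shell glob in the patch directory (the thread does not show the exact commands used). A minimal sketch using Python's `fnmatch.fnmatchcase`, which follows shell-style pattern rules; `select_patches` is a hypothetical helper and the directory listing is illustrative:

```python
from fnmatch import fnmatchcase


def select_patches(names, pattern):
    """Return the patch names a shell glob such as '0*' would pick up, sorted.

    Assumes C-locale (plain byte order) sorting, which is what sorted() gives
    for ASCII names.
    """
    return sorted(n for n in names if fnmatchcase(n, pattern))


# Patch names taken from this thread; the listing itself is hypothetical.
patches = [
    "00_extraversion-11",
    "_00_reduce-module-races-1",  # renamed, so it no longer matches 0*
    "00_fcntl_getfl-largefile-1",
    "10_sched-o1-hyperthreading-3",
    "20_rcu-poll-7",
]

print(select_patches(patches, "0*"))
# ['00_extraversion-11', '00_fcntl_getfl-largefile-1']
```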

But that did not help. Here is the current std_err:

exit.c: In function `release_task':
exit.c:44: warning: implicit declaration of function `sched_exit'
shmem.c: In function `shmem_getpage_locked':
shmem.c:560: warning: unused variable `flags'
{standard input}: Assembler messages:
{standard input}:1014: Warning: indirect lcall without `*'
{standard input}:1091: Warning: indirect lcall without `*'
{standard input}:1176: Warning: indirect lcall without `*'
{standard input}:1255: Warning: indirect lcall without `*'
{standard input}:1271: Warning: indirect lcall without `*'
{standard input}:1281: Warning: indirect lcall without `*'
{standard input}:1349: Warning: indirect lcall without `*'
{standard input}:1364: Warning: indirect lcall without `*'
{standard input}:1375: Warning: indirect lcall without `*'
{standard input}:1874: Warning: indirect lcall without `*'
{standard input}:1960: Warning: indirect lcall without `*'
init_task.c:3:34: linux/sched_runqueue.h: No such file or directory
make[1]: *** [init_task.o] Error 1
make: *** [_dir_arch/i386/kernel] Error 2

Thanks.
--
Hari

2002-10-23 14:29:07

by Andrea Arcangeli

Subject: Re: 2.4.20pre11aa1

On Thu, Oct 24, 2002 at 12:26:36AM +1000, Srihari Vijayaraghavan wrote:
> Hello Andrea,
>
> On Wednesday 23 October 2002 22:46, Andrea Arcangeli wrote:
> > OK, please try to back out 2.4.20pre11aa1/00_reduce-module-races-1.
> > I just moved it into the 20 series; that should fix this bit.
>
> Yes, I did that: I renamed it to _00_reduce-module-races-1 and redid the
> patching.
>
> But that did not help. Here is the current std_err:
>
> exit.c: In function `release_task':
> exit.c:44: warning: implicit declaration of function `sched_exit'
> shmem.c: In function `shmem_getpage_locked':
> shmem.c:560: warning: unused variable `flags'
> {standard input}: Assembler messages:
> {standard input}:1014: Warning: indirect lcall without `*'
> {standard input}:1091: Warning: indirect lcall without `*'
> {standard input}:1176: Warning: indirect lcall without `*'
> {standard input}:1255: Warning: indirect lcall without `*'
> {standard input}:1271: Warning: indirect lcall without `*'
> {standard input}:1281: Warning: indirect lcall without `*'
> {standard input}:1349: Warning: indirect lcall without `*'
> {standard input}:1364: Warning: indirect lcall without `*'
> {standard input}:1375: Warning: indirect lcall without `*'
> {standard input}:1874: Warning: indirect lcall without `*'
> {standard input}:1960: Warning: indirect lcall without `*'
> init_task.c:3:34: linux/sched_runqueue.h: No such file or directory
> make[1]: *** [init_task.o] Error 1
> make: *** [_dir_arch/i386/kernel] Error 2

Try to apply all the scheduler-related patches:

10_sched-o1-hyperthreading-3 20_apm-o1-sched-1 20_sched-o1-fixes-5
21_o1-A4-aa-1 20_rcu-poll-7

Andrea

2002-10-25 13:47:31

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Andrea,

[I tried to post this reply through groups.google.com, but it looks like it
didn't reach lkml. :( ]

> Try to apply all the scheduler-related patches:
>
> 10_sched-o1-hyperthreading-3 20_apm-o1-sched-1 20_sched-o1-fixes-5
> 21_o1-A4-aa-1 20_rcu-poll-7

OK.

I applied the 0* patches and then the following patches, in this order:
10_sched-o1-hyperthreading-3
20_apm-o1-sched-1
20_rcu-poll-7
20_sched-o1-fixes-5
21_o1-A4-aa-1

The resulting kernel is very stable and it does not crash.

Then I tried the [01]* patches plus the extra patches (20_apm-o1-sched-1,
20_rcu-poll-7, 20_sched-o1-fixes-5, 21_o1-A4-aa-1), but I couldn't compile
the kernel.

Here is the current std_err:

inode.c:1468: warning: initialization from incompatible pointer type
In file included from ide.c:149:
/usr/src/01/include/linux/ide.h:333:16: warning: ISO C requires whitespace after the macro name
ide.c: In function `init_hwif_data':
ide.c:270: `ide_disk' undeclared (first use in this function)
ide.c:270: (Each undeclared identifier is reported only once
ide.c:270: for each function it appears in.)
ide.c: In function `ide_geninit':
ide.c:639: `ide_disk' undeclared (first use in this function)
ide.c: In function `do_reset1':
ide.c:791: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_dump_status':
ide.c:973: `ide_disk' undeclared (first use in this function)
ide.c: In function `try_to_flush_leftover_data':
ide.c:1034: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_error':
ide.c:1071: `ide_disk' undeclared (first use in this function)
ide.c: In function `start_request':
ide.c:1373: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_open':
ide.c:2119: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_reinit_drive':
ide.c:2768: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_ioctl':
ide.c:2842: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_setup':
ide.c:3383: `ide_disk' undeclared (first use in this function)
make[3]: *** [ide.o] Error 1
make[2]: *** [first_rule] Error 2
make[1]: *** [_subdir_ide] Error 2
make: *** [_dir_drivers] Error 2
make: *** Waiting for unfinished jobs....
{standard input}: Assembler messages:
{standard input}:1014: Warning: indirect lcall without `*'
{standard input}:1091: Warning: indirect lcall without `*'
{standard input}:1176: Warning: indirect lcall without `*'
{standard input}:1255: Warning: indirect lcall without `*'
{standard input}:1271: Warning: indirect lcall without `*'
{standard input}:1281: Warning: indirect lcall without `*'
{standard input}:1349: Warning: indirect lcall without `*'
{standard input}:1364: Warning: indirect lcall without `*'
{standard input}:1375: Warning: indirect lcall without `*'
{standard input}:1874: Warning: indirect lcall without `*'
{standard input}:1960: Warning: indirect lcall without `*'

Thanks.
--
Hari


2002-10-31 10:31:46

by Srihari Vijayaraghavan

Subject: Re: 2.4.20pre11aa1

Hello Andrea,

On Saturday 26 October 2002 00:03, Srihari Vijayaraghavan wrote:
> The resulting kernel is very stable and it does not crash.
>
> Then I tried the [01]* patches plus the extra patches (20_apm-o1-sched-1,
> 20_rcu-poll-7, 20_sched-o1-fixes-5, 21_o1-A4-aa-1), but I couldn't compile
> the kernel.

The current status is:

[0]* - compiles fine - works fine
[01]* - couldn't compile
[012]* - compiles fine - crashes

So I believe either the 1* or the 2* patches are introducing the issue.
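The [0]*/[01]*/[012]* runs are a coarse form of the binary search Andrea refers to later in this thread: test a prefix of the ordered patch series, then halve the remaining range. A minimal sketch, assuming a hypothetical `is_good(n)` that builds, boots and stress-tests a kernel with the first n patches applied, and assuming that once the bad patch is included every longer prefix is bad too. A prefix that does not even compile (like [01]* here) cannot be classified by such a predicate and has to be fixed up first, e.g. by adding the missing prerequisite patches.

```python
from typing import Callable


def first_bad_patch(num_patches: int, is_good: Callable[[int], bool]) -> int:
    """Return the 1-based index of the first patch whose inclusion makes the kernel bad.

    is_good(n) reports whether a kernel built with the first n patches is stable.
    Assumes is_good(0) is True, is_good(num_patches) is False, and that once a
    prefix is bad every longer prefix is bad too.
    """
    lo, hi = 0, num_patches  # invariant: prefix of length lo is good, length hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(mid):
            lo = mid
        else:
            hi = mid
    return hi  # patch number hi is the first one that breaks the kernel


# Usage with a hypothetical stand-in for "build, boot and stress-test":
# pretend that applying patch 37 of 100 is what introduces the crash.
tested = []


def fake_is_good(n: int) -> bool:
    tested.append(n)
    return n < 37


print(first_bad_patch(100, fake_is_good))  # 37
print(len(tested))                         # 7 test kernels
```

Each step halves the candidate range, so a series of N patches needs about log2(N) test kernels rather than one per patch.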

In the meantime, I had an opportunity to test -aa on a nice IBM NetVista
computer, whose configuration is as follows:

00:00.0 Host bridge: Intel Corp. 82815 815 Chipset Host Bridge and Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corp. 82815 CGC [Chipset Graphics Controller] (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801BA ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801BA IDE U100 (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801BA/BAM USB (Hub #1) (rev 02)
00:1f.3 SMBus: Intel Corp. 82801BA/BAM SMBus (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801BA/BAM AC'97 Audio (rev 02)
01:08.0 Ethernet controller: Intel Corp. 82801BA/BAM/CA/CAM Ethernet Controller (rev 01)

I can easily reproduce the same issue on that computer too (of course I am
using CONFIG_AGP_I810 for agpgart support and CONFIG_DRM_I810 for i810
display card support).

I think this rules out the Radeon DRM support (or the i810 one, for that
matter) as the cause; the issue appears to be specific to agpgart in general.

Anyway, I guess we are very close to the problem. If someone helps me
compile -aa with the [01]* patches, I suspect we can pinpoint the issue.

Thanks for your help and support.
--
Hari

2002-11-09 09:16:54

by Srihari Vijayaraghavan

Subject: Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1]

Hello Andrea,

> So I believe either the 1* or the 2* patches are introducing the issue.

Got it. The 10_x86-fast-pte2 patch is introducing the instability.

I tested this on 2.4.20rc1aa1, though; backing out that patch alone solves
the instability.

I can give the .config and ksymoops of 2.4.20rc1aa1 if needed.

> In the meantime, I had an opportunity to test -aa on a nice IBM NetVista
> computer, whose configuration is as follows:

I will try to verify this finding on that computer too, perhaps on Monday.

Thanks for your help.
--
Hari

2002-11-10 02:43:37

by Andrea Arcangeli

Subject: Re: Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1]

On Sat, Nov 09, 2002 at 08:34:39PM +1100, Srihari Vijayaraghavan wrote:
> Hello Andrea,
>
> > So I believe either the 1* or the 2* patches are introducing the issue.
>
> Got it. The 10_x86-fast-pte2 patch is introducing the instability.

Great job, many thanks! This narrows the bug down a whole lot. On Monday
I will think about what could be going wrong with that patch; in the
meantime, just run (slower ;) with it backed out, to be sure it's really
that one. Nevertheless, if I had to guess right now, I would say this
patch is almost certainly triggering a bug somewhere else; it's unlikely
that the patch itself contains a bug. The patch is fairly obviously
correct, and it touches such a heavily stressed codepath that everybody
would reproduce the problem if it were buggy. That is one of the reasons
I could never have guessed this patch was the relevant one for your case
without your useful binary search.

Andrea

2002-11-10 03:06:25

by Srihari Vijayaraghavan

Subject: Re: Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1]

Hello Andrea,

On Sunday 10 November 2002 13:50, Andrea Arcangeli wrote:
> Great job, many thanks! This narrows the bug down a whole lot. On Monday
> I will think about what could be going wrong with that patch; in the
> meantime, just run (slower ;) with it backed out, to be sure it's really

At present I am running the complete 2.4.20rc1aa1 minus 10_x86-fast-pte-2.
It has been as stable as mainline, plus as snappy as -aa :).

On a related note, I had to apply 20_rcu-poll-7 to compile the 10*
patch(es) (even for the 10_ext3-o_direct-2 patch), so would it be a good
idea to make it the earliest 10* patch?
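As the [0]*/[01]*/[012]* tests in this thread assume, the numeric prefix decides where a patch lands in the series, so 20_rcu-poll-7 sorts after every 10_* patch even though, per this report, 10_ext3-o_direct-2 needs it first. A minimal sketch that flags such ordering problems from a hand-written prerequisite map; `ordering_violations` is a hypothetical helper, and the map holds only the one dependency reported here:

```python
def ordering_violations(patches, requires):
    """List (patch, prerequisite) pairs where the prerequisite comes later or is missing.

    patches: patch names in the order they are applied.
    requires: mapping from a patch name to the names it needs applied first.
    """
    position = {name: i for i, name in enumerate(patches)}
    problems = []
    for name in patches:
        for dep in requires.get(name, ()):
            # A missing prerequisite gets position len(patches), i.e. "too late".
            if position.get(dep, len(patches)) > position[name]:
                problems.append((name, dep))
    return problems


series = sorted(["10_ext3-o_direct-2", "10_x86-fast-pte-2", "20_rcu-poll-7"])
deps = {"10_ext3-o_direct-2": ["20_rcu-poll-7"]}  # dependency reported above

print(ordering_violations(series, deps))
# [('10_ext3-o_direct-2', '20_rcu-poll-7')]
```

Giving the prerequisite a prefix that sorts before its users makes the list come back empty.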

Thanks.
--
Hari