LinuxLists.cc - Frequent system freezes after kernel bug

2004-06-12 18:37:50

Subject: Frequent system freezes after kernel bug

Hi,

in the last few days I have experienced frequent system freezes
apparently related to kernel bugs on one box here. Unfortunately,
that's just the same system serving as router and local fileserver for
me and my roommates, so it is quite disturbing to have to reboot it up
to three times a day.

This box runs under Debian stable; I noticed these particular bugs
starting with kernel 2.4.24. Yesterday, I updated from 2.4.25 to
2.4.26, but today it died again twice.

The first log entry looked unfamiliar to me:
> kernel: Unable to handle kernel paging request at virtual address
> 35fb6a3f
> kernel: printing eip:
> kernel: c014782c
> kernel: Oops: 0000
> kernel: CPU: 0
> kernel: EIP: 0010:[iput+44/592] Tainted: P
> kernel: EFLAGS: 00010202
> kernel: eax: 00000000 ebx: 00000000 ecx: ddd20050 edx: ddd20050
> kernel: esi: ddd20040 edi: 35fb6a1f ebp: 00001f0f esp: c1599f20
> kernel: ds: 0018 es: 0018 ss: 0018
> kernel: Process kswapd (pid: 4, stackpage=c1599000)
> kernel: Stack: c3756c58 c3756c40 ddd20040 c0145406 ddd20040 00000000
> c144e040 00000006
> kernel: ffffffff c014570b 00002744 c012cdfa 00000006 000001d0
> 0000001f 000001d0
> kernel: c024a49c c024a49c c1598000 00003401 000001d0 c024a49c
> c012d06f c1599fa8
> kernel: Call Trace: [prune_dcache+214/336] [shrink_dcache_memory
> +27/64] [shrink_cache+650/880] [shrink_caches+47/64]
> [try_to_free_pages_zone+96/240]
> kernel: [kswapd_balance_pgdat+74/160] [kswapd_balance+22/48]
> [kswapd+157/192] [arch_kernel_thread+40/64]
> kernel:
> kernel: Code: 8b 5f 20 85 db 74 0d 8b 43 18 85 c0 74 06 56 ff d0 83
> c4 04

Entries like this one have popped up frequently before, however:
> kernel: kernel BUG at buffer.c:575!
> kernel: invalid operand: 0000
> kernel: CPU: 0
> kernel: EIP: 0010:[__insert_into_lru_list+28/112] Tainted: P
> kernel: EFLAGS: 00010206
> kernel: eax: 00000000 ebx: 00000002 ecx: d06e40c0 edx: c02ad6cc
> kernel: esi: d06e40c0 edi: d06e40c0 ebp: 00000001 esp: cc3b7e6c
> kernel: ds: 0018 es: 0018 ss: 0018
> kernel: Process proftpd (pid: 3311, stackpage=cc3b7000)
> kernel: Stack: 00000002 c0135a36 d06e40c0 00000002 d06e40c0 00001000
> c0135a4a d06e40c0
> kernel: c0136443 d06e40c0 00000006 dabae5c0 10aad006 00000000
> 00001000 00000000
> kernel: c0136ab4 dabae5c0 c12c5a00 00000000 00000006 c12c5a00
> 0811346a dabae5c0
> kernel: Call Trace: [__refile_buffer+86/96] [refile_buffer+10/16]
> [__block_commit_write+131/208] [generic_commit_write+52/96]
> [ext3_commit_write+305/448]
> kernel: [do_generic_file_write+654/976] [generic_file_write
> +259/288] [ext3_file_write+35/192] [sys_write+150/240] [system_call
> +51/56]
> kernel:
> kernel: Code: 0f 0b 3f 02 e5 57 21 c0 83 3a 00 75 05 89 0a 89 49 24
> 8b 02
> kernel: <0>Assertion failure in journal_start() at
> transaction.c:251: "handle->h_transaction->t_journal == journal"
> kernel: kernel BUG at transaction.c:251!
> kernel: invalid operand: 0000
> kernel: CPU: 0
> kernel: EIP: 0010:[journal_start+74/192] Tainted: P
> kernel: EFLAGS: 00010282
> kernel: eax: 0000006c ebx: d55f9580 ecx: dfc72000 edx: 00000001
> kernel: esi: dfe49800 edi: cc3b6000 ebp: 00000040 esp: cc3b7c10
> kernel: ds: 0018 es: 0018 ss: 0018
> kernel: Process proftpd (pid: 3311, stackpage=cc3b7000)
> kernel: Stack: c021b240 c021b46c c021b220 000000fb c021b440 d55f9580
> dfae8c00 deddb740
> kernel: c015ea38 dfe49800 00000002 deddb740 dfae8c00 00000001
> c014642e deddb740
> kernel: deddb740 deddb7ac dd377ac0 c0128c93 deddb740 00000001
> deddb740 deddb7ac
> kernel: Call Trace: [ext3_dirty_inode+88/256] [__mark_inode_dirty
> +46/144] [do_generic_file_write+211/976] [check_free_space+290/320]
> [generic_file_write+259/288]
> kernel: [ext3_file_write+35/192] [do_acct_process+571/592]
> [acct_process+25/39] [do_exit+105/592] [do_invalid_op+0/160] [die
> +86/96]
> kernel: [do_invalid_op+140/160] [__insert_into_lru_list+28/112]
> [ext3_get_block_handle+426/640] [ext3_get_block_handle+242/640]
> [error_code+52/60] [__insert_into_lru_list+28/112]
> kernel: [__refile_buffer+86/96] [refile_buffer+10/16]
> [__block_commit_write+131/208] [generic_commit_write+52/96]
> [ext3_commit_write+305/448] [do_generic_file_write+654/976]
> kernel: [generic_file_write+259/288] [ext3_file_write+35/192]
> [sys_write+150/240] [system_call+51/56]
> kernel:
> kernel: Code: 0f 0b fb 00 20 b2 21 c0 83 c4 14 ff 43 08 89 d8 eb 59
> 8d 74
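
To compare entries like these across reboots, it can help to reduce each one to its BUG location and the symbols in its trace. The following is only a minimal sketch in Python, assuming the log text is quoted and line-wrapped as in this mail; the function names (normalise, bug_sites, trace_symbols) are hypothetical helpers and not part of any kernel tool.

```python
import re

# Matches already-decoded entries such as "[iput+44/592]":
# symbol name, offset into the function, function size.
SYMBOL_RE = re.compile(r"\[(\w+)\s*\+(\d+)/(\d+)\]")
# Matches lines such as "kernel BUG at buffer.c:575!".
BUG_RE = re.compile(r"kernel BUG at ([\w.]+):(\d+)!")


def normalise(log_text):
    """Drop quote markers and 'kernel:' prefixes, then join wrapped lines."""
    parts = []
    for line in log_text.splitlines():
        line = line.lstrip("> ").strip()
        if line.startswith("kernel:"):
            line = line[len("kernel:"):].strip()
        parts.append(line)
    return " ".join(parts)


def bug_sites(log_text):
    """Return (file, line) pairs for every 'kernel BUG at' message."""
    return [(name, int(num)) for name, num in BUG_RE.findall(normalise(log_text))]


def trace_symbols(log_text):
    """Return symbol names in order, including the EIP symbol."""
    return [name for name, _off, _size in SYMBOL_RE.findall(normalise(log_text))]


# Shortened excerpt of the second entry above, with one wrapped symbol.
sample = """\
> kernel: kernel BUG at buffer.c:575!
> kernel: EIP: 0010:[__insert_into_lru_list+28/112] Tainted: P
> kernel: Call Trace: [__refile_buffer+86/96] [refile_buffer+10/16]
> kernel: [do_generic_file_write+654/976] [generic_file_write
> +259/288] [sys_write+150/240]
"""

print(bug_sites(sample))
print(trace_symbols(sample))
```

Two entries that reduce to the same BUG location and the same symbol list are likely the same failure path, which makes it easier to see whether a new crash is a repeat or something different.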

When these bugs occur, the computer doesn't die right away. One can
still issue commands, or log in. It's just that after any such action
one never gets to see the prompt again. Issuing "init 6" right from the
console usually starts ok, then stops with a couple of hanging PIDs
reported (I really don't know what those processes are; they might well
be the terminals.) Nothing but a hard reset works then...

So, does anybody know what that is all about and what I could do so
these freezes don't happen again?

Thanks a lot and best regards,

Andreas (quite desperate by now)

2004-06-12 20:20:57

by Chris Wedgwood

Subject: Re: Frequent system freezes after kernel bug

On Sat, Jun 12, 2004 at 08:37:42PM +0200, Andreas Schmidt wrote:

> > kernel: EIP: 0010:[iput+44/592] Tainted: P

what modules do you have loaded?

--cw

2004-06-12 21:50:08

by Andreas Schmidt

Subject: Re: Frequent system freezes after kernel bug

On 2004.06.12 22:20, Chris Wedgwood wrote:
> On Sat, Jun 12, 2004 at 08:37:42PM +0200, Andreas Schmidt wrote:
>
> > > kernel: EIP: 0010:[iput+44/592] Tainted: P
>
> what modules do you have loaded?
>
root@stralsunder-10:~# lsmod
Module Size Used by Tainted: P
ipt_MASQUERADE 1272 1 (autoclean)
ipt_state 568 10 (autoclean)
ipt_LOG 3256 1 (autoclean)
iptable_filter 1700 1 (autoclean)
ip_nat_ftp 2800 0 (unused)
iptable_nat 15448 2 [ipt_MASQUERADE ip_nat_ftp]
ip_conntrack_irc 2992 0 (unused)
ip_conntrack_ftp 3696 1
ip_conntrack 19272 2 [ipt_MASQUERADE ipt_state ip_nat_ftp
iptable_nat ip_conntrack_irc ip_conntrack_ftp]
ip_tables 11416 7 [ipt_MASQUERADE ipt_state ipt_LOG
iptable_filter iptable_nat]
ppp_deflate 2968 0 (autoclean)
zlib_inflate 18532 0 (autoclean) [ppp_deflate]
zlib_deflate 17912 0 (autoclean) [ppp_deflate]
bsd_comp 3992 0 (autoclean)
ppp_synctty 5152 1 (autoclean)
ppp_generic 20192 3 (autoclean) [ppp_deflate bsd_comp
ppp_synctty]
slhc 4672 0 (autoclean) [ppp_generic]
serial 44228 0 (autoclean)
nls_iso8859-1 2844 0 (unused)
dummy 1056 1
fcdsl 862816 2
capi 18304 5
kernelcapi 29828 3 [fcdsl capi]
capiutil 22368 0 [kernelcapi]
capifs 3532 1 [capi]
8139too 12296 2
mii 2400 0 [8139too]
crc32 2848 0 [8139too]
apm 9344 2
rtc 6012 0 (autoclean)
root@stralsunder-10:~#

2004-06-13 00:10:48

by Christian Kujau

Subject: Re: Frequent system freezes after kernel bug

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andreas Schmidt wrote:
|> On Sat, Jun 12, 2004 at 08:37:42PM +0200, Andreas Schmidt wrote:
|>
|> > > kernel: EIP: 0010:[iput+44/592] Tainted: P
|>
| fcdsl 862816 2

could be totally unrelated, but please try to reproduce without this
(tainted) fcdsl module. it is often known for weird lockups. i'm not
able to tell from the oops message, so i have to ask: is this an SMP box?

Christian.
- --
BOFH excuse #314:

You need to upgrade your VESA local bus to a MasterCard local bus.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAy5uD+A7rjkF8z0wRAnxzAKCoctKAT8NohJlkl6pFpbLmg/mS5ACgw416
B80DpO60XmMoKgzwyBEowBk=
=7Fns
-----END PGP SIGNATURE-----

2004-06-13 17:51:36

by Andreas Schmidt

Subject: Re: Frequent system freezes after kernel bug

On 2004.06.13 02:10, Christian Kujau wrote:
> Andreas Schmidt wrote:
> |> On Sat, Jun 12, 2004 at 08:37:42PM +0200, Andreas Schmidt wrote:
> |>
> |> > > kernel: EIP: 0010:[iput+44/592] Tainted: P
> |>
> | fcdsl 862816 2
>
> could be totally unrelated, but please try to reproduce without this
> (tainted) fcdsl module. it is often known for weird lockups. i'm not
> able to tell from the oops message, so i have to ask: is this an SMP
> box?
The box has just one processor (Duron, in case it's relevant). BTW, I'm
a bit at a loss how to reproduce the problem. Trouble is, it appears
quite arbitrarily. The box has been running about 26hrs now without a
reboot. (OK, there wasn't much activity today except handling mails.) I
could unload the fcdsl module, which would cut me off the net. If
necessary, I could even do this for a day or two -- but what if that
error didn't occur during that time? As I'm not sure what triggered it
in the first place, IMHO nothing could be deduced with certainty from
the error not occurring. There'd only be definite information if it
_did_ happen again. So, what do you suggest? (Sorry if that sounds
obnoxious, but I'm really a bit confused about this stuff...)

Best regards,

Andreas

2004-06-13 22:14:27

by Christian Kujau

Subject: Re: Frequent system freezes after kernel bug

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andreas Schmidt wrote:
| _did_ happen again. So, what do you suggest? (Sorry if that sounds
| obnoxious, but I'm really a bit confused about this stuff...)

sorry to confuse. no, i'm not able to tell any relevant information from
your oops. i must say that it just occurred *to me* that we had different
oopses and lockups with fcdsl and fcpci (!) modules, which were hard to
reproduce too.

you said:
| This box runs under Debian stable; I noticed these particular bugs
| starting with kernel 2.4.24. Yesterday, I updated from 2.4.25 to

so 2.4.23 did work? would be strange, since patch-2.4.24.bz2 is only
2,5KB in size and touches very few things. good for you: if 2.4.23 is
really working and 2.4.24 is not, you could back out the changes and see
what it gives.

just my 2¢,
Christian.
- --
BOFH excuse #179:

multicasts on broken packets
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAzNG8+A7rjkF8z0wRAo4XAJ9wP6X38jyICqmB7ATYSyc2N6l/BgCfQnVf
TkJlP1NWnExS6uub5KEHesk=
=DoAL
-----END PGP SIGNATURE-----

2004-06-14 03:51:01

by Andreas Schmidt

Subject: Re: Frequent system freezes after kernel bug

On 2004.06.14 00:14, Christian Kujau wrote:
> Andreas Schmidt wrote:

> you said:
> | This box runs under Debian stable; I noticed these particular bugs
> | starting with kernel 2.4.24. Yesterday, I updated from 2.4.25 to
>
> so 2.4.23 did work? would be strange, since patch-2.4.24.bz2 is only
> 2,5KB in size and touches very few things. good for you: if 2.4.23 is
> really working and 2.4.24 is not, you could back out the changes and
> see what it gives.
Hmmm, I guess my posting was a bit ambiguous in this regard. I started
out with 2.4.18, upgraded to 2.4.19, later 2.4.21. As the fcdsl-crashes
(triggered by disconnection) didn't cease, I googled and found other
reports of the same bug on kernels 2.4.18 and higher, so I decided to
give it a try and downgrade to 2.4.17. From there, I went right up to
2.4.24. At that time I learned of a patch to the module which allowed
to run it without the problems previously encountered. However, there
still remained the system freezes from the second bug (buffer.c/
transaction.c), so I decided to upgrade to 2.4.25 and, ultimately,
2.4.26.

Best regards,

Andreas

2004-06-16 07:24:07

by Andreas Schmidt

Subject: Re: Frequent system freezes after kernel bug

On 2004.06.12 20:37, Andreas Schmidt wrote:
> Hi,
>
> in the last few days I have experienced frequent system freezes
> apparently related to kernel bugs on one box here. Unfortunately,
> that's just the same system serving as router and local fileserver
> for
I have removed the tainted fcdsl module and tried to crash the machine
by generating steady hard disk activity. Writing to an EXT2 partition
resulted in two non-fatal bugs in buffer.c after several hours:

19:09:32
kernel: kernel BUG at buffer.c:575!
kernel: invalid operand: 0000
kernel: CPU: 0
kernel: EIP: 0010:[__insert_into_lru_list+28/112] Not tainted
kernel: EFLAGS: 00010206
kernel: eax: 00000000 ebx: 00000002 ecx: d2e980c0 edx: c02ad6cc
kernel: esi: d2e980c0 edi: d2e980c0 ebp: 00000001 esp: c4677ebc
kernel: ds: 0018 es: 0018 ss: 0018
kernel: Process cp (pid: 4081, stackpage=c4677000)
kernel: Stack: 00000002 c0135a36 d2e980c0 00000002 d2e980c0 00001000
c0135a4a d2e980c0
kernel: c0136443 d2e980c0 00001000 dc1bc0c0 06261000 00000000
00001000 00000000
kernel: c0136ab4 dc1bc0c0 c130e514 00000000 00001000 c130e514
bffff404 d1c7b000
kernel: Call Trace: [__refile_buffer+86/96] [refile_buffer+10/16]
[__block_commit_write+131/208] [generic_commit_write+52/96]
[do_generic_file_write+654/976]
kernel: [generic_file_write+259/288] [sys_write+150/240] [system_call
+51/56]
kernel:
kernel: Code: 0f 0b 3f 02 e5 57 21 c0 83 3a 00 75 05 89 0a 89 49 24 8b
02

04:47:01
kernel: kernel BUG at buffer.c:575!
kernel: invalid operand: 0000
kernel: CPU: 0
kernel: EIP: 0010:[__insert_into_lru_list+28/112] Not tainted
kernel: EFLAGS: 00010206
kernel: eax: 00000000 ebx: 00000002 ecx: d07e80c0 edx: c02ad6cc
kernel: esi: d07e80c0 edi: d07e80c0 ebp: 00000001 esp: c2d85ebc
kernel: ds: 0018 es: 0018 ss: 0018
kernel: Process cp (pid: 760, stackpage=c2d85000)
kernel: Stack: 00000002 c0135a36 d07e80c0 00000002 d07e80c0 00001000
c0135a4a d07e80c0
kernel: c0136443 d07e80c0 00001000 cee62ac0 0c7b4000 00000000
00001000 00000000
kernel: c0136ab4 cee62ac0 c14be3d4 00000000 00001000 c14be3d4
bffff3d4 db98b000
kernel: Call Trace: [__refile_buffer+86/96] [refile_buffer+10/16]
[__block_commit_write+131/208] [generic_commit_write+52/96]
[do_generic_file_write+654/976]
kernel: [generic_file_write+259/288] [sys_write+150/240] [system_call
+51/56]
kernel:
kernel: Code: 0f 0b 3f 02 e5 57 21 c0 83 3a 00 75 05 89 0a 89 49 24 8b
02

Thereafter, I tried continuously writing to an EXT3 partition, which
after just about half an hour resulted in bugs in buffer.c and
transaction.c. This time, the system froze again. "init 6" stopped
after sending the KILL signal because the processes with IDs "ca" and
"1" to "6" were hanging. Here's the output:

kernel: kernel BUG at buffer.c:575!
kernel: invalid operand: 0000
kernel: CPU: 0
kernel: EIP: 0010:[__insert_into_lru_list+28/112] Not tainted
kernel: EFLAGS: 00010206
kernel: eax: 00000000 ebx: 00000002 ecx: d07b80c0 edx: c02ad6cc
kernel: esi: d07b80c0 edi: d07b80c0 ebp: 00000001 esp: c6195e6c
kernel: ds: 0018 es: 0018 ss: 0018
kernel: Process cp (pid: 3627, stackpage=c6195000)
kernel: Stack: 00000002 c0135a36 d07b80c0 00000002 d07b80c0 00001000
c0135a4a d07b80c0
kernel: c0136443 d07b80c0 00001000 c648bc80 07df7000 00000000
00001000 00000000
kernel: c0136ab4 c648bc80 c105c770 00000000 00001000 c105c770
bffff3e4 c648bc80
kernel: Call Trace: [__refile_buffer+86/96] [refile_buffer+10/16]
[__block_commit_write+131/208] [generic_commit_write+52/96]
[ext3_commit_write+305/448]
kernel: [do_generic_file_write+654/976] [generic_file_write+259/288]
[ext3_file_write+35/192] [sys_write+150/240] [system_call+51/56]
kernel:
kernel: Code: 0f 0b 3f 02 e5 57 21 c0 83 3a 00 75 05 89 0a 89 49 24 8b
02
kernel: <0>Assertion failure in journal_start() at transaction.c:251:
"handle->h_transaction->t_journal == journal"
kernel: kernel BUG at transaction.c:251!
kernel: invalid operand: 0000
kernel: CPU: 0
kernel: EIP: 0010:[journal_start+74/192] Not tainted
kernel: EFLAGS: 00010282
kernel: eax: 0000006c ebx: d3ea6600 ecx: df37e000 edx: 00000001
kernel: esi: dfe49800 edi: c6194000 ebp: 00000040 esp: c6195c10
kernel: ds: 0018 es: 0018 ss: 0018
kernel: Process cp (pid: 3627, stackpage=c6195000)
kernel: Stack: c021b240 c021b46c c021b220 000000fb c021b440 d3ea6600
dfae0c00 c419d6c0
kernel: c015ea38 dfe49800 00000002 c419d6c0 dfae0c00 00000001
c014642e c419d6c0
kernel: c419d6c0 c419d72c d163cec0 c0128c93 c419d6c0 00000001
c419d6c0 c419d72c
kernel: Call Trace: [ext3_dirty_inode+88/256] [__mark_inode_dirty
+46/144] [do_generic_file_write+211/976] [ide_start_request+382/432]
[generic_file_write+259/288]
kernel: [ext3_file_write+35/192] [do_acct_process+571/592]
[acct_process+25/39] [do_exit+105/592] [do_invalid_op+0/160] [die
+86/96]
kernel: [do_invalid_op+140/160] [__insert_into_lru_list+28/112]
[ext3_get_block_handle+426/640] [ext3_get_block_handle+242/640]
[error_code+52/60] [__insert_into_lru_list+28/112]
kernel: [__refile_buffer+86/96] [refile_buffer+10/16]
[__block_commit_write+131/208] [generic_commit_write+52/96]
[ext3_commit_write+305/448] [do_generic_file_write+654/976]
kernel: [generic_file_write+259/288] [ext3_file_write+35/192]
[sys_write+150/240] [system_call+51/56]
kernel:
kernel: Code: 0f 0b fb 00 20 b2 21 c0 83 c4 14 ff 43 08 89 d8 eb 59 8d
74
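
In case anyone wants to generate a comparable write load: the traces above show cp as the writing process. The following Python sketch is only a rough, hypothetical stand-in for that, not the procedure actually used here. The function name write_load, the scratch file name, and all sizes are assumptions chosen for illustration.

```python
import os
import tempfile


def write_load(directory, passes=3, file_size=1 << 20, chunk_size=4096):
    """Rewrite and fsync a scratch file `passes` times; return bytes written."""
    chunk = b"\xa5" * chunk_size
    path = os.path.join(directory, "write_load.tmp")  # hypothetical file name
    total = 0
    for _ in range(passes):
        with open(path, "wb") as fh:
            written = 0
            while written < file_size:
                fh.write(chunk)
                written += chunk_size
            # Push the data out of the page cache and onto the disk.
            fh.flush()
            os.fsync(fh.fileno())
        total += written
    if os.path.exists(path):
        os.remove(path)
    return total


if __name__ == "__main__":
    # Demo run in a temporary directory; for a real test, point
    # `directory` at a mount point on the partition to be exercised.
    with tempfile.TemporaryDirectory() as scratch:
        print(write_load(scratch))
```

Larger values for passes and file_size keep the disk busy for longer; the defaults are deliberately small so that the demo finishes quickly.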

Best regards,

Andreas