2002-10-28 10:17:48

by Hugo Mills

Subject: Oops in kswapd, 2.4.19 kernel and before

Hi,

This is the third time I've tried to report this problem, with no
response so far. One last try. If you're not interested, please tell
me and I won't bother you any more...

I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
generally appear to happen while running Amanda (a tape backup
utility) -- although I've not identified exactly which component of
Amanda triggers it. The machine is lightly stressed with regard to
memory usage, although I suspect much of it is (currently) swapped out
(I'm running postgres and apache, but they don't get much use at the
moment):

hrm@vlad:hrm $ free
             total       used       free     shared    buffers     cached
Mem:        127240     125264       1976          0       2576      35020
-/+ buffers/cache:      87668      39572
Swap:       262132      53240     208892

After the oops, my kswapd is zombied:

hrm@vlad:hrm $ ps ax | grep kswapd
5 ? Z 0:11 [kswapd <defunct>]

although the machine does appear to continue to function without
problems. I have seen precisely similar effects on most of the
previous 2.4.x kernels.

Decoded oopsen are below (they _are_ decoded with the right system
maps, despite ksymoops's concerns). If there's anything else that's
needed in order to track this down, please let me know.

Thanks,
Hugo.

-----

ksymoops 2.4.6 on i586 2.4.19. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.19/ (default)
-m /boot/System.map-2.4.19 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 24 06:31:14 vlad kernel: c014248a
Oct 24 06:31:14 vlad kernel: Oops: 0000
Oct 24 06:31:14 vlad kernel: CPU: 0
Oct 24 06:31:14 vlad kernel: EIP: 0010:[iput+46/432] Not tainted
Oct 24 06:31:14 vlad kernel: EFLAGS: 00010206
Oct 24 06:31:14 vlad kernel: eax: 00000000 ebx: c67d8800 ecx: c67d8810 edx: c67d8810
Oct 24 06:31:14 vlad kernel: esi: 476f7200 edi: 00000000 ebp: c7f9ff3c esp: c7f9ff30
Oct 24 06:31:14 vlad kernel: ds: 0018 es: 0018 ss: 0018
Oct 24 06:31:14 vlad kernel: Process kswapd (pid: 5, stackpage=c7f9f000)
Oct 24 06:31:14 vlad kernel: Stack: c088ef78 c088ef60 c67d8800 c7f9ff54 c01405e6 c67d8800 00000011 000001d0
Oct 24 06:31:14 vlad kernel: 00000011 c7f9ff60 c01408bc 0000172d c7f9ff84 c012b0b1 00000002 000001d0
Oct 24 06:31:14 vlad kernel: 00000002 000001d0 c0287d74 00000002 c0287d74 c7f9ff9c c012b101 00000011
Oct 24 06:31:14 vlad kernel: Call Trace: [prune_dcache+198/316] [shrink_dcache_memory+28/52] [shrink_caches+105/132] [try_to_free_pages+53/88] [kswapd_balance_pgdat+76/160]
Oct 24 06:31:14 vlad kernel: Code: 8b 46 20 85 c0 74 02 89 c7 85 ff 74 0d 8b 47 10 85 c0 74 06
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c67d8800 <_end+64d5fc8/852f828>
>>ecx; c67d8810 <_end+64d5fd8/852f828>
>>edx; c67d8810 <_end+64d5fd8/852f828>
>>ebp; c7f9ff3c <_end+7c9d704/852f828>
>>esp; c7f9ff30 <_end+7c9d6f8/852f828>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 8b 46 20 mov 0x20(%esi),%eax
Code; 00000003 Before first symbol
3: 85 c0 test %eax,%eax
Code; 00000005 Before first symbol
5: 74 02 je 9 <_EIP+0x9>
Code; 00000007 Before first symbol
7: 89 c7 mov %eax,%edi
Code; 00000009 Before first symbol
9: 85 ff test %edi,%edi
Code; 0000000b Before first symbol
b: 74 0d je 1a <_EIP+0x1a>
Code; 0000000d Before first symbol
d: 8b 47 10 mov 0x10(%edi),%eax
Code; 00000010 Before first symbol
10: 85 c0 test %eax,%eax
Code; 00000012 Before first symbol
12: 74 06 je 1a <_EIP+0x1a>


1 warning issued. Results may not be reliable.

-----

ksymoops 2.4.6 on i586 2.4.19. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.19/ (default)
-m /boot/System.map-2.4.19 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 27 01:46:00 vlad kernel: Unable to handle kernel paging request at virtual address 47804220
Oct 27 01:46:00 vlad kernel: c014248a
Oct 27 01:46:00 vlad kernel: *pde = 00000000
Oct 27 01:46:00 vlad kernel: Oops: 0000
Oct 27 01:46:00 vlad kernel: CPU: 0
Oct 27 01:46:00 vlad kernel: EIP: 0010:[iput+46/432] Not tainted
Oct 27 01:46:00 vlad kernel: EFLAGS: 00010206
Oct 27 01:46:00 vlad kernel: eax: 00000000 ebx: c67c8800 ecx: c67c8810 edx: c67c8810
Oct 27 01:46:00 vlad kernel: esi: 47804200 edi: 00000000 ebp: c7f9ff3c esp: c7f9ff30
Oct 27 01:46:00 vlad kernel: ds: 0018 es: 0018 ss: 0018
Oct 27 01:46:00 vlad kernel: Process kswapd (pid: 5, stackpage=c7f9f000)
Oct 27 01:46:00 vlad kernel: Stack: c6a93d38 c6a93d20 c67c8800 c7f9ff54 c01405e6 c67c8800 00000005 000001d0
Oct 27 01:46:00 vlad kernel: 00000020 c7f9ff60 c01408bc 000009e1 c7f9ff84 c012b0b1 00000006 000001d0
Oct 27 01:46:00 vlad kernel: 00000006 000001d0 c0287d74 00000006 c0287d74 c7f9ff9c c012b101 00000020
Oct 27 01:46:00 vlad kernel: Call Trace: [prune_dcache+198/316] [shrink_dcache_memory+28/52] [shrink_caches+105/132] [try_to_free_pages+53/88] [kswapd_balance_pgdat+76/160]
Oct 27 01:46:00 vlad kernel: Code: 8b 46 20 85 c0 74 02 89 c7 85 ff 74 0d 8b 47 10 85 c0 74 06
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c67c8800 <_end+64c5fc8/852f828>
>>ecx; c67c8810 <_end+64c5fd8/852f828>
>>edx; c67c8810 <_end+64c5fd8/852f828>
>>ebp; c7f9ff3c <_end+7c9d704/852f828>
>>esp; c7f9ff30 <_end+7c9d6f8/852f828>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 8b 46 20 mov 0x20(%esi),%eax
Code; 00000003 Before first symbol
3: 85 c0 test %eax,%eax
Code; 00000005 Before first symbol
5: 74 02 je 9 <_EIP+0x9>
Code; 00000007 Before first symbol
7: 89 c7 mov %eax,%edi
Code; 00000009 Before first symbol
9: 85 ff test %edi,%edi
Code; 0000000b Before first symbol
b: 74 0d je 1a <_EIP+0x1a>
Code; 0000000d Before first symbol
d: 8b 47 10 mov 0x10(%edi),%eax
Code; 00000010 Before first symbol
10: 85 c0 test %eax,%eax
Code; 00000012 Before first symbol
12: 74 06 je 1a <_EIP+0x1a>


1 warning issued. Results may not be reliable.

-----

ksymoops 2.4.6 on i586 2.4.19. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.19/ (default)
-m /boot/System.map-2.4.19 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oct 28 08:04:49 vlad kernel: Unable to handle kernel paging request at virtual address 47880220
Oct 28 08:04:49 vlad kernel: c0141c6a
Oct 28 08:04:49 vlad kernel: *pde = 00000000
Oct 28 08:04:49 vlad kernel: Oops: 0000
Oct 28 08:04:49 vlad kernel: CPU: 0
Oct 28 08:04:49 vlad kernel: EIP: 0010:[clear_inode+86/168] Not tainted
Oct 28 08:04:49 vlad kernel: EFLAGS: 00010206
Oct 28 08:04:49 vlad kernel: eax: 47880200 ebx: c67c8800 ecx: c67c8808 edx: c67c8818
Oct 28 08:04:49 vlad kernel: esi: c7f9ff44 edi: c73e8a28 ebp: c7f9ff14 esp: c7f9ff10
Oct 28 08:04:49 vlad kernel: ds: 0018 es: 0018 ss: 0018
Oct 28 08:04:49 vlad kernel: Process kswapd (pid: 5, stackpage=c7f9f000)
Oct 28 08:04:49 vlad kernel: Stack: c67c8800 c7f9ff28 c0141cff c67c8800 c5829648 c5829640 c7f9ff4c c0141f24
Oct 28 08:04:49 vlad kernel: c7f9ff44 0000000c 000001d0 00000020 000005df c02d3428 c36d5de8 c7f9ff58
Oct 28 08:04:49 vlad kernel: c0141f5c 00000000 c7f9ff84 c012b0bb 00000006 000001d0 00000006 000001d0
Oct 28 08:04:49 vlad kernel: Call Trace: [dispose_list+67/96] [prune_icache+164/192] [shrink_icache_memory+28/52] [shrink_caches+115/132] [try_to_free_pages+53/88]
Oct 28 08:04:49 vlad kernel: Code: 8b 40 20 85 c0 74 0f 8b 40 30 85 c0 74 08 53 ff d0 83 c4 04
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c67c8800 <_end+64c5fc8/852f828>
>>ecx; c67c8808 <_end+64c5fd0/852f828>
>>edx; c67c8818 <_end+64c5fe0/852f828>
>>esi; c7f9ff44 <_end+7c9d70c/852f828>
>>edi; c73e8a28 <_end+70e61f0/852f828>
>>ebp; c7f9ff14 <_end+7c9d6dc/852f828>
>>esp; c7f9ff10 <_end+7c9d6d8/852f828>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 8b 40 20 mov 0x20(%eax),%eax
Code; 00000003 Before first symbol
3: 85 c0 test %eax,%eax
Code; 00000005 Before first symbol
5: 74 0f je 16 <_EIP+0x16>
Code; 00000007 Before first symbol
7: 8b 40 30 mov 0x30(%eax),%eax
Code; 0000000a Before first symbol
a: 85 c0 test %eax,%eax
Code; 0000000c Before first symbol
c: 74 08 je 16 <_EIP+0x16>
Code; 0000000e Before first symbol
e: 53 push %ebx
Code; 0000000f Before first symbol
f: ff d0 call *%eax
Code; 00000011 Before first symbol
11: 83 c4 04 add $0x4,%esp


1 warning issued. Results may not be reliable.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP: 1024D/1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.nildram.co.uk
--- Anyone who claims their cryptographic protocol is secure is ---
either a genius or a fool. Given the genius/fool ratio
for our species, the odds aren't good.
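
When comparing several oopses, the symbol names in the Call Trace
lines can be pulled out mechanically. The following is a minimal sketch
in Python and is not part of the original report; the names FRAME_RE,
parse_frames and symbols are made up for illustration, and the two trace
strings are copied from the 27 Oct and 28 Oct oopses above.

```python
import re

# Matches one bracketed frame such as "[prune_dcache+198/316]":
# symbol name, decimal offset into the function, decimal function size.
FRAME_RE = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\+(\d+)/(\d+)\]")


def parse_frames(line):
    """Return (symbol, offset, size) tuples for every frame on a log line."""
    return [(sym, int(off), int(size)) for sym, off, size in FRAME_RE.findall(line)]


def symbols(line):
    """Return only the symbol names, in the order they appear on the line."""
    return [sym for sym, _, _ in parse_frames(line)]


# Call Trace lines copied from the 27 Oct and 28 Oct oopses in the report.
TRACE_OCT27 = ("Call Trace: [prune_dcache+198/316] [shrink_dcache_memory+28/52] "
               "[shrink_caches+105/132] [try_to_free_pages+53/88] "
               "[kswapd_balance_pgdat+76/160]")
TRACE_OCT28 = ("Call Trace: [dispose_list+67/96] [prune_icache+164/192] "
               "[shrink_icache_memory+28/52] [shrink_caches+115/132] "
               "[try_to_free_pages+53/88]")

if __name__ == "__main__":
    a, b = symbols(TRACE_OCT27), symbols(TRACE_OCT28)
    print("common frames:  ", [s for s in a if s in b])
    print("only in 27 Oct: ", [s for s in a if s not in b])
    print("only in 28 Oct: ", [s for s in b if s not in a])
```

The two traces share only shrink_caches and try_to_free_pages: one oops
was reached through the dentry-cache pruning path and the other through
the inode-cache pruning path.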



2002-10-28 10:32:49

by Morten Helgesen

Subject: Re: Oops in kswapd, 2.4.19 kernel and before

Hey Hugo,

On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
> Hi,
>
> This is the third time I've tried to report this problem, with no
> response so far. One last try. If you're not interested, please tell
> me and I won't bother you any more...
>
> I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> generally appear to happen while running Amanda (a tape backup
> utility) -- although I've not identified exactly which component of
> Amanda triggers it. The machine is lightly stressed with regard to
> memory usage, although I suspect much of it is (currently) swapped out
> (I'm running postgres and apache, but they don't get much use at the
> moment):

[snip]

I think this is the same issue I reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103226236223247&w=2

Upgrading to 2.4.20-pre7 has solved the problem for me ... I haven't
had time to look into what actually caused/fixed it.

== Morten

--

"Life is not for beginners" - quote from a wise person.

Regards,
Morten Helgesen
UNIX System Administrator & C Developer
Nextframe AS
[email protected] / 93445641
http://www.nextframe.net

2002-10-28 12:22:42

by Andrea Arcangeli

Subject: Re: Oops in kswapd, 2.4.19 kernel and before

On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
> Hi,
>
> This is the third time I've tried to report this problem, with no
> response so far. One last try. If you're not interested, please tell
> me and I won't bother you any more...
>
> I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> generally appear to happen while running Amanda (a tape backup

if it only happens while or after running Amanda, it may be a tape
driver bug.

> Decoded oopsen are below (they _are_ decoded with the right system
> maps, despite ksymoops's concerns). If there's anything else that's
> needed in order to track this down, please let me know.

The oopses show that some inode was corrupted. They don't tell us who is
corrupting it, but most likely it is not a piece of common code (more
likely a driver or a non-mainstream feature, or we should be able to
reproduce it). You should try to localize the bug to a piece of code, for
example by making 100% sure that it triggers as soon as you start Amanda.
Then you can try to back up using another device (not tape) and see if
you can still reproduce it. Finally, you can try older or newer 2.4
drivers for the tape and see if any change in the old/new drivers fixes
the problem. Of course it isn't certain at all that it is the tape; I'm
just guessing because you said it happens while backing up to the tape.

Andrea
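
The fault addresses in the report can be cross-checked against the
register dumps. The following is a minimal sketch in Python, assuming the
default i386 3G/1G split (kernel virtual addresses start at 0xC0000000),
which the thread does not state. The faulting instruction is taken to be
the first one in each Code: line, which ksymoops labels <_EIP>. The
function and variable names are illustrative only.

```python
# Assumption: default i386 3G/1G split, so kernel virtual addresses
# start at PAGE_OFFSET = 0xC0000000.
PAGE_OFFSET = 0xC0000000


def effective_address(base, displacement):
    """Address accessed by `mov disp(%reg),...` on 32-bit x86."""
    return (base + displacement) & 0xFFFFFFFF


def looks_like_kernel_pointer(addr):
    """True if addr lies in the kernel's part of the address space."""
    return addr >= PAGE_OFFSET


# (date, base register value, displacement, reported fault address or None)
OOPSES = [
    ("Oct 24", 0x476F7200, 0x20, None),        # esi; log has no address line
    ("Oct 27", 0x47804200, 0x20, 0x47804220),  # esi, mov 0x20(%esi),%eax
    ("Oct 28", 0x47880200, 0x20, 0x47880220),  # eax, mov 0x20(%eax),%eax
]

if __name__ == "__main__":
    for date, base, disp, reported in OOPSES:
        addr = effective_address(base, disp)
        if reported is None:
            verdict = "no reported address to compare"
        elif addr == reported:
            verdict = "matches reported address"
        else:
            verdict = "does NOT match reported address"
        kind = "kernel" if looks_like_kernel_pointer(base) else "non-kernel"
        print("%s: base %08x (%s) + %#x = %08x, %s"
              % (date, base, kind, disp, addr, verdict))
```

In both oopses that include a fault address, base plus 0x20 reproduces
it, and none of the three base values lies in kernel address space. That
is consistent with a pointer that has been overwritten with garbage.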

2002-10-28 16:38:25

by Hugo Mills

Subject: Re: Oops in kswapd, 2.4.19 kernel and before

On Mon, Oct 28, 2002 at 01:29:01PM +0100, Andrea Arcangeli wrote:
> On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
> > I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> > generally appear to happen while running Amanda (a tape backup
>
> if it only happens while or after running Amanda, it may be a tape
> driver bug.

I may have seen it (once?) before without touching the tape drive,
although I'm not certain. I shall see if I can reproduce without use
of the tape.

> > Decoded oopsen are below (they _are_ decoded with the right system
> > maps, despite ksymoops's concerns). If there's anything else that's
> > needed in order to track this down, please let me know.
>
> The oopses show that some inode was corrupted. They don't tell us who is
> corrupting it, but most likely it is not a piece of common code (more
> likely a driver or a non-mainstream feature, or we should be able to
> reproduce it). You should try to localize the bug to a piece of code, for
> example by making 100% sure that it triggers as soon as you start Amanda.

It's not certain. I appear to have triggered it this morning on the
_third_ consecutive run of amflush. Again, I'll test more carefully.

> Then you can try to back up using another device (not tape) and see
> if you can still reproduce it. Finally, you can try older or newer
> 2.4 drivers for the tape and see if any change in the old/new
> drivers fixes the problem. Of course it isn't certain at all that it
> is the tape; I'm just guessing because you said it happens while
> backing up to the tape.

I've definitely seen the problem throughout the 2.4 series. I don't
recall what the first 2.4 kernel I used was, but it was definitely
there in all mainstream kernels (and those -ac kernels I tried) from
about 2.4.14 onwards. I'll try 2.4.20-preX and report on that as well.

Thanks for your help. It may be a week or two before I can get all
these tests completed, but I shall definitely report back when I'm
done.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP: 1024D/1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.nildram.co.uk
--- Anyone who claims their cryptographic protocol is secure is ---
either a genius or a fool. Given the genius/fool ratio
for our species, the odds aren't good.



2002-10-28 19:04:42

by Andrea Arcangeli

Subject: Re: Oops in kswapd, 2.4.19 kernel and before

On Mon, Oct 28, 2002 at 04:45:40PM +0000, Hugo Mills wrote:
> On Mon, Oct 28, 2002 at 01:29:01PM +0100, Andrea Arcangeli wrote:
> > On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
> > > I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> > > generally appear to happen while running Amanda (a tape backup
> >
> > if it only happens while or after running Amanda, it may be a tape
> > driver bug.
>
> I may have seen it (once?) before without touching the tape drive,
> although I'm not certain. I shall see if I can reproduce without use
> of the tape.

perfect, thanks.

>
> > > Decoded oopsen are below (they _are_ decoded with the right system
> > > maps, despite ksymoops's concerns). If there's anything else that's
> > > needed in order to track this down, please let me know.
> >
> > The oopses show that some inode was corrupted. They don't tell us who is
> > corrupting it, but most likely it is not a piece of common code (more
> > likely a driver or a non-mainstream feature, or we should be able to
> > reproduce it). You should try to localize the bug to a piece of code, for
> > example by making 100% sure that it triggers as soon as you start Amanda.
>
> It's not certain. I appear to have triggered it this morning on the
> _third_ consecutive run of amflush. Again, I'll test more carefully.

You may want to start a very intensive kernel stress test right after
doing the backup. If it corrupts memory, you won't notice until you
actually use the corrupted memory. Other times it may corrupt user or
free memory and in such cases you won't get an oops.

> > Then you can try to back up using another device (not tape) and see
> > if you can still reproduce it. Finally, you can try older or newer
> > 2.4 drivers for the tape and see if any change in the old/new
> > drivers fixes the problem. Of course it isn't certain at all that it
> > is the tape; I'm just guessing because you said it happens while
> > backing up to the tape.
>
> I've definitely seen the problem throughout the 2.4 series. I don't
> recall what the first 2.4 kernel I used was, but it was definitely
> there in all mainstream kernels (and those -ac kernels I tried) from
> about 2.4.14 onwards. I'll try 2.4.20-preX and report on that as well.
>
> Thanks for your help. It may be a week or two before I can get all
> these tests completed, but I shall definitely report back when I'm
> done.

Ok.

Andrea
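
The thread does not say what the suggested stress test should look like.
Since the traces above go through prune_dcache and prune_icache, one
plausible workload fills the dentry and inode caches with many short-lived
files and then touches a large block of memory to encourage reclaim. The
following is only a sketch of that idea in Python, not a known reproducer
for this bug; churn_inodes, touch_memory, the file count and the memory
size are all assumptions, and it needs a system with a modern Python
interpreter.

```python
import os
import tempfile


def churn_inodes(directory, count):
    """Create, stat and then delete `count` small files in `directory`.

    Each file adds a dentry and an inode to the kernel caches, giving
    the cache-pruning code something to reclaim later.
    Returns the number of files that were created and stat'ed.
    """
    touched = 0
    for i in range(count):
        path = os.path.join(directory, "stress-%06d" % i)
        with open(path, "w") as f:
            f.write("x")
        os.stat(path)
        touched += 1
    for i in range(count):
        os.remove(os.path.join(directory, "stress-%06d" % i))
    return touched


def touch_memory(megabytes, page_size=4096):
    """Allocate `megabytes` MiB and write one byte in every page.

    Writing to each page forces the kernel to back it with real memory,
    which creates memory pressure. Returns the number of bytes allocated.
    """
    buf = bytearray(megabytes * 1024 * 1024)
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1
    return len(buf)


if __name__ == "__main__":
    # Hypothetical sizes; tune them to the RAM of the machine under test.
    with tempfile.TemporaryDirectory() as workdir:
        for round_number in range(3):
            files = churn_inodes(workdir, 10000)
            touched = touch_memory(64)
            print("round %d: %d files, %d bytes touched"
                  % (round_number, files, touched))
```

Running it right after a backup, and again without one, would give a
rough comparison of whether the backup is needed to trigger the oops.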