2001-07-05 14:41:47

by Henry

Subject: OOPS (kswapd) in 2.4.5 and 2.4.6

Hello

Presumably this has already been mentioned, but since it seems like an ongoing
thing (I've seen a similar topic discussed at
http://kt.zork.net/kernel-traffic/) I thought it wouldn't hurt to provide
more info.

We've noticed the following kernel error since 2.4 (2.4.1-2.4.6). It appears to
be swap (kswapd thread specific?) related. The same error is reported on
several SMP machines after only a short period (an hour or less). The problem
only started after we upgraded to 2.4.

Here are two ksymoops outputs from different machines. On 2.4.5 the first
server would eventually fail with memory errors and require a reboot (sorry,
I don't have the specific error, but it involved 'semget', and apache, for
example, would refuse to launch); the second server would not require a
reboot. With 2.4.6 the error still appears, but the servers *seem* more
stable (i.e., not requiring a reboot).

Please advise if more detail is required or if anything else will help.

regards
Henry

Hardware:
-------
Our test servers are:
Dual-cpu pentium 233 (intel) with 128MB RAM and more than double that swap.

Software:
-------
Kernel: 2.4.6
gcc: egcs-2.91.66 19990314/Linux (egcs-1.1.2 release) and gcc version 2.95.2
19991024 (as I type this I realise the difference in compilers - surely that's
not the cause though, since compiling 2.2 with both was not a problem)

Distribution: slackware 7 with the latest e2fsprogs/modutils/util-linux.

Server1:
------
cpu: 0, clocks: 668166, slice: 222722
cpu: 1, clocks: 668166, slice: 222722
Unable to handle kernel NULL pointer dereference at virtual address 00000008
c01b4227
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01b4227>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010207
eax: 00000001 ebx: 00000000 ecx: 000000c0 edx: c12c49c0
esi: c12d3f4c edi: 00000001 ebp: c0d0f2a0 esp: c12d3ee0
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 3, stackpage=c12d3000)
Stack: 00000000 c12d3f4c c12d3f4c c01330cb 00000001 00000000 001c4300 c1203048
00000000 00000028 c0129752 00000001 c1203048 00000305 c12d3f48 00001000
001c4300 c1203048 00000000 00000028 c12d3f48 00000000 00001000 00001c43
Call Trace: [<c01330cb>] [<c0129752>] [<c0106cec>] [<c012981f>] [<c012a4e8>] [<c0128b1d>] [<c01293f5>]
[<c0129486>] [<c01054cc>]
Code: 0f b7 43 08 66 c1 e8 09 0f b7 f0 8b 43 18 a8 04 75 19 68 a7

>>EIP; c01b4227 <submit_bh+b/74> <=====
Trace; c01330cb <brw_page+8f/a0>
Trace; c0129752 <rw_swap_page_base+152/1b0>
Trace; c0106cec <ret_from_intr+0/7>
Trace; c012981f <rw_swap_page+6f/b8>
Trace; c012a4e8 <swap_writepage+78/80>
Trace; c0128b1d <page_launder+285/874>
Trace; c01293f5 <do_try_to_free_pages+1d/58>
Trace; c0129486 <kswapd+56/e8>
Trace; c01054cc <kernel_thread+28/38>
Code; c01b4227 <submit_bh+b/74>
00000000 <_EIP>:
Code; c01b4227 <submit_bh+b/74> <=====
0: 0f b7 43 08 movzwl 0x8(%ebx),%eax <=====
Code; c01b422b <submit_bh+f/74>
4: 66 c1 e8 09 shrw $0x9,%ax
Code; c01b422f <submit_bh+13/74>
8: 0f b7 f0 movzwl %ax,%esi
Code; c01b4232 <submit_bh+16/74>
b: 8b 43 18 movl 0x18(%ebx),%eax
Code; c01b4235 <submit_bh+19/74>
e: a8 04 testb $0x4,%al
Code; c01b4237 <submit_bh+1b/74>
10: 75 19 jne 2b <_EIP+0x2b> c01b4252 <submit_bh+36/74>
Code; c01b4239 <submit_bh+1d/74>
12: 68 a7 00 00 00 pushl $0xa7


Server2:
------
cpu: 0, clocks: 668219, slice: 222739
cpu: 1, clocks: 668219, slice: 222739
Unable to handle kernel NULL pointer dereference at virtual address 00000008
c01bd75b
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01bd75b>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010207
eax: 00000000 ebx: 00000000 ecx: 00000001 edx: 00000000
esi: 00000002 edi: 00000001 ebp: c12d5f4c esp: c12d5ee4
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 3, stackpage=c12d5000)
Stack: 00000000 00000002 c0948d40 c01345bf 00000001 00000000 0020b000 c1192208
00000008 00001000 c012a7c7 00000001 c1192208 00000302 c12d5f48 00001000
0020b000 c1192208 00000008 00000030 c12d5f48 00000000 000020b0 030200c0
Call Trace: [<c01345bf>] [<c012a7c7>] [<c012a89c>] [<c012b56c>] [<c0129a44>] [<c012a445>] [<c012a4d6>]
[<c0105000>] [<c010550b>]
Code: 0f b7 43 08 66 c1 e8 09 0f b7 f0 8b 43 18 a8 04 75 1b 68 a7

>>EIP; c01bd75b <submit_bh+b/78> <=====
Trace; c01345bf <brw_page+93/a4>
Trace; c012a7c7 <rw_swap_page_base+14f/1b0>
Trace; c012a89c <rw_swap_page+74/bc>
Trace; c012b56c <swap_writepage+78/80>
Trace; c0129a44 <page_launder+294/918>
Trace; c012a445 <do_try_to_free_pages+1d/58>
Trace; c012a4d6 <kswapd+56/e4>
Trace; c0105000 <_stext+0/0>
Trace; c010550b <kernel_thread+23/30>
Code; c01bd75b <submit_bh+b/78>
00000000 <_EIP>:
Code; c01bd75b <submit_bh+b/78> <=====
0: 0f b7 43 08 movzwl 0x8(%ebx),%eax <=====
Code; c01bd75f <submit_bh+f/78>
4: 66 c1 e8 09 shrw $0x9,%ax
Code; c01bd763 <submit_bh+13/78>
8: 0f b7 f0 movzwl %ax,%esi
Code; c01bd766 <submit_bh+16/78>
b: 8b 43 18 movl 0x18(%ebx),%eax
Code; c01bd769 <submit_bh+19/78>
e: a8 04 testb $0x4,%al
Code; c01bd76b <submit_bh+1b/78>
10: 75 1b jne 2d <_EIP+0x2d> c01bd788 <submit_bh+38/78>
Code; c01bd76d <submit_bh+1d/78>
12: 68 a7 00 00 00 pushl $0xa7


2001-07-05 16:55:34

by Wayne Whitney

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

In mailing-lists.linux-kernel, you wrote:

> We've noticed the following kernel error since 2.4 (2.4.1-2.4.6).
> It appears to be swap (kswapd thread specific?) related. The same
> error is reported on several SMP machines after only a short period
> (an hour or less).

FYI, I see a similar problem under 2.4.5, also SMP, although only
intermittently. Two oopses are below, from two different, although
similarly configured, machines.

A word of warning: these kernels were patched with MOSIX-1.04
(http://www.mosix.org), but I don't believe that it touches the relevant
parts of the kernel. Moreover, the MOSIX developers suggested these
oopses were not MOSIX-related, and they are usually pretty good about
that.

Cheers,
Wayne


ksymoops 2.3.4 on i686 2.4.5-mosix-1.0.4. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.5-mosix-1.0.4/ (default)
-m /boot/System.map (specified)

Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not found in System.map. Ignoring ksyms_base entry
Jun 15 01:18:17 mf3 kernel: invalid operand: 0000
Jun 15 01:18:17 mf3 kernel: CPU: 0
Jun 15 01:18:17 mf3 kernel: EIP: 0010:[clear_inode+51/260]
Jun 15 01:18:17 mf3 kernel: EIP: 0010:[<c015bca7>]
Using defaults from ksymoops -t elf32-i386 -a i386
Jun 15 01:18:17 mf3 kernel: EFLAGS: 00010292
Jun 15 01:18:17 mf3 kernel: eax: 0000001b ebx: c42cdd60 ecx: 00000001 edx: 00000001
Jun 15 01:18:17 mf3 kernel: esi: e0937860 edi: c42cdd60 ebp: 00001e7c esp: dff67f68
Jun 15 01:18:17 mf3 kernel: ds: 0018 es: 0018 ss: 0018
Jun 15 01:18:17 mf3 kernel: Process kswapd (pid: 4, stackpage=dff67000)
Jun 15 01:18:17 mf3 kernel: Stack: c023fea4 c023ff03 000001eb c42cdd60 c015c842 c42cdd60 c42f2940 c42f2920
Jun 15 01:18:17 mf3 kernel: e0929ffd c42cdd60 c42f2940 c015a040 c42f2920 c42cdd60 00000482 00000004
Jun 15 01:18:17 mf3 kernel: 00000000 0008e000 c015a399 00002bc0 c013b2b7 00000006 00000004 00000004
Jun 15 01:18:17 mf3 kernel: Call Trace: [iput+342/360] [ide-scsi:__insmod_ide-scsi_S.bss_L96+377149/32977459] [prune_dcache+220/368] [shrink_dcache_memory+33/48] [do_try_to_free_pages+39/88] [kswapd+87/228] [prepare_namespace+0/8]
Jun 15 01:18:17 mf3 kernel: Call Trace: [<c015c842>] [<e0929ffd>] [<c015a040>] [<c015a399>] [<c013b2b7>] [<c013b33f>] [<c0105000>]
Jun 15 01:18:17 mf3 kernel: [<c010550b>]
Jun 15 01:18:17 mf3 kernel: Code: 0f 0b 83 c4 0c 8d 74 26 00 8b 83 f4 00 00 00 a8 10 75 26 68

>>EIP; c015bca7 <clear_inode+33/104> <=====
Trace; c015c842 <iput+156/168>
Trace; e0929ffd <[nfs]nfs_dentry_iput+75/7c>
Trace; c015a040 <prune_dcache+dc/170>
Trace; c015a399 <shrink_dcache_memory+21/30>
Trace; c013b2b7 <do_try_to_free_pages+27/58>
Trace; c013b33f <kswapd+57/e4>
Trace; c0105000 <prepare_namespace+0/8>
Trace; c010550b <kernel_thread+23/30>
Code; c015bca7 <clear_inode+33/104>
00000000 <_EIP>:
Code; c015bca7 <clear_inode+33/104> <=====
0: 0f 0b ud2a <=====
Code; c015bca9 <clear_inode+35/104>
2: 83 c4 0c add $0xc,%esp
Code; c015bcac <clear_inode+38/104>
5: 8d 74 26 00 lea 0x0(%esi,1),%esi
Code; c015bcb0 <clear_inode+3c/104>
9: 8b 83 f4 00 00 00 mov 0xf4(%ebx),%eax
Code; c015bcb6 <clear_inode+42/104>
f: a8 10 test $0x10,%al
Code; c015bcb8 <clear_inode+44/104>
11: 75 26 jne 39 <_EIP+0x39> c015bce0 <clear_inode+6c/104>
Code; c015bcba <clear_inode+46/104>
13: 68 00 00 00 00 push $0x0

1 warning issued. Results may not be reliable.


ksymoops 2.3.4 on i686 2.4.5-mosix-1.0.4b. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.5-mosix-1.0.4b/ (default)
-m /boot/System.map (specified)

Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not found in System.map. Ignoring ksyms_base entry
Jun 27 09:22:58 mf2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000814
Jun 27 09:22:58 mf2 kernel: c015a045
Jun 27 09:22:58 mf2 kernel: *pde = 00000000
Jun 27 09:22:58 mf2 kernel: Oops: 0000
Jun 27 09:22:58 mf2 kernel: CPU: 0
Jun 27 09:22:58 mf2 kernel: EIP: 0010:[prune_dcache+209/368]
Jun 27 09:22:58 mf2 kernel: EIP: 0010:[<c015a045>]
Using defaults from ksymoops -t elf32-i386 -a i386
Jun 27 09:22:58 mf2 kernel: EFLAGS: 00010206
Jun 27 09:22:58 mf2 kernel: eax: 00000800 ebx: e959a8e0 ecx: e9596df0 edx: e9596df0
Jun 27 09:22:58 mf2 kernel: esi: e959a8c0 edi: e9596de0 ebp: 0000101b esp: c2225fa0
Jun 27 09:22:58 mf2 kernel: ds: 0018 es: 0018 ss: 0018
Jun 27 09:22:58 mf2 kernel: Process kswapd (pid: 4, stackpage=c2225000)
Jun 27 09:22:58 mf2 kernel: Stack: 000000c8 00000004 00000000 0008e000 c015a3a9 00002926 c013b2b7 00000006
Jun 27 09:22:58 mf2 kernel: 00000004 00000004 00000000 c2224000 c023bb31 c2224331 c013b33f 00000004
Jun 27 09:22:58 mf2 kernel: 00000000 00010f00 c2213fb4 c0105000 c010550b 00000000 c02bb080 c2212000
Jun 27 09:22:58 mf2 kernel: Call Trace: [shrink_dcache_memory+33/48] [do_try_to_free_pages+39/88] [kswapd+87/228] [prepare_namespace+0/8] [kernel_thread+35/48]
Jun 27 09:22:58 mf2 kernel: Call Trace: [<c015a3a9>] [<c013b2b7>] [<c013b33f>] [<c0105000>] [<c010550b>]
Jun 27 09:22:58 mf2 kernel: Code: 8b 40 14 85 c0 74 09 57 56 ff d0 83 c4 08 eb 12 57 e8 a1 26

>>EIP; c015a045 <prune_dcache+d1/170> <=====
Trace; c015a3a9 <shrink_dcache_memory+21/30>
Trace; c013b2b7 <do_try_to_free_pages+27/58>
Trace; c013b33f <kswapd+57/e4>
Trace; c0105000 <prepare_namespace+0/8>
Trace; c010550b <kernel_thread+23/30>
Code; c015a045 <prune_dcache+d1/170>
00000000 <_EIP>:
Code; c015a045 <prune_dcache+d1/170> <=====
0: 8b 40 14 mov 0x14(%eax),%eax <=====
Code; c015a048 <prune_dcache+d4/170>
3: 85 c0 test %eax,%eax
Code; c015a04a <prune_dcache+d6/170>
5: 74 09 je 10 <_EIP+0x10> c015a055 <prune_dcache+e1/170>
Code; c015a04c <prune_dcache+d8/170>
7: 57 push %edi
Code; c015a04d <prune_dcache+d9/170>
8: 56 push %esi
Code; c015a04e <prune_dcache+da/170>
9: ff d0 call *%eax
Code; c015a050 <prune_dcache+dc/170>
b: 83 c4 08 add $0x8,%esp
Code; c015a053 <prune_dcache+df/170>
e: eb 12 jmp 22 <_EIP+0x22> c015a067 <prune_dcache+f3/170>
Code; c015a055 <prune_dcache+e1/170>
10: 57 push %edi
Code; c015a056 <prune_dcache+e2/170>
11: e8 a1 26 00 00 call 26b7 <_EIP+0x26b7> c015c6fc <iput+0/168>

1 warning issued. Results may not be reliable.

2001-07-05 17:21:16

by Henry

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

On Thu, 05 Jul 2001, Wayne Whitney wrote:
> In mailing-lists.linux-kernel, you wrote:
>
> > We've noticed the following kernel error since 2.4 (2.4.1-2.4.6).
> > It appears to be swap (kswapd thread specific?) related. The same
> > error is reported on several SMP machines after only a short period
> > (an hour or less).
>
> FYI, I see a similar problem under 2.4.5, also SMP, although only
> intermittently. Two oopses are below, from two different, although
> similarly configured, machines.

[snip]

Sounds very similar. Our servers are all identical (except for RAM).

What's unusual is that the machines we *expect* to fail sooner - don't
(not even an oops). Those are very busy cache servers (several of them
in a sibling cluster) which do a lot of swapping. The machines which
*do* fail (or oops without any further catastrophe) are typically
web/mail hosting servers (reasonably busy with about 25% swap being
used). Increasing swap did not help on 2.4.5. We're still waiting for
something to happen on 2.4.6 (ie, oops already appeared - waiting for
meltdown, which, hopefully, will not occur). We used to auto-reboot
every morning at 2am or something to keep things stable - which I
*hate* because I remember having a 2.0.35/6 workstation that had an
uptime of 6 months a couple of years ago. God, I loved that box.

cheers
Henry

2001-07-06 06:10:09

by Henry

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

> >
> > FYI, I see a similar problem under 2.4.5, also SMP, although only
> > intermittently. Two oopses are below, from two different, although
> > similarly configured, machines.
>
> [snip]
>
> Sounds very similar. Our servers are all identical (except for RAM).
>
> What's unusual is that the machines we *expect* to fail sooner - don't
> (not even an oops). Those are very busy cache servers (several of them
> in a sibling cluster) which do a lot of swapping. The machines which
> *do* fail (or oops without any further catastrophe) are typically
> web/mail hosting servers (reasonably busy with about 25% swap being
> used). Increasing swap did not help on 2.4.5. We're still waiting for
> something to happen on 2.4.6 (ie, oops already appeared - waiting for
> meltdown, which, hopefully, will not occur). We used to auto-reboot
> every morning at 2am or something to keep things stable - which I
> *hate* because I remember having a 2.0.35/6 workstation that had an
> uptime of 6 months a couple of years ago. God, I loved that box.
>

It's happened again. The server which previously failed with memory
errors has failed again and required a reboot. It was using 26% swap,
and apache would fail to start with 'semget: No space left on device'.
We also noticed that the kswapd process showed as 'defunct' in ps.
Does that mean anything to anyone?

Regards
Henry

2001-07-06 08:34:15

by Andrew Morton

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

Henry wrote:
>
> ...
> Dual-cpu pentium 233 (intel) with 128MB RAM and more than double that swap.
>
> ...
> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> c01b4227
> *pde = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c01b4227>]
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010207
> eax: 00000001 ebx: 00000000 ecx: 000000c0 edx: c12c49c0
> esi: c12d3f4c edi: 00000001 ebp: c0d0f2a0 esp: c12d3ee0
> ds: 0018 es: 0018 ss: 0018
> Process kswapd (pid: 3, stackpage=c12d3000)
> Stack: 00000000 c12d3f4c c12d3f4c c01330cb 00000001 00000000 001c4300 c1203048
> 00000000 00000028 c0129752 00000001 c1203048 00000305 c12d3f48 00001000
> 001c4300 c1203048 00000000 00000028 c12d3f48 00000000 00001000 00001c43
> Call Trace: [<c01330cb>] [<c0129752>] [<c0106cec>] [<c012981f>] [<c012a4e8>] [<c0128b1d>] [<c01293f5>]
> [<c0129486>] [<c01054cc>]
> Code: 0f b7 43 08 66 c1 e8 09 0f b7 f0 8b 43 18 a8 04 75 19 68 a7
>
> >>EIP; c01b4227 <submit_bh+b/74> <=====
> Trace; c01330cb <brw_page+8f/a0>
> Trace; c0129752 <rw_swap_page_base+152/1b0>
> Trace; c0106cec <ret_from_intr+0/7>
> Trace; c012981f <rw_swap_page+6f/b8>
> Trace; c012a4e8 <swap_writepage+78/80>
> Trace; c0128b1d <page_launder+285/874>
> Trace; c01293f5 <do_try_to_free_pages+1d/58>
> Trace; c0129486 <kswapd+56/e8>
> Trace; c01054cc <kernel_thread+28/38>

There does appear to be an SMP race in brw_page() which can cause
this - end_buffer_io_async() unlocks the page, try_to_free_buffers()
zaps the buffer_head ring and brw_page() gets a null pointer. But
gee, it's unlikely unless you have super-fast disks and/or something
which has a super-slow interrupt routine.

Could you please provide a description of your hardware lineup?

And could you please test 2.4.6 with this patch?

--- linux-2.4.6/fs/buffer.c Wed Jul 4 18:21:31 2001
+++ lk-ext3/fs/buffer.c Fri Jul 6 18:25:00 2001
@@ -2181,8 +2181,9 @@ int brw_page(int rw, struct page *page,

/* Stage 2: start the IO */
do {
+ struct buffer_head *next = bh->b_this_page;
submit_bh(rw, bh);
- bh = bh->b_this_page;
+ bh = next;
} while (bh != head);
return 0;
}


2001-07-06 10:45:23

by Henry

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

>
> There does appear to be an SMP race in brw_page() which can cause
> this - end_buffer_io_async() unlocks the page, try_to_free_buffers()
> zaps the buffer_head ring and brw_page() gets a null pointer. But
> gee, it's unlikely unless you have super-fast disks and/or something
> which has a super-slow interrupt routine.
>
> Could you please provide a description of your hardware lineup?
>
> And could you please test 2.4.6 with this patch?
>
> --- linux-2.4.6/fs/buffer.c Wed Jul 4 18:21:31 2001
> +++ lk-ext3/fs/buffer.c Fri Jul 6 18:25:00 2001
> @@ -2181,8 +2181,9 @@ int brw_page(int rw, struct page *page,
>
> /* Stage 2: start the IO */
> do {
> + struct buffer_head *next = bh->b_this_page;
> submit_bh(rw, bh);
> - bh = bh->b_this_page;
> + bh = next;
> } while (bh != head);
> return 0;
> }

Howzit Andrew,

OK, I'll give the patch a try. I'll only be able to provide feedback
after about 12-24 hours though, depending on when I can reboot.

Hardware is pretty standard stuff:

Dual CPU Pentium II 233MHz, 128MB RAM, Gigabyte motherboard (circa
1998), 20GB IDE disks, Realtek 8139 100Mbit card.
What's weird though is that this oops doesn't occur on our several
*very* busy clustered cache servers (running squid). Those run only one
task, though (i.e., squid/diskd/dnsserver), whereas the problem machines
run various apps.

Cheers
Henry

2001-07-07 06:08:55

by Henry

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

On Fri, 06 Jul 2001, Andrew Morton wrote:
> Henry wrote:
> >
> > ...
> > Dual-cpu pentium 233 (intel) with 128MB RAM and more than double that swap.
> >
> > ...
> > Unable to handle kernel NULL pointer dereference at virtual address 00000008
> > c01b4227
> > *pde = 00000000
> > Oops: 0000
> > CPU: 0
> > EIP: 0010:[<c01b4227>]
> > Using defaults from ksymoops -t elf32-i386 -a i386
> > EFLAGS: 00010207
> > eax: 00000001 ebx: 00000000 ecx: 000000c0 edx: c12c49c0
> > esi: c12d3f4c edi: 00000001 ebp: c0d0f2a0 esp: c12d3ee0
> > ds: 0018 es: 0018 ss: 0018
> > Process kswapd (pid: 3, stackpage=c12d3000)
> > Stack: 00000000 c12d3f4c c12d3f4c c01330cb 00000001 00000000 001c4300 c1203048
> > 00000000 00000028 c0129752 00000001 c1203048 00000305 c12d3f48 00001000
> > 001c4300 c1203048 00000000 00000028 c12d3f48 00000000 00001000 00001c43
> > Call Trace: [<c01330cb>] [<c0129752>] [<c0106cec>] [<c012981f>] [<c012a4e8>] [<c0128b1d>] [<c01293f5>]
> > [<c0129486>] [<c01054cc>]
> > Code: 0f b7 43 08 66 c1 e8 09 0f b7 f0 8b 43 18 a8 04 75 19 68 a7
> >
> > >>EIP; c01b4227 <submit_bh+b/74> <=====
> > Trace; c01330cb <brw_page+8f/a0>
> > Trace; c0129752 <rw_swap_page_base+152/1b0>
> > Trace; c0106cec <ret_from_intr+0/7>
> > Trace; c012981f <rw_swap_page+6f/b8>
> > Trace; c012a4e8 <swap_writepage+78/80>
> > Trace; c0128b1d <page_launder+285/874>
> > Trace; c01293f5 <do_try_to_free_pages+1d/58>
> > Trace; c0129486 <kswapd+56/e8>
> > Trace; c01054cc <kernel_thread+28/38>
>
> There does appear to be an SMP race in brw_page() which can cause
> this - end_buffer_io_async() unlocks the page, try_to_free_buffers()
> zaps the buffer_head ring and brw_page() gets a null pointer. But
> gee, it's unlikely unless you have super-fast disks and/or something
> which has a super-slow interrupt routine.
>
> Could you please provide a description of your hardware lineup?
>
> And could you please test 2.4.6 with this patch?
>
> --- linux-2.4.6/fs/buffer.c Wed Jul 4 18:21:31 2001
> +++ lk-ext3/fs/buffer.c Fri Jul 6 18:25:00 2001
> @@ -2181,8 +2181,9 @@ int brw_page(int rw, struct page *page,
>
> /* Stage 2: start the IO */
> do {
> + struct buffer_head *next = bh->b_this_page;
> submit_bh(rw, bh);
> - bh = bh->b_this_page;
> + bh = next;
> } while (bh != head);
> return 0;
> }


Howzit Andrew

So far, so good. There has not been a single oops on the two principal
servers I patched.

uptime1: 8:04am up 18:22, 1 user, load average: 0.09, 0.15, 0.11
uptime2: 8:04am up 18:25, 1 user, load average: 0.15, 0.20, 0.15

Andrew my china, you are the _MAN_! We should know by Monday afternoon
(the Monday morning/midday crunch should provide some valuable
feedback).

Cheers
Henry

2001-07-07 08:06:23

by Andrew Morton

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

Henry wrote:
>
> ...
> So far, so good. There has not been a single oops on the two principal
> servers I patched.
>
> uptime1: 8:04am up 18:22, 1 user, load average: 0.09, 0.15, 0.11
> uptime2: 8:04am up 18:25, 1 user, load average: 0.15, 0.20, 0.15

OK, that looks good.

> Andrew my china, you are the _MAN_!

Not only that - I have great legs!

> We should know by Monday afternoon
> (the Monday morning/midday crunch should provide some valuable
> feedback).

I wonder why it only affects you. Is the drive which holds
your swap partition running in PIO mode? `hdparm' will tell
you. If it is, then that could easily cause the page to come
unlocked before brw_page() has finished touching the buffer
ring. Then all it takes is a parallel try_to_free_buffers
on the other CPU.

There's a similar bug in __block_write_full_page(). I'll
send a patch...


2001-07-07 09:39:01

by Henry

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6


>
> I wonder why it only affects you. Is the drive which holds
> your swap partition running in PIO mode? `hdparm' will tell
> you. If it is, then that could easily cause the page to come
> unlocked before brw_page() has finished touching the buffer
> ring. Then all it takes is a parallel try_to_free_buffers
> on the other CPU.

Here's the output from hdparm:

/dev/hda:
multcount = 0 (off)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 0 (off)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 2494/255/63, sectors = 40079088, start = 0

Does this provide the info you need?

I believe another chap responded to my post with a similar issue (also
SMP machine).

Uptime now 21:55 with no oops.

2001-07-07 09:53:43

by Andrew Morton

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

Henry wrote:
>
> >
> > I wonder why it only affects you. Is the drive which holds
> > your swap partition running in PIO mode? `hdparm' will tell
> > you. If it is, then that could easily cause the page to come
> > unlocked before brw_page() has finished touching the buffer
> > ring. Then all it takes is a parallel try_to_free_buffers
> > on the other CPU.
>
> Here's the output from hdparm:
>
> /dev/hda:
> multcount = 0 (off)
> I/O support = 0 (default 16-bit)
> unmaskirq = 0 (off)
> using_dma = 0 (off)
> keepsettings = 0 (off)
> nowerr = 0 (off)
> readonly = 0 (off)
> readahead = 8 (on)
> geometry = 2494/255/63, sectors = 40079088, start = 0
>
> Does this provide the info you need?

Bingo. PIO mode -> synchronous writes in submit_bh(). Thanks.

> I believe another chap responded to my post with a similar issue (also
> SMP machine).

No, his oops was a bad inode state while trying to
release unused NFS client inodes. Different bug :)


2001-07-08 11:16:34

by Henry

Subject: Re: OOPS (kswapd) in 2.4.5 and 2.4.6

>
> No, his oops was a bad inode state while trying to
> release unused NFS client inodes. Different bug :)
>

New development. No oops, but apache eventually crashed with the same
error message, 'semget: No space left on device'. So either the timing
was a coincidence with the kernel issue, and a separate problem exists
with Apache/1.3.19 Ben-SSL/1.42 (Unix)/PHP that requires a reboot to fix,
or something else is happening. Could there be a link between the
previous kernel bug and the apache issue? Do you have any idea what
the error message means, or what it's related to? Previously (when the
oops was prevalent), the oops would occur at roughly the same time as
the apache problem - which could mean everything, or nothing at all...

sigh.

Cheers
Henry