2005-02-07 19:15:27

by Stuart Anderson

Subject: kernel Oops in rpc.mountd

A dual-Xeon FC3 machine just crashed with the following kernel Oops in
rpc.mountd. Any ideas on how to debug this?

kernel-smp-2.6.10-1.760_FC3
kernel-utils-2.4-13.1.49_FC3
nfs-utils-1.0.6-44
portmap-4.0-63

I am getting about 1 kernel crash per day on a cluster of 290 such boxes
with different kernel Oops messages. I do not always get the syslog message,
but perhaps this one has enough information to track it down.

Thanks.


Feb 6 21:49:44 node77 kernel: Unable to handle kernel paging request at virtual address 00100104
Feb 6 21:49:44 node77 kernel: printing eip:
Feb 6 21:49:44 node77 kernel: f8a5179e
Feb 6 21:49:44 node77 kernel: *pde = 369c0001
Feb 6 21:49:44 node77 kernel: Oops: 0000 [#1]
Feb 6 21:49:44 node77 kernel: SMP
Feb 6 21:49:44 node77 kernel: Modules linked in: nfsd exportfs md5 ipv6 nfs lockd sunrpc dm_mod video button battery ac uhci_hcd hw_random i2c_i801 i2c_core e1000 floppy ext3 jbd
Feb 6 21:49:44 node77 kernel: CPU: 2
Feb 6 21:49:44 node77 kernel: EIP: 0060:[<f8a5179e>] Not tainted VLI
Feb 6 21:49:44 node77 kernel: EFLAGS: 00010206 (2.6.10-1.760_FC3smp)
Feb 6 21:49:44 node77 kernel: EIP is at cache_clean+0xe6/0x1b7 [sunrpc]
Feb 6 21:49:44 node77 kernel: eax: dff05000 ebx: 00100100 ecx: 0000008f edx: f8a62fa0
Feb 6 21:49:44 node77 kernel: esi: cf874180 edi: 00000000 ebp: f6c5bef4 esp: f6c5bec8
Feb 6 21:49:44 node77 kernel: ds: 007b es: 007b ss: 0068
Feb 6 21:49:44 node77 kernel: Process rpc.mountd (pid: 3620, threadinfo=f6c5b000 task=f6ad5a60)
Feb 6 21:49:44 node77 kernel: Stack: df941140 c232c680 42070880 f8a518bc f8a4f6ec f6c5bf02 0000000a 0000000e
Feb 6 21:49:44 node77 kernel: 00000001 000000c8 f6c5bf58 f8a65f69 37303131 31373537 00003438 00166995
Feb 6 21:49:44 node77 kernel: 000081a4 00000001 00000000 00000000 00000000 00000000 00000000 00000098
Feb 6 21:49:44 node77 kernel: Call Trace:
Feb 6 21:49:44 node77 kernel: [<f8a518bc>] cache_flush+0x1a/0x3b [sunrpc]
Feb 6 21:49:44 node77 kernel: [<f8a4f6ec>] ip_map_parse+0x18b/0x19a [sunrpc]
Feb 6 21:49:44 node77 kernel: [<f8a4f561>] ip_map_parse+0x0/0x19a [sunrpc]
Feb 6 21:49:44 node77 kernel: [<f8a51e4a>] cache_write+0x8d/0xa7 [sunrpc]
Feb 6 21:49:44 node77 kernel: [<c0152424>] vfs_write+0xb6/0xe2
Feb 6 21:49:44 node77 kernel: [<c01524ee>] sys_write+0x3c/0x62
Feb 6 21:49:44 node77 kernel: [<c0103c97>] syscall_call+0x7/0xb
Feb 6 21:49:44 node77 kernel: Code: f8 0f 8d e5 00 00 00 8d 42 08 e8 4d a5 86 c7 a1 00 4f a6 f8 8b 50 04 a1 04 4f a6 f8 8d 34 82 8b 1e 85 db 74 74 8b 15 00 4f a6 f8 <8b> 43 04 39 42 34 7e 04 40 89 42 34 8b 43 04 3b 05 10 1d 41 c0




2005-02-07 23:21:47

by NeilBrown

Subject: Re: kernel Oops in rpc.mountd

On Monday February 7, [email protected] wrote:
> A dual-Xeon FC3 machine just crashed with the following kernel Oops in
> rpc.mountd. Any ideas on how to debug this?
>
> kernel-smp-2.6.10-1.760_FC3
> kernel-utils-2.4-13.1.49_FC3
> nfs-utils-1.0.6-44
> portmap-4.0-63
>
> I am getting about 1 kernel crash per day on a cluster of 290 such boxes
> with different kernel Oops messages. I do not always get the syslog message,
> but perhaps this one has enough information to track it down.
>
> Thanks.
>
>
> Feb 6 21:49:44 node77 kernel: Unable to handle kernel paging request at virtual address 00100104
^^^^^^^^
...
> Feb 6 21:49:44 node77 kernel: eax: dff05000 ebx: 00100100 ecx: 0000008f edx: f8a62fa0
^^^^^^^^^^^^^
> Feb 6 21:49:44 node77 kernel: esi: cf874180 edi: 00000000 ebp: f6c5bef4 esp: f6c5bec8


Looks like two flipped bits in memory. Do you have ECC RAM? Is it
enabled?
What does memtest86 report?

NeilBrown



2005-02-08 04:03:55

by Stuart Anderson

Subject: Re: kernel Oops in rpc.mountd

I just had a physically different node Oops with an identical stack trace
in rpc.mountd:

Feb 7 17:23:10 node48 kernel: Unable to handle kernel paging request at virtual address 00100104
Feb 7 17:23:10 node48 kernel: printing eip:
Feb 7 17:23:10 node48 kernel: f8a6179e
Feb 7 17:23:10 node48 kernel: *pde = 02288001
Feb 7 17:23:10 node48 kernel: Oops: 0000 [#1]
Feb 7 17:23:10 node48 kernel: SMP
Feb 7 17:23:10 node48 kernel: Modules linked in: nfsd exportfs md5 ipv6 nfs lockd sunrpc dm_mod video button battery ac uhci_hcd hw_random i2c_i801 i2c_core e1000 floppy ext3 jbd
Feb 7 17:23:10 node48 kernel: CPU: 1
Feb 7 17:23:10 node48 kernel: EIP: 0060:[<f8a6179e>] Not tainted VLI
Feb 7 17:23:10 node48 kernel: EFLAGS: 00010206 (2.6.10-1.760_FC3smp)
Feb 7 17:23:10 node48 kernel: EIP is at cache_clean+0xe6/0x1b7 [sunrpc]
Feb 7 17:23:10 node48 kernel: eax: e76f2000 ebx: 00100100 ecx: 0000000b edx: f8a72fa0
Feb 7 17:23:10 node48 kernel: esi: f4df9940 edi: 00000000 ebp: f6b19ef4 esp: f6b19ec8
Feb 7 17:23:10 node48 kernel: ds: 007b es: 007b ss: 0068
Feb 7 17:23:10 node48 kernel: Process rpc.mountd (pid: 4013, threadinfo=f6b19000 task=f7cf0540)
Feb 7 17:23:10 node48 kernel: Stack: f4cd7200 f5a9c580 42081b86 f8a618bc f8a5f6ec f6b19f02 0000000a 0000000e
Feb 7 17:23:10 node48 kernel: 00000001 00000023 f6b19f58 f8a75f68 37303131 35373238 00003039 0002bd64
Feb 7 17:23:10 node48 kernel: 000081a4 00000001 00000000 00000000 00000000 00000000 00000000 00000098
Feb 7 17:23:10 node48 kernel: Call Trace:
Feb 7 17:23:10 node48 kernel: [<f8a618bc>] cache_flush+0x1a/0x3b [sunrpc]
Feb 7 17:23:10 node48 kernel: [<f8a5f6ec>] ip_map_parse+0x18b/0x19a [sunrpc]
Feb 7 17:23:10 node48 kernel: [<f8a5f561>] ip_map_parse+0x0/0x19a [sunrpc]
Feb 7 17:23:10 node48 kernel: [<f8a61e4a>] cache_write+0x8d/0xa7 [sunrpc]
Feb 7 17:23:10 node48 kernel: [<c0152424>] vfs_write+0xb6/0xe2
Feb 7 17:23:10 node48 kernel: [<c01524ee>] sys_write+0x3c/0x62
Feb 7 17:23:10 node48 kernel: [<c0103c97>] syscall_call+0x7/0xb
Feb 7 17:23:10 node48 kernel: Code: f8 0f 8d e5 00 00 00 8d 42 08 e8 4d a5 85 c7 a1 00 4f a7 f8 8b 50 04 a1 04 4f a7 f8 8d 34 82 8b 1e 85 db 74 74 8b 15 00 4f a7 f8 <8b> 43 04 39 42 34 7e 04 40 89 42 34 8b 43 04 3b 05 10 1d 41 c0

According to Neil Brown:
> On Monday February 7, [email protected] wrote:
> > A dual-Xeon FC3 machine just crashed with the following kernel Oops in
> > rpc.mountd. Any ideas on how to debug this?
> >
> > kernel-smp-2.6.10-1.760_FC3
> > kernel-utils-2.4-13.1.49_FC3
> > nfs-utils-1.0.6-44
> > portmap-4.0-63
> >
> > I am getting about 1 kernel crash per day on a cluster of 290 such boxes
> > with different kernel Oops messages. I do not always get the syslog message,
> > but perhaps this one has enough information to track it down.
> >
> > Thanks.
> >
> >
> > Feb 6 21:49:44 node77 kernel: Unable to handle kernel paging request at virtual address 00100104
> ^^^^^^^^
> ...
> > Feb 6 21:49:44 node77 kernel: eax: dff05000 ebx: 00100100 ecx: 0000008f edx: f8a62fa0
> ^^^^^^^^^^^^^
> > Feb 6 21:49:44 node77 kernel: esi: cf874180 edi: 00000000 ebp: f6c5bef4 esp: f6c5bec8
>
>
> Looks like two flipped bits in memory. Do you have ECC RAM? Is it
> enabled?
> What does memtest86 report?
>
> NeilBrown
>



2005-02-08 04:19:45

by NeilBrown

Subject: Re: kernel Oops in rpc.mountd

On Monday February 7, [email protected] wrote:
> I just had a physically different node Oops with an identical stack trace
> in rpc.mountd:

Well, that's pretty convincing!!!

The code in question is walking down a hash chain looking for old
entries to discard. It finds an entry at 0x00100100.
My guess is that some entry is being freed without being unlinked
properly, and the memory gets reused so that the ->next pointer becomes
corrupt.
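
For context: the faulting bytes in the Oops, <8b> 43 04, decode to
mov 0x4(%ebx),%eax, a load at offset 4 from ebx = 0x00100100, which matches
the faulting address 00100104. That value is also the kernel's LIST_POISON1
constant (what list_del() writes into a list_head), which is consistent with
the memory having been freed and reused elsewhere. Below is a rough userspace
sketch of this kind of hash-chain sweep; it is not the actual
net/sunrpc/cache.c code, and the struct and function names are only
illustrative.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct cache_head {
	struct cache_head *next;	/* singly linked hash chain */
	time_t expiry_time;
	int refcnt;
};

/* Walk one chain, unlinking and freeing entries that are expired and
 * unreferenced.  If some other path frees an entry without doing the
 * "*cp = ch->next" unlink first, the chain still points into freed,
 * possibly reused memory, and the next sweep dereferences garbage. */
static void sweep_chain(struct cache_head **head, time_t now)
{
	struct cache_head **cp = head;
	struct cache_head *ch;

	while ((ch = *cp) != NULL) {
		if (ch->expiry_time < now && ch->refcnt == 0) {
			*cp = ch->next;		/* unlink before freeing */
			ch->next = NULL;
			free(ch);
		} else {
			cp = &ch->next;		/* a stale ->next faults here */
		}
	}
}

int main(void)
{
	time_t now = time(NULL);
	struct cache_head *b = calloc(1, sizeof(*b));
	struct cache_head *a = calloc(1, sizeof(*a));
	struct cache_head *head;

	a->next = b;			/* chain: head -> a -> b */
	a->expiry_time = now + 60;	/* fresh, kept */
	b->expiry_time = now - 60;	/* expired, freed by the sweep */
	head = a;

	sweep_chain(&head, now);
	printf("head=%p a->next=%p\n", (void *)head, (void *)a->next);
	free(a);
	return 0;
}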

If it is convenient, recompiling the kernel with
CONFIG_DEBUG_SLAB=y

might help narrow down the problem, as the memory will be "poisoned"
as soon as it is freed.
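
As a rough illustration of the effect (this is not the real mm/slab.c code):
with slab debugging on, a freed object is overwritten with a poison byte,
conventionally 0x6b, so any later use of a stale pointer shows up as
addresses like 0x6b6b6b6b in the next Oops instead of silently reading
recycled data.

#include <string.h>
#include <stddef.h>

/* Illustration only: what happens to an object's memory on kfree()
 * when slab poisoning is enabled. */
static void poison_on_free(void *obj, size_t size)
{
	memset(obj, 0x6b, size);	/* 0x6b is the usual "freed" poison byte */
}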

I will also try to review the code and see if I can find a race that
might cause an entry to get lost.

NeilBrown



2005-02-08 04:47:14

by Stuart Anderson

Subject: Re: kernel Oops in rpc.mountd

We are going to rebuild 2.6.10-1.760_FC3smp with 8k stacks (just my
paranoia about the 4k stacks) and remove absolutely everything we do
not need; however, I will add CONFIG_DEBUG_SLAB. Are there any
other kernel debug flags that might be helpful?

Perhaps the bug is due to having a large list of static NFS mounts (290)?
I would desperately like to get rid of these, but recent versions
of autofs have a problem running more than ~1 mount per second
because they use too many privileged TCP ports per mount, and some
of our applications go through the 290 cross-mounts faster than that.
For that matter, recent versions of /bin/mount have the same problem,
so we have to throttle the rate of mounts at boot time in /etc/rc.local.

I have also had 6 crashes with no console or syslog trace, and 2 other
kernel Oopses in the last few days that I suspect are related based on
the cluster usage pattern; they might help in understanding the problem.
One did not get the syslog written to disk beyond,

Feb 6 07:53:45 node24 kernel: Unable to handle kernel paging request at virtual address 00001000
Feb 6 07:53:45 node24 kernel: printing eip:
Feb 6 07:53:45 node24 kernel: c013f7e0
Feb 6 07:53:45 node24 kernel: *pde = 37281001

but the console had something like,

Process events/2 (pid: 12 ...)
...
Call Trace:
drain_array_locked
cache_reap
worker_thread
cache_reap
default_wake_function
default_wake_function
worker_thread
kthread
kthread
kernel_thread_helper

and another that logged the full Oops message,

Feb 7 18:06:27 node52 kernel: Unable to handle kernel paging request at virtual address 20202024
Feb 7 18:06:27 node52 kernel: printing eip:
Feb 7 18:06:27 node52 kernel: f8a5179e
Feb 7 18:06:27 node52 kernel: *pde = 29455001
Feb 7 18:06:27 node52 kernel: Oops: 0000 [#1]
Feb 7 18:06:27 node52 kernel: SMP
Feb 7 18:06:27 node52 kernel: Modules linked in: nfsd exportfs md5 ipv6 nfs lockd sunrpc dm_mod video button battery ac uhci_hcd hw_random i2c_i801 i2c_core e1000 floppy ext3 jbd
Feb 7 18:06:27 node52 kernel: CPU: 0
Feb 7 18:06:27 node52 kernel: EIP: 0060:[<f8a5179e>] Not tainted VLI
Feb 7 18:06:27 node52 kernel: EFLAGS: 00010202 (2.6.10-1.760_FC3smp)
Feb 7 18:06:27 node52 kernel: EIP is at cache_clean+0xe6/0x1b7 [sunrpc]
Feb 7 18:06:27 node52 kernel: eax: 20305550 ebx: 20202020 ecx: 000000a8 edx: f8a62fa0
Feb 7 18:06:27 node52 kernel: esi: f24cc000 edi: 00000000 ebp: f7f46000 esp: f7f1bf58
Feb 7 18:06:27 node52 kernel: ds: 007b es: 007b ss: 0068
Feb 7 18:06:27 node52 kernel: Process events/0 (pid: 10, threadinfo=f7f1b000 task=f7fefa60)
Feb 7 18:06:27 node52 kernel: Stack: 00000005 f8a63124 00000206 f8a5187a f8a63120 c012b2bf 00000000 f8a5186f
Feb 7 18:06:27 node52 kernel: ffffffff ffffffff 00000001 00000000 c011a3de 00010000 00000000 c03f00a0
Feb 7 18:06:27 node52 kernel: c201f060 00000000 00000000 f7fefa60 c011a3de 00100100 00200200 f7fefbcc
Feb 7 18:06:27 node52 kernel: Call Trace:
Feb 7 18:06:27 node52 kernel: [<f8a5187a>] do_cache_clean+0xb/0x33 [sunrpc]
Feb 7 18:06:27 node52 kernel: [<c012b2bf>] worker_thread+0x168/0x1d5
Feb 7 18:06:27 node52 kernel: [<f8a5186f>] do_cache_clean+0x0/0x33 [sunrpc]
Feb 7 18:06:27 node52 kernel: [<c011a3de>] default_wake_function+0x0/0xc
Feb 7 18:06:27 node52 kernel: [<c011a3de>] default_wake_function+0x0/0xc
Feb 7 18:06:27 node52 kernel: [<c012b157>] worker_thread+0x0/0x1d5
Feb 7 18:06:27 node52 kernel: [<c012e569>] kthread+0x73/0x9b
Feb 7 18:06:27 node52 kernel: [<c012e4f6>] kthread+0x0/0x9b
Feb 7 18:06:27 node52 kernel: [<c01021f5>] kernel_thread_helper+0x5/0xb
Feb 7 18:06:27 node52 kernel: Code: f8 0f 8d e5 00 00 00 8d 42 08 e8 4d a5 86 c7 a1 00 4f a6 f8 8b 50 04 a1 04 4f a6 f8 8d 34 82 8b 1e 85 db 74 74 8b 15 00 4f a6 f8 <8b> 43 04 39 42 34 7e 04 40 89 42 34 8b 43 04 3b 05 10 1d 41 c0

According to Neil Brown:
> On Monday February 7, [email protected] wrote:
> > I just had a physically different node Oops with an identical stack trace
> > in rpc.mountd:
>
> Well, that's pretty convincing!!!
>
> The code in question is walking down a hash chain looking for old
> entries to discard. It finds an entry at 0x00100100.
> My guess is that some entry is being freed without being unlinked
> properly and the memory gets reused so that the ->next pointer becomes
> corrupt.
>
> If it is convenient, recompiling the kernel with
> CONFIG_DEBUG_SLAB=y
>
> might help narrow down the problem, as the memory will be "poisoned"
> as soon as it is freed.
>
> I will also try to review the code and see if I can find a race that
> might cause an entry to get lost.
>
> NeilBrown
>



2005-02-08 05:58:12

by NeilBrown

Subject: Re: kernel Oops in rpc.mountd

On Monday February 7, [email protected] wrote:
> We are going to rebuild 2.6.10-1.760_FC3smp with 8k stack (just my
> paranoia about the 4k stacks), and remove absolutely everything we do
> not need, however, I will add in the CONFIG_DEBUG_SLAB. Are there any
> other kernel debug flags that might be helpful?

No. However, the following patch might be worth a try.
It adds some extra locking; I'm pretty sure it closes a race that
could be causing your problem. I also think the locking here is a
bit heavy-handed, but I am sure it is safe.

>
> Perhaps the bug is due to having a large list of static NFS mounts (290)?

Having lots of mounts won't hurt. But having lots of clients mount
this server might make it more likely to trigger the bug, as there
is more activity on the export cache.

NeilBrown


Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./include/linux/sunrpc/cache.h | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)

diff ./include/linux/sunrpc/cache.h~current~ ./include/linux/sunrpc/cache.h
--- ./include/linux/sunrpc/cache.h~current~ 2005-02-08 16:23:21.000000000 +1100
+++ ./include/linux/sunrpc/cache.h 2005-02-08 16:41:09.000000000 +1100
@@ -268,15 +268,19 @@ static inline struct cache_head *cache_

static inline int cache_put(struct cache_head *h, struct cache_detail *cd)
{
- atomic_dec(&h->refcnt);
+ int rv = 0;
+ read_lock(&cd->hash_lock);
+ if (atomic_dec_and_test(&h->refcnt))
+ rv = 1;
if (!atomic_read(&h->refcnt) &&
h->expiry_time < cd->nextcheck)
cd->nextcheck = h->expiry_time;
- if (!test_bit(CACHE_HASHED, &h->flags) &&
- !atomic_read(&h->refcnt))
- return 1;
+ if (test_bit(CACHE_HASHED, &h->flags) ||
+ atomic_read(&h->refcnt))
+ rv = 0;
+ read_unlock(&cd->hash_lock);

- return 0;
+ return rv;
}

extern void cache_init(struct cache_head *h);
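
For readability, here is roughly what cache_put() looks like with the hunk
above applied (reconstructed from the diff, not copied from a kernel tree).
The idea, as far as I can tell, is that taking cd->hash_lock for reading
makes the decrement-and-test atomic with respect to cache_clean(), which
unlinks entries under the write lock, so the two can no longer both decide
that they should free the same entry.

static inline int cache_put(struct cache_head *h, struct cache_detail *cd)
{
	int rv = 0;
	read_lock(&cd->hash_lock);
	if (atomic_dec_and_test(&h->refcnt))
		rv = 1;
	if (!atomic_read(&h->refcnt) &&
	    h->expiry_time < cd->nextcheck)
		cd->nextcheck = h->expiry_time;
	if (test_bit(CACHE_HASHED, &h->flags) ||
	    atomic_read(&h->refcnt))
		rv = 0;
	read_unlock(&cd->hash_lock);

	return rv;
}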



2005-02-12 22:23:19

by Stuart Anderson

Subject: Re: kernel Oops in rpc.mountd

Neil,
This patch did the trick! We have now run our 290 cluster nodes for
72 hours with this extra locking patch without any kernel crashes. This is
to be compared to 1-5 crashes per day before the patch.

Many thanks!

What is the next step in getting this patch integrated into FC3
and the mainline kernel branch?

According to Neil Brown:
> On Monday February 7, [email protected] wrote:
> > We are going to rebuild 2.6.10-1.760_FC3smp with 8k stack (just my
> > paranoia about the 4k stacks), and remove absolutely everything we do
> > not need, however, I will add in the CONFIG_DEBUG_SLAB. Are there any
> > other kernel debug flags that might be helpful?
>
> No. However the following patch might be worth a try.
> It adds some extra locking. I'm pretty sure there is a race that this
> closes that could possibly cause your problem. I also think the
> locking here is a bit heavy handed, but I am sure it is safe.
>
> >
> > Perhaps the bug is due to having a large list of static NFS mounts (290)?
>
> Having lots of mounts won't hurt. But having lots of clients mount
> this server might make it more likely to trigger the bug as there
> is more activity on the export cache.
>
> NeilBrown
>
>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./include/linux/sunrpc/cache.h | 14 +++++++++-----
> 1 files changed, 9 insertions(+), 5 deletions(-)
>
> diff ./include/linux/sunrpc/cache.h~current~ ./include/linux/sunrpc/cache.h
> --- ./include/linux/sunrpc/cache.h~current~ 2005-02-08 16:23:21.000000000 +1100
> +++ ./include/linux/sunrpc/cache.h 2005-02-08 16:41:09.000000000 +1100
> @@ -268,15 +268,19 @@ static inline struct cache_head *cache_
>
> static inline int cache_put(struct cache_head *h, struct cache_detail *cd)
> {
> - atomic_dec(&h->refcnt);
> + int rv = 0;
> + read_lock(&cd->hash_lock);
> + if (atomic_dec_and_test(&h->refcnt))
> + rv = 1;
> if (!atomic_read(&h->refcnt) &&
> h->expiry_time < cd->nextcheck)
> cd->nextcheck = h->expiry_time;
> - if (!test_bit(CACHE_HASHED, &h->flags) &&
> - !atomic_read(&h->refcnt))
> - return 1;
> + if (test_bit(CACHE_HASHED, &h->flags) ||
> + atomic_read(&h->refcnt))
> + rv = 0;
> + read_unlock(&cd->hash_lock);
>
> - return 0;
> + return rv;
> }
>
> extern void cache_init(struct cache_head *h);
>




2005-02-14 02:19:12

by NeilBrown

Subject: Re: kernel Oops in rpc.mountd

On Saturday February 12, [email protected] wrote:
> Neil,
> This patch did the trick! We have now run our 290 cluster nodes for
> 72 hours with this extra locking patch without any kernel crashes. This is
> to be compared to 1-5 crashes per day before the patch.
>
> Many thanks!
>
> What is the next step in getting this patch integrated into FC3
> and the mainline kernel branch?

How things get into FC3 I have no idea - not my problem:-)

I have revised this patch totally (as I said, I felt the locking that
I had added was a bit heavy-handed). The following patch fixes
exactly the same problem in a very different way.

I will forward it to Andrew Morton shortly and it should then appear
in his next -mm release.
It is too late for it to get into 2.6.11, but it should be in an early
-rc for 2.6.12.

Thanks for the feedback.

NeilBrown

Status: ok

Discard CACHE_HASHED flag, keeping information in refcount instead.

The rpc auth cache currently differentiates between a reference
due to being in a hash chain (signalled by CACHE_HASHED flag)
and any other reference (counted in refcnt).

This is an artificial difference due to an historical accident,
and it makes cache_put unsafe.

This patch removes the distinction, so existence in a hash
chain is now counted just like any other reference.

This allows us to close a race that exists in cache_put.

Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./include/linux/sunrpc/cache.h | 19 +++++--------------
./net/sunrpc/cache.c | 4 +---
./net/sunrpc/svcauth.c | 8 ++++----
3 files changed, 10 insertions(+), 21 deletions(-)

diff ./include/linux/sunrpc/cache.h~current~ ./include/linux/sunrpc/cache.h
--- ./include/linux/sunrpc/cache.h~current~ 2005-02-14 12:54:53.000000000 +1100
+++ ./include/linux/sunrpc/cache.h 2005-02-14 12:54:53.000000000 +1100
@@ -37,8 +37,7 @@
* Entries have a ref count and a 'hashed' flag which counts the existance
* in the hash table.
* We only expire entries when refcount is zero.
- * Existance in the cache is not measured in refcount but rather in
- * CACHE_HASHED flag.
+ * Existance in the cache is counted the refcount.
*/

/* Every cache item has a common header that is used
@@ -57,7 +56,6 @@ struct cache_head {
#define CACHE_VALID 0 /* Entry contains valid data */
#define CACHE_NEGATIVE 1 /* Negative entry - there is no match for the key */
#define CACHE_PENDING 2 /* An upcall has been sent but no reply received yet*/
-#define CACHE_HASHED 3 /* Entry is in a hash table */

#define CACHE_NEW_EXPIRY 120 /* keep new things pending confirmation for 120 seconds */

@@ -185,7 +183,6 @@ RTN *FNAME ARGS \
\
if (new) \
{INIT;} \
- cache_get(&tmp->MEMBER); \
if (set) { \
if (!INPLACE && test_bit(CACHE_VALID, &tmp->MEMBER.flags))\
{ /* need to swap in new */ \
@@ -194,8 +191,6 @@ RTN *FNAME ARGS \
new->MEMBER.next = tmp->MEMBER.next; \
*hp = &new->MEMBER; \
tmp->MEMBER.next = NULL; \
- set_bit(CACHE_HASHED, &new->MEMBER.flags); \
- clear_bit(CACHE_HASHED, &tmp->MEMBER.flags); \
t2 = tmp; tmp = new; new = t2; \
} \
if (test_bit(CACHE_NEGATIVE, &item->MEMBER.flags)) \
@@ -205,6 +200,7 @@ RTN *FNAME ARGS \
clear_bit(CACHE_NEGATIVE, &tmp->MEMBER.flags); \
} \
} \
+ cache_get(&tmp->MEMBER); \
if (set||new) write_unlock(&(DETAIL)->hash_lock); \
else read_unlock(&(DETAIL)->hash_lock); \
if (set) \
@@ -220,7 +216,7 @@ RTN *FNAME ARGS \
new->MEMBER.next = *head; \
*head = &new->MEMBER; \
(DETAIL)->entries ++; \
- set_bit(CACHE_HASHED, &new->MEMBER.flags); \
+ cache_get(&new->MEMBER); \
if (set) { \
tmp = new; \
if (test_bit(CACHE_NEGATIVE, &item->MEMBER.flags)) \
@@ -268,15 +264,10 @@ static inline struct cache_head *cache_

static inline int cache_put(struct cache_head *h, struct cache_detail *cd)
{
- atomic_dec(&h->refcnt);
- if (!atomic_read(&h->refcnt) &&
+ if (atomic_read(&h->refcnt) <= 2 &&
h->expiry_time < cd->nextcheck)
cd->nextcheck = h->expiry_time;
- if (!test_bit(CACHE_HASHED, &h->flags) &&
- !atomic_read(&h->refcnt))
- return 1;
-
- return 0;
+ return atomic_dec_and_test(&h->refcnt);
}

extern void cache_init(struct cache_head *h);

diff ./net/sunrpc/cache.c~current~ ./net/sunrpc/cache.c
--- ./net/sunrpc/cache.c~current~ 2005-02-14 12:54:53.000000000 +1100
+++ ./net/sunrpc/cache.c 2005-02-14 12:54:53.000000000 +1100
@@ -321,12 +321,10 @@ static int cache_clean(void)
if (test_and_clear_bit(CACHE_PENDING, &ch->flags))
queue_loose(current_detail, ch);

- if (!atomic_read(&ch->refcnt))
+ if (atomic_read(&ch->refcnt) == 1)
break;
}
if (ch) {
- cache_get(ch);
- clear_bit(CACHE_HASHED, &ch->flags);
*cp = ch->next;
ch->next = NULL;
current_detail->entries--;

diff ./net/sunrpc/svcauth.c~current~ ./net/sunrpc/svcauth.c
--- ./net/sunrpc/svcauth.c~current~ 2005-02-14 12:54:53.000000000 +1100
+++ ./net/sunrpc/svcauth.c 2005-02-14 12:54:53.000000000 +1100
@@ -178,12 +178,12 @@ auth_domain_lookup(struct auth_domain *i
tmp = container_of(*hp, struct auth_domain, h);
if (!auth_domain_match(tmp, item))
continue;
- cache_get(&tmp->h);
- if (!set)
+ if (!set) {
+ cache_get(&tmp->h);
goto out_noset;
+ }
*hp = tmp->h.next;
tmp->h.next = NULL;
- clear_bit(CACHE_HASHED, &tmp->h.flags);
auth_domain_drop(&tmp->h, &auth_domain_cache);
goto out_set;
}
@@ -192,9 +192,9 @@ auth_domain_lookup(struct auth_domain *i
goto out_nada;
auth_domain_cache.entries++;
out_set:
- set_bit(CACHE_HASHED, &item->h.flags);
item->h.next = *head;
*head = &item->h;
+ cache_get(&item->h);
write_unlock(&auth_domain_cache.hash_lock);
cache_fresh(&auth_domain_cache, &item->h, item->h.expiry_time);
cache_get(&item->h);



2005-02-08 00:29:38

by Stuart Anderson

Subject: Re: kernel Oops in rpc.mountd

According to Neil Brown:
> On Monday February 7, [email protected] wrote:
> > A dual-Xeon FC3 machine just crashed with the following kernel Oops in
> > rpc.mountd. Any ideas on how to debug this?
> >
> > kernel-smp-2.6.10-1.760_FC3
> > kernel-utils-2.4-13.1.49_FC3
> > nfs-utils-1.0.6-44
> > portmap-4.0-63
> >
> > I am getting about 1 kernel crash per day on a cluster of 290 such boxes
> > with different kernel Oops messages. I do not always get the syslog message,
> > but perhaps this one has enough information to track it down.
> >
> > Thanks.
> >
> >
> > Feb 6 21:49:44 node77 kernel: Unable to handle kernel paging request at virtual address 00100104
> ^^^^^^^^
> ...
> > Feb 6 21:49:44 node77 kernel: eax: dff05000 ebx: 00100100 ecx: 0000008f edx: f8a62fa0
> ^^^^^^^^^^^^^
> > Feb 6 21:49:44 node77 kernel: esi: cf874180 edi: 00000000 ebp: f6c5bef4 esp: f6c5bec8
>
>
> Looks like two flipped bits in memory. Do you have ECC RAM? Is it

Yes.

> enabled?

Yes.

> What does memtest86 report?

I have not run it recently, but we ran it for 72 hours on all 290 nodes about
a year ago without any errors. The current round of kernel Oopses is happening
on different nodes, so either most of the memory is experiencing a sudden,
accelerated end-of-life failure rate or there is a software bug in the
newer kernel.

It looks like I have it narrowed down to one user application, but it
can run for up to a day before crashing, and each crash is on a different
node and different hardware, so I think it is a software bug in the kernel.

The 290 nodes cross-mount each other's internal IDE drives, which run ext3
and are shared via NFS v3.


We will probably try to downgrade the kernel, but I am open to other
suggestions.

Thanks.

--
Stuart Anderson [email protected] http://www.srl.caltech.edu/personnel/sba

