2005-12-19 21:38:57

by John F Flynn III

Subject: Very rare crash in prune_dcache

Good evening, folks...

We have been experiencing a very rare (on average once every two to
three months) crash on some of our servers.

uname -a:
Linux cheetah 2.6.9-22.0.1.ELsmp #1 SMP Thu Oct 27 13:14:25 CDT 2005
i686 i686 i386 GNU/Linux

(This is a CentOS provided kernel)

Here is a photo of the bottom of the panic. Unfortunately the kernel has
no chance to log this anywhere else:

http://www.cs.fiu.edu/~flynnj/cheetah-crash.jpg


The crash appears to be in prune_dcache, and has happened on several
distinct machines, so we do not believe it is a hardware problem.

If anyone has pointers on what bug could be causing this crash, or if
it's been fixed in newer kernels we could try, it would be greatly
appreciated. This only seems to happen on loaded production machines,
and it happens so rarely that more detailed debugging is nearly impossible.

Thanks in advance,
-John Flynn

--
John Flynn [email protected]
=========================================================
Systems and Network Administration /\_/\
School of Computer Science ( O.O )
Florida International University > <


2005-12-19 22:34:39

by Cliff Wickman

Subject: Re: Very rare crash in prune_dcache

We've seen the below on a 2.6.5 kernel (SuSE SLES9) at SGI.
Does it look like your crash?

The panic is by kswapd0:

<1>Unable to handle kernel NULL pointer dereference (address
0000000000000078)
<4>kswapd0[122]: Oops 8813272891392 [1]

whose stack shows:
[<a0000001001cecf0>] clear_inode+0x1b0/0x2c0
[<a0000001001d03d0>] generic_drop_inode+0x3b0/0x400
[<a0000001001ccf30>] iput+0x130/0x1c0
[<a00000020b6f0cd0>] nfs_dentry_iput+0x170/0x1c0 [nfs]
[<a0000001001ca050>] prune_dcache+0x510/0x540
[<a0000001001ca0c0>] shrink_dcache_memory+0x40/0x80
[<a00000010014c360>] shrink_slab+0x2e0/0x440

Both generic_shutdown_super() (through shrink_dcache_parent() or
shrink_dcache_anon()) and kswapd0 (through shrink_dcache_memory())
end up calling prune_dcache().
I suspect a race condition inside prune_dcache().

The prune_dcache() function:
  lock dcache_lock
  scan the dentry_unused list of dentries for a given number ("count") of
  dentries to free:
    if a dentry to free, call prune_one_dentry()
      dentry_iput()
        unlock dcache_lock
        iput() any associated inode
      d_free() the dentry
      lock dcache_lock
  unlock dcache_lock

Two processors entering prune_dcache() near the same time will both scan
the dentry_unused list and could try to iput() the same inode twice. That is
because the dcache_lock is released while running iput().

I suppose the dcache_lock must be released here because the iput() may take
a long time, and the dcache_lock is used in many places in the system
to protect the dentry cache's lists.

It would seem to me that a straightforward fix would be to add another
lock to protect just the scan of the dentry_unused list, only here in
prune_dcache().

-Cliff Wickman

On Mon, Dec 19, 2005 at 04:38:55PM -0500, John F Flynn III wrote:
> Good evening, folks...
>
> We have been experiencing a very rare (on average once every two to
> three months) crash on some of our servers.
>
> uname -a:
> Linux cheetah 2.6.9-22.0.1.ELsmp #1 SMP Thu Oct 27 13:14:25 CDT 2005
> i686 i686 i386 GNU/Linux
>
> (This is a CentOS provided kernel)
>
> Here is a photo of the bottom of the panic. Unfortunately the kernel has
> no chance to log this anywhere else:
>
> http://www.cs.fiu.edu/~flynnj/cheetah-crash.jpg
>
>
> The crash appears to be in prune_dcache, and has happened on several
> distinct machines, so we do not believe it is a hardware problem.
>
> If anyone has pointers on what bug could be causing this crash, or if
> it's been fixed in newer kernels we could try, it would be greatly
> appreciated. This only seems to happen on loaded production machines,
> and it happens so rarely that more detailed debugging is nearly impossible.
>
> Thanks in advance,
> -John Flynn
>
> --
> John Flynn [email protected]
> =========================================================
> Systems and Network Administration /\_/\
> School of Computer Science ( O.O )
> Florida International University > <

--
Cliff Wickman
Silicon Graphics, Inc.
[email protected]
(651) 683-3824

2005-12-20 06:42:15

by Bharata B Rao

Subject: Re: Very rare crash in prune_dcache

Hi Cliff,

On Mon, Dec 19, 2005 at 04:34:35PM -0600, Cliff Wickman wrote:
> We've seen the below on a 2.6.5 kernel (SuSE SLES9) at SGI.
> Does it look like your crash?
>
> The panic is by kswapd0:
>
> <1>Unable to handle kernel NULL pointer dereference (address
> 0000000000000078)
> <4>kswapd0[122]: Oops 8813272891392 [1]
>
> whose stack shows:
> [<a0000001001cecf0>] clear_inode+0x1b0/0x2c0
> [<a0000001001d03d0>] generic_drop_inode+0x3b0/0x400
> [<a0000001001ccf30>] iput+0x130/0x1c0
> [<a00000020b6f0cd0>] nfs_dentry_iput+0x170/0x1c0 [nfs]
> [<a0000001001ca050>] prune_dcache+0x510/0x540
> [<a0000001001ca0c0>] shrink_dcache_memory+0x40/0x80
> [<a00000010014c360>] shrink_slab+0x2e0/0x440
>
> Both generic_shutdown_super() (through shrink_dcache_parent() or
> shrink_dcache_anon()) and kswapd0 (through shrink_dcache_memory())
> end up calling prune_dcache().
> I suspect a race condition inside prune_dcache().
>
> The prune_dcache() function:
>   lock dcache_lock
>   scan the dentry_unused list of dentries for a given number ("count") of
>   dentries to free:
>     if a dentry to free, call prune_one_dentry()
>       dentry_iput()
>         unlock dcache_lock
>         iput() any associated inode
>       d_free() the dentry
>       lock dcache_lock
>   unlock dcache_lock
>
> Two processors entering prune_dcache() near the same time will both scan
> the dentry_unused list and could try to iput() the same inode twice. That is
> because the dcache_lock is released while running iput().
>
> I suppose the dcache_lock must be released here because the iput() may take
> a long time, and the dcache_lock is used in many places in the system
> to protect the dentry cache's lists.
>
> It would seem to me that a straightforward fix would be to add another
> lock to protect just the scan of the dentry_unused list, only here in
> prune_dcache().
>

Isn't this what dcache_lock is doing presently? Going by the vanilla 2.6.5
kernel, I don't see how the race condition you mention above can happen.

In prune_dcache(), a dentry is first removed from the dentry_unused list
(under dcache_lock) before prune_one_dentry() is called. So how is it
possible that another thread executing prune_dcache() will hit
the same dentry again?
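
To make the ordering argument concrete, here is a minimal userspace sketch, written in Python rather than kernel C and not taken from any kernel source. It only models the pattern described above: an entry is detached from the shared list while the lock is held, and the slow work happens after the lock is dropped. The names prune, run, list_lock and per_thread are hypothetical, and the deque merely stands in for dentry_unused.

```python
import threading
from collections import deque


def prune(unused, list_lock, count, done):
    """Detach up to `count` entries from `unused`, handling each one once."""
    for _ in range(count):
        with list_lock:              # plays the role of dcache_lock
            if not unused:
                return
            entry = unused.pop()     # entry leaves the shared list here
        # Lock is released: this is where the slow iput()/d_free() work
        # would happen. No other thread can reach `entry` through the list.
        done.append(entry)           # `done` is private to this thread


def run(n_entries=1000, n_threads=4):
    """Run several pruners at once and return everything they handled."""
    unused = deque(range(n_entries))
    list_lock = threading.Lock()
    per_thread = [[] for _ in range(n_threads)]
    threads = [
        threading.Thread(target=prune,
                         args=(unused, list_lock, n_entries, per_thread[i]))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [entry for done in per_thread for entry in done]


handled = run()
# Every entry shows up exactly once, whichever thread picked it up.
assert sorted(handled) == list(range(1000))
```

Under these assumptions no entry can be handled twice, because a second thread taking the lock no longer finds the entry on the list. The sketch says nothing about corruption arriving from outside this path.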

Regards,
Bharata.

2005-12-20 13:34:14

by Cliff Wickman

Subject: Re: Very rare crash in prune_dcache

Hi Bharata,

On Tue, Dec 20, 2005 at 12:16:29PM +0530, Bharata B Rao wrote:
> Hi Cliff,
>
> > I suspect a race condition inside prune_dcache().
> >
> > The prune_dcache() function:
> >   lock dcache_lock
> >   scan the dentry_unused list of dentries for a given number ("count") of
> >   dentries to free:
--------
get (remove) dentry from dentry_unused list
--------
> >     if a dentry to free, call prune_one_dentry()
> >       dentry_iput()
> >         unlock dcache_lock
> >         iput() any associated inode
> >       d_free() the dentry
> >       lock dcache_lock
> >   unlock dcache_lock
> >
> > Two processors entering prune_dcache() near the same time will both scan
> > the dentry_unused list and could try to iput() the same inode twice. That is
> > because the dcache_lock is released while running iput().
>
> Isn't this what dcache_lock is doing presently? Going by the vanilla 2.6.5
> kernel, I don't see how the race condition you mention above can happen.
>
> In prune_dcache(), a dentry is first removed from the dentry_unused list
> (under dcache_lock) before prune_one_dentry() is called. So how is it
> possible that another thread executing prune_dcache() will hit
> the same dentry again?

Yes, I think you're right. And it's not theoretically possible for
two dentries to point to the same inode. So the inode that caused our
crash must have been corrupted elsewhere.
Thanks.

-Cliff

--
Cliff Wickman
Silicon Graphics, Inc.
[email protected]
(651) 683-3824