On Monday April 16, [email protected] wrote:
> On 04/16/2007 12:47:32 +0200, Neil Brown <[email protected]> wrote:
>
> > On Monday April 16, [email protected] wrote:
>
> >>
> >> Is this related to the kernel cache_clean function in net/sunrpc/cache.c ?
> >
> > Yes. It is crashing at:
> > for (; ch; cp= & ch->next, ch= *cp) {
> > if (current_detail->nextcheck > ch->expiry_time)
> > ^^^^^^
> > current_detail->nextcheck = ch->expiry_time+1;
> > if (ch->expiry_time >= get_seconds()
> >
> > ch has a garbage value (0001e71926010009), presumable because a
> > previous cp had been corrupted, most likely by being freed while still
> > in use.
> >
>
> Maybe this can helps : I never encountered the problem with kernel
> 2.6.18, maybe this can restrict the scope for where is this corruption
> happening ?
I did end up just hunting through the patches between 2.6.18 and
2.6.20 and I think I have found it (I had already looked at the code
where the bug is, but didn't see it the first time :-().
Could you try this patch and report the results please?
I'm very confident that the patch is correct and required, but I would
like to also know that it fixes your problem.
Thanks
NeilBrown
---------------------------------------------
Use a spinlock to protect sk_info_authunix
sk_info_authunix is not being protected properly so the object that
it points to can be cache_put twice, leading to corruption.
We borrow svsk->sk_defer_lock to provide the protection. We should probably
rename that lock to have a more generic name.
Signed-off-by: Neil Brown <[email protected]>
### Diffstat output
./net/sunrpc/svcauth_unix.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c
--- .prev/net/sunrpc/svcauth_unix.c 2007-04-17 11:18:44.000000000 +1000
+++ ./net/sunrpc/svcauth_unix.c 2007-04-17 11:19:18.000000000 +1000
@@ -383,7 +383,10 @@ void svcauth_unix_purge(void)
static inline struct ip_map *
ip_map_cached_get(struct svc_rqst *rqstp)
{
- struct ip_map *ipm = rqstp->rq_sock->sk_info_authunix;
+ struct ip_map *ipm;
+ struct svc_sock *svsk = rqstp->rq_sock;
+ spin_lock(&svsk->sk_defer_lock);
+ ipm = svsk->sk_info_authunix;
if (ipm != NULL) {
if (!cache_valid(&ipm->h)) {
/*
@@ -391,12 +394,14 @@ ip_map_cached_get(struct svc_rqst *rqstp
* remembered, e.g. by a second mount from the
* same IP address.
*/
- rqstp->rq_sock->sk_info_authunix = NULL;
+ svsk->sk_info_authunix = NULL;
+ spin_unlock(&svsk->sk_defer_lock);
cache_put(&ipm->h, &ip_map_cache);
return NULL;
}
cache_get(&ipm->h);
}
+ spin_unlock(&svsk->sk_defer_lock);
return ipm;
}
@@ -405,9 +410,15 @@ ip_map_cached_put(struct svc_rqst *rqstp
{
struct svc_sock *svsk = rqstp->rq_sock;
- if (svsk->sk_sock->type == SOCK_STREAM && svsk->sk_info_authunix == NULL)
- svsk->sk_info_authunix = ipm; /* newly cached, keep the reference */
- else
+ spin_lock(&svsk->sk_defer_lock);
+ if (svsk->sk_sock->type == SOCK_STREAM &&
+ svsk->sk_info_authunix == NULL) {
+ /* newly cached, keep the reference */
+ svsk->sk_info_authunix = ipm;
+ ipm = NULL;
+ }
+ spin_unlock(&svsk->sk_defer_lock);
+ if (ipm)
cache_put(&ipm->h, &ip_map_cache);
}
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On 04/17/2007 3:23:40 +0200, Neil Brown <[email protected]> wrote:
>
> I did end up just hunting through the patches between 2.6.18 and
> 2.6.20 and I think I have found it (I had already looked at the code
> where the bug is, but didn't see it the first time :-().
>
> Could you try this patch and report the results please?
> I'm very confident that the patch is correct and required, but I would
> like to also know that it fixes your problem.
>
By the way, I patched the 2.6.20.7 kernel since you provided the patch,
and haven't encountered any stability problem, except some CFQ scheduler
related kernel panics, which are, I suppose, not linked to the NFS
problem (I still mention it because unrelated but linked problems like
this sometimes happens)
Thanks for the patch !
Gabriel Barazer
> Thanks
> NeilBrown
>
> ---------------------------------------------
> Use a spinlock to protect sk_info_authunix
>
> sk_info_authunix is not being protected properly so the object that
> it points to can be cache_put twice, leading to corruption.
>
> We borrow svsk->sk_defer_lock to provide the protection. We should probably
> rename that lock to have a more generic name.
>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./net/sunrpc/svcauth_unix.c | 21 ++++++++++++++++-----
> 1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c
> --- .prev/net/sunrpc/svcauth_unix.c 2007-04-17 11:18:44.000000000 +1000
> +++ ./net/sunrpc/svcauth_unix.c 2007-04-17 11:19:18.000000000 +1000
> @@ -383,7 +383,10 @@ void svcauth_unix_purge(void)
> static inline struct ip_map *
> ip_map_cached_get(struct svc_rqst *rqstp)
> {
> - struct ip_map *ipm = rqstp->rq_sock->sk_info_authunix;
> + struct ip_map *ipm;
> + struct svc_sock *svsk = rqstp->rq_sock;
> + spin_lock(&svsk->sk_defer_lock);
> + ipm = svsk->sk_info_authunix;
> if (ipm != NULL) {
> if (!cache_valid(&ipm->h)) {
> /*
> @@ -391,12 +394,14 @@ ip_map_cached_get(struct svc_rqst *rqstp
> * remembered, e.g. by a second mount from the
> * same IP address.
> */
> - rqstp->rq_sock->sk_info_authunix = NULL;
> + svsk->sk_info_authunix = NULL;
> + spin_unlock(&svsk->sk_defer_lock);
> cache_put(&ipm->h, &ip_map_cache);
> return NULL;
> }
> cache_get(&ipm->h);
> }
> + spin_unlock(&svsk->sk_defer_lock);
> return ipm;
> }
>
> @@ -405,9 +410,15 @@ ip_map_cached_put(struct svc_rqst *rqstp
> {
> struct svc_sock *svsk = rqstp->rq_sock;
>
> - if (svsk->sk_sock->type == SOCK_STREAM && svsk->sk_info_authunix == NULL)
> - svsk->sk_info_authunix = ipm; /* newly cached, keep the reference */
> - else
> + spin_lock(&svsk->sk_defer_lock);
> + if (svsk->sk_sock->type == SOCK_STREAM &&
> + svsk->sk_info_authunix == NULL) {
> + /* newly cached, keep the reference */
> + svsk->sk_info_authunix = ipm;
> + ipm = NULL;
> + }
> + spin_unlock(&svsk->sk_defer_lock);
> + if (ipm)
> cache_put(&ipm->h, &ip_map_cache);
> }
>
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs