2007-08-29 20:59:18

by Hua Zhong

Subject: regression of autofs for current git?

Hi,

I am wondering if this is a known issue, but I just built the current git
and several autofs mounts mysteriously disappeared. Restarting autofs could
fix some, but then lose others. 2.6.22 was fine.

Is there anything I could check other than bisect? (It may take some time
for me to get to it)

Thanks for your help.

Hua


2007-08-29 22:48:05

by Adrian Bunk

Subject: Re: regression of autofs for current git?

On Wed, Aug 29, 2007 at 01:58:58PM -0700, Hua Zhong wrote:

> Hi,

Hi Hua,

> I am wondering if this is a known issue, but I just built the current git
> and several autofs mounts mysteriously disappeared. Restarting autofs could
> fix some, but then lose others. 2.6.22 was fine.
>
> Is there anything I could check other than bisect? (It may take some time
> for me to get to it)

the commit below is the only autofs4 patch that went into the git tree
since 2.6.22.

Does reverting it fix your problems?

> Thanks for your help.
>
> Hua

cu
Adrian


commit 1864f7bd58351732593def024e73eca1f75bc352
Author: Ian Kent <[email protected]>
Date: Wed Aug 22 14:01:54 2007 -0700

autofs4: deadlock during create

Due to inconsistent locking in the VFS between calls to lookup and
revalidate deadlock can occur in the automounter.

The inconsistency is that the directory inode mutex is held for both lookup
and revalidate calls when called via lookup_hash whereas it is held only
for lookup during a path walk. Consequently, if the mutex is held during a
call to revalidate autofs4 can't release the mutex to callback the daemon
as it can't know whether it owns the mutex.

This situation happens when a process tries to create a directory within an
automount and a second process also tries to create the same directory
between the lookup and the mkdir. Since the first process has dropped the
mutex for the daemon callback, the second process takes it during
revalidate leading to deadlock between the autofs daemon and the second
process when the daemon tries to create the mount point directory.

After spending quite a bit of time trying to resolve this on more than one
occasion, using rather complex and ugly approaches, it turns out that just
delaying the hashing of the dentry until the create operation works fine.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 2d4c8a3..45ff3d6 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -587,19 +587,20 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 	unhashed = autofs4_lookup_unhashed(sbi, dentry->d_parent, &dentry->d_name);
 	if (!unhashed) {
 		/*
-		 * Mark the dentry incomplete, but add it. This is needed so
-		 * that the VFS layer knows about the dentry, and we can count
-		 * on catching any lookups through the revalidate.
-		 *
-		 * Let all the hard work be done by the revalidate function that
-		 * needs to be able to do this anyway..
-		 *
-		 * We need to do this before we release the directory semaphore.
+		 * Mark the dentry incomplete but don't hash it. We do this
+		 * to serialize our inode creation operations (symlink and
+		 * mkdir) which prevents deadlock during the callback to
+		 * the daemon. Subsequent user space lookups for the same
+		 * dentry are placed on the wait queue while the daemon
+		 * itself is allowed passage unresticted so the create
+		 * operation itself can then hash the dentry. Finally,
+		 * we check for the hashed dentry and return the newly
+		 * hashed dentry.
 		 */
 		dentry->d_op = &autofs4_root_dentry_operations;

 		dentry->d_fsdata = NULL;
-		d_add(dentry, NULL);
+		d_instantiate(dentry, NULL);
 	} else {
 		struct autofs_info *ino = autofs4_dentry_ino(unhashed);
 		DPRINTK("rehash %p with %p", dentry, unhashed);
@@ -607,15 +608,17 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 		 * If we are racing with expire the request might not
 		 * be quite complete but the directory has been removed
 		 * so it must have been successful, so just wait for it.
+		 * We need to ensure the AUTOFS_INF_EXPIRING flag is clear
+		 * before continuing as revalidate may fail when calling
+		 * try_to_fill_dentry (returning EAGAIN) if we don't.
 		 */
-		if (ino && (ino->flags & AUTOFS_INF_EXPIRING)) {
+		while (ino && (ino->flags & AUTOFS_INF_EXPIRING)) {
 			DPRINTK("wait for incomplete expire %p name=%.*s",
 				unhashed, unhashed->d_name.len,
 				unhashed->d_name.name);
 			autofs4_wait(sbi, unhashed, NFY_NONE);
 			DPRINTK("request completed");
 		}
-		d_rehash(unhashed);
 		dentry = unhashed;
 	}

@@ -658,7 +661,7 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 	 * for all system calls, but it should be OK for the operations
 	 * we permit from an autofs.
 	 */
-	if (dentry->d_inode && d_unhashed(dentry)) {
+	if (!oz_mode && d_unhashed(dentry)) {
 		/*
 		 * A user space application can (and has done in the past)
 		 * remove and re-create this directory during the callback.
@@ -716,7 +719,7 @@ static int autofs4_dir_symlink(struct inode *dir,
 	strcpy(cp, symname);

 	inode = autofs4_get_inode(dir->i_sb, ino);
-	d_instantiate(dentry, inode);
+	d_add(dentry, inode);

 	if (dir == dir->i_sb->s_root->d_inode)
 		dentry->d_op = &autofs4_root_dentry_operations;
@@ -844,7 +847,7 @@ static int autofs4_dir_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 		return -ENOSPC;

 	inode = autofs4_get_inode(dir->i_sb, ino);
-	d_instantiate(dentry, inode);
+	d_add(dentry, inode);

 	if (dir == dir->i_sb->s_root->d_inode)
 		dentry->d_op = &autofs4_root_dentry_operations;
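
The `if` to `while` change in the second hunk follows the familiar "re-check the condition after every wakeup" pattern: a single wait is not guaranteed to return with the flag cleared, so the caller loops until it is. The following is a user-space toy model of that idea only, not kernel code. `FakeExpire`, `wait_once` and `wait_until_clear` are hypothetical names invented for this sketch, and the "wakeups needed" counter is an assumption used to make the behaviour deterministic.

```python
class FakeExpire:
    """Toy stand-in for an entry whose 'expiring' flag clears only after
    a given number of wakeups (an assumption made for illustration)."""

    def __init__(self, wakeups_needed: int) -> None:
        self.expiring = True
        self._remaining = wakeups_needed

    def wait(self) -> None:
        # Each call models one return from a wait; the flag is cleared
        # only once enough wakeups have happened.
        self._remaining -= 1
        if self._remaining <= 0:
            self.expiring = False


def wait_once(entry: FakeExpire) -> bool:
    """Old shape: test the flag once, wait once. Returns True if clear."""
    if entry.expiring:
        entry.wait()
    return not entry.expiring


def wait_until_clear(entry: FakeExpire) -> bool:
    """New shape: keep waiting until the flag is really clear."""
    while entry.expiring:
        entry.wait()
    return not entry.expiring
```

With an entry that needs two wakeups, `wait_once` returns while the flag is still set, whereas `wait_until_clear` only returns once it has been cleared.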

2007-08-30 03:00:24

by Ian Kent

Subject: Re: regression of autofs for current git?

On Thu, 2007-08-30 at 00:47 +0200, Adrian Bunk wrote:
> On Wed, Aug 29, 2007 at 01:58:58PM -0700, Hua Zhong wrote:
>
> > Hi,
>
> Hi Hua,
>
> > I am wondering if this is a known issue, but I just built the current git
> > and several autofs mounts mysteriously disappeared. Restarting autofs could
> > fix some, but then lose others. 2.6.22 was fine.
> >
> > Is there anything I could check other than bisect? (It may take some time
> > for me to get to it)
>
> the commit below is the only autofs4 patch that went into the git tree
> since 2.6.22.
>
> Does reverting it fix your problems?

Maybe, but there is an NFS change that appears to be in the current
2.6.23-rc kernel and doesn't seem to be in 2.6.22 that is known to break
some autofs maps and also breaks amd.

I can't seem to locate the commit just now.
In the meantime what version of user space autofs and nfs-utils are you
running?
And can you post your autofs maps please?

Ian




2007-08-30 03:10:06

by Hua Zhong

Subject: RE: regression of autofs for current git?

I am in the process of bisecting now. It takes a while but the bad commit is
between 2.6.22 and 13f9966b3ba5b45f47f2ea0eb0a90afceedfbb1f (June 28). I'll
continue tomorrow.

My system is FC4.

autofs-4.1.4-26
nfs-utils-1.0.7-13.FC4

By the way, the particular autofs commit Adrian sent is innocent. It's
something else.

It will still take me about 10 reboots to bisect. If anyone has a patch for
me to try, I'll give it a shot tomorrow.
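
For a sense of where "about 10 reboots" comes from: bisection halves the set of candidate commits on every build-and-boot round, so the worst case is roughly the base-2 logarithm of the number of commits in the range. The sketch below is only that arithmetic. The commit counts passed in are illustrative assumptions, not figures from this thread, and a real `git bisect` run on merge-heavy history can differ by a step or so.

```python
def bisect_steps(candidate_commits: int) -> int:
    """Worst-case number of test rounds needed to isolate one bad commit
    among `candidate_commits` candidates by repeated halving.

    Equivalent to ceil(log2(n)), computed with integers to avoid
    floating-point rounding.
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (candidate_commits - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical range sizes, chosen only to show the growth rate.
    for n in (16, 1000, 5000):
        print(n, "commits ->", bisect_steps(n), "rounds")
```

Ten rounds therefore corresponds to a remaining range on the order of a thousand commits.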

> -----Original Message-----
> From: Ian Kent [mailto:[email protected]]
> Sent: Wednesday, August 29, 2007 8:00 PM
> To: Hua Zhong
> Cc: 'Linux Kernel Mailing List'; [email protected]; Adrian Bunk
> Subject: Re: regression of autofs for current git?
>
> On Thu, 2007-08-30 at 00:47 +0200, Adrian Bunk wrote:
> > On Wed, Aug 29, 2007 at 01:58:58PM -0700, Hua Zhong wrote:
> >
> > > Hi,
> >
> > Hi Hua,
> >
> > > I am wondering if this is a known issue, but I just built the current git
> > > and several autofs mounts mysteriously disappeared. Restarting autofs could
> > > fix some, but then lose others. 2.6.22 was fine.
> > >
> > > Is there anything I could check other than bisect? (It may take some time
> > > for me to get to it)
> >
> > the commit below is the only autofs4 patch that went into the git tree
> > since 2.6.22.
> >
> > Does reverting it fix your problems?
>
> Maybe but there is an NFS change that appears to be in the current
> 2.6.23-rc kernel and doesn't seem to be in 2.6.22 that is known to break
> some autofs maps and also breaks amd.
>
> I can't seem to locate the commit just now.
> In the meantime what version of user space autofs and nfs-utils are you
> running?
> And can you post your autofs maps please?
>
> Ian
>
>


2007-08-30 03:26:32

by Ian Kent

Subject: RE: regression of autofs for current git?

On Wed, 2007-08-29 at 20:09 -0700, Hua Zhong wrote:
> I am in the process of bisecting now. It takes a while but the bad commit is
> between 2.6.22 and 13f9966b3ba5b45f47f2ea0eb0a90afceedfbb1f (June 28). I'll
> continue tomorrow.
>
> My system is FC4.
>
> autofs-4.1.4-26
> nfs-utils-1.0.7-13.FC4
>
> By the way, the particular autofs commit Adrian sent is innocent. It's
> something else.
>
> It will still take me about 10 reboots to bisect. If anyone has a patch for
> me to try, I'll give it a shot tomorrow.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75180df2ed467866ada839fe73cf7cc7d75c0a22

This (and its related patches) may be the problem.
I can probably tell if you post your map, or if you strace the automount
process managing a problem mount point and look for mount returning
EBUSY when it should succeed.
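
One way to do the EBUSY check is to capture a trace with something like `strace -f -e trace=mount -o automount.trace -p <pid>` and then filter it. The sketch below does the filtering step. The output file name, the sample trace lines, the host and path names, and the helper `find_ebusy_mounts` are all hypothetical; only the general `call(args) = -1 EBUSY (...)` shape of strace output is relied on.

```python
import re

# Matches a failed mount call in strace output, for example:
#   2301  mount("filer:/home/hua", "/home/hua", "nfs", 0, ...) = -1 EBUSY (...)
# The leading \b keeps "umount(" from matching.
EBUSY_MOUNT = re.compile(r'\bmount\((?P<args>.*)\)\s*=\s*-1\s+EBUSY\b')


def find_ebusy_mounts(lines):
    """Return the argument string of every mount call that failed with EBUSY."""
    hits = []
    for line in lines:
        match = EBUSY_MOUNT.search(line)
        if match:
            hits.append(match.group("args"))
    return hits


# Made-up trace lines, for illustration only.
SAMPLE_TRACE = [
    '2301  mount("filer:/home/hua", "/home/hua", "nfs", 0, "addr=10.0.0.5")'
    ' = -1 EBUSY (Device or resource busy)',
    '2302  mount("filer:/tools", "/tools", "nfs", 0, "addr=10.0.0.5") = 0',
    '2303  umount("/tools", 0) = 0',
]

if __name__ == "__main__":
    for args in find_ebusy_mounts(SAMPLE_TRACE):
        print("EBUSY from mount:", args)
```

Any hit for a mount point that is not actually mounted at the time would be the "EBUSY when it should succeed" case described above.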

>
> > -----Original Message-----
> > From: Ian Kent [mailto:[email protected]]
> > Sent: Wednesday, August 29, 2007 8:00 PM
> > To: Hua Zhong
> > Cc: 'Linux Kernel Mailing List'; [email protected]; Adrian Bunk
> > Subject: Re: regression of autofs for current git?
> >
> > On Thu, 2007-08-30 at 00:47 +0200, Adrian Bunk wrote:
> > > On Wed, Aug 29, 2007 at 01:58:58PM -0700, Hua Zhong wrote:
> > >
> > > > Hi,
> > >
> > > Hi Hua,
> > >
> > > > I am wondering if this is a known issue, but I just built the current git
> > > > and several autofs mounts mysteriously disappeared. Restarting autofs could
> > > > fix some, but then lose others. 2.6.22 was fine.
> > > >
> > > > Is there anything I could check other than bisect? (It may take some time
> > > > for me to get to it)
> > >
> > > the commit below is the only autofs4 patch that went into the git tree
> > > since 2.6.22.
> > >
> > > Does reverting it fix your problems?
> >
> > Maybe but there is an NFS change that appears to be in the current
> > 2.6.23-rc kernel and doesn't seem to be in 2.6.22 that is known to break
> > some autofs maps and also breaks amd.
> >
> > I can't seem to locate the commit just now.
> > In the meantime what version of user space autofs and nfs-utils are you
> > running?
> > And can you post your autofs maps please?
> >
> > Ian
> >
> >
>
>

2007-08-30 09:49:49

by Martin Knoblauch

Subject: RE: regression of autofs for current git?

On Wed, 2007-08-29 at 20:09 -0700, Ian Kent wrote:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75180df2ed467866ada839fe73cf7cc7d75c0a22
>
> This (and its related patches) may be the problem.
> I can probably tell if you post your map or if you strace the automount
> process managing a problem mount point and look for mount returning
> EBUSY when it should succeed.

Likely. That is the one that will break the user-space automounter as
well (and keeps me from .23). I don't care very much about what the
default is, but it would be great if the new behaviour could be
globally changed at run- (or boot-) time. It will be some time until
the new mount option makes it into the distros.

Cheers
Martin
PS: Sorry, but I likely killed the CC list


------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de