2008-01-16 03:06:30

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2.6.24-rc7 1/2] sysfs: make sysfs_lookup() return ERR_PTR(-ENOENT) on failed lookup

sysfs tries to keep dcache a strict subset of sysfs_dirent tree by
shooting down dentries when a node is removed, that is, no negative
dentry for sysfs. However, the lookup function returned NULL and thus
created negative dentries when the target node didn't exist.

Make sysfs_lookup() return ERR_PTR(-ENOENT) on lookup failure. This
fixes the NULL dereference bug in sysfs_get_dentry() discovered by
bluetooth rfcomm device moving around.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/sysfs/dir.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Index: work/fs/sysfs/dir.c
===================================================================
--- work.orig/fs/sysfs/dir.c
+++ work/fs/sysfs/dir.c
@@ -678,8 +678,10 @@ static struct dentry * sysfs_lookup(stru
sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);

/* no such entry */
- if (!sd)
+ if (!sd) {
+ ret = ERR_PTR(-ENOENT);
goto out_unlock;
+ }

/* attach dentry and inode */
inode = sysfs_get_inode(sd);


2008-01-16 03:11:17

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

sysfs_rename/move_dir() have the following bugs.

* On dentry lookup failure, kfree() is called on ERR_PTR() value.
* sysfs_move_dir() has an extra dput() on success path.

Fix them.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/sysfs/dir.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Index: work/fs/sysfs/dir.c
===================================================================
--- work.orig/fs/sysfs/dir.c
+++ work/fs/sysfs/dir.c
@@ -783,6 +783,7 @@ int sysfs_rename_dir(struct kobject * ko
old_dentry = sysfs_get_dentry(sd);
if (IS_ERR(old_dentry)) {
error = PTR_ERR(old_dentry);
+ old_dentry = NULL;
goto out;
}

@@ -850,6 +851,7 @@ int sysfs_move_dir(struct kobject *kobj,
old_dentry = sysfs_get_dentry(sd);
if (IS_ERR(old_dentry)) {
error = PTR_ERR(old_dentry);
+ old_dentry = NULL;
goto out;
}
old_parent = old_dentry->d_parent;
@@ -857,6 +859,7 @@ int sysfs_move_dir(struct kobject *kobj,
new_parent = sysfs_get_dentry(new_parent_sd);
if (IS_ERR(new_parent)) {
error = PTR_ERR(new_parent);
+ new_parent = NULL;
goto out;
}

@@ -880,7 +883,6 @@ again:
error = 0;
d_add(new_dentry, NULL);
d_move(old_dentry, new_dentry);
- dput(new_dentry);

/* Remove from old parent's list and insert into new parent's list. */
sysfs_unlink_sibling(sd);

2008-01-16 03:42:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()



On Wed, 16 Jan 2008, Tejun Heo wrote:
>
> * sysfs_move_dir() has an extra dput() on success path.

Are you sure? How did this ever work?

Also, looking at this, I think the "how did this ever work" question is
answered by "it didn't", but I also think there are still serious problems
there. Look at

again:
mutex_lock(&old_parent->d_inode->i_mutex);
if (!mutex_trylock(&new_parent->d_inode->i_mutex)) {
mutex_unlock(&old_parent->d_inode->i_mutex);
goto again;
}

and wonder what happen sif old_parent == new_parent. Is that trying to
avoid an ABBA deadlock? Normally you'd do it by ordering the locks, or by
taking a third lock to guarantee serialization at a higher level (ie the
"s_vfs_rename_mutex" on the VFS layer)

I'd like to apply these two patches, but I really want to get more of an
ack for them from somebody like Al, or at least more of an explanation for
why it's all the right thing.

Linus

2008-01-16 03:52:40

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

On Tue, Jan 15, 2008 at 07:41:58PM -0800, Linus Torvalds wrote:

> and wonder what happen sif old_parent == new_parent. Is that trying to
> avoid an ABBA deadlock? Normally you'd do it by ordering the locks, or by
> taking a third lock to guarantee serialization at a higher level (ie the
> "s_vfs_rename_mutex" on the VFS layer)
>
> I'd like to apply these two patches, but I really want to get more of an
> ack for them from somebody like Al, or at least more of an explanation for
> why it's all the right thing.

No ACK is coming until we get something resembling analysis of locking
scheme. Which won't happen until we at least get the "what callers are
allowed to do" written down, damnit. As it is, I'm more than inclined
to propose ripping kobject_move() out, especially since it has only two
users - something s390-specific and rfcomm, with its shitloads of problems
beyond just sysfs interaction.

2008-01-16 06:47:29

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

Linus Torvalds wrote:
>
> On Wed, 16 Jan 2008, Tejun Heo wrote:
>> * sysfs_move_dir() has an extra dput() on success path.
>
> Are you sure? How did this ever work?

I'm pretty sure. I've seen dentry blowing up due to early release &&
compared it with older code. It was my mistake during restructuring
error path. The only user of sysfs_move_dir() was S390 Cornelia works
on (cc'd). Cornelia is usually very good at spotting and debugging
sysfs bugs. Dunno how it got slipped this time.

> Also, looking at this, I think the "how did this ever work" question is
> answered by "it didn't",

Before dput() bug was introduced, it worked although error handling path
was broken.

> but I also think there are still serious problems
> there. Look at
>
> again:
> mutex_lock(&old_parent->d_inode->i_mutex);
> if (!mutex_trylock(&new_parent->d_inode->i_mutex)) {
> mutex_unlock(&old_parent->d_inode->i_mutex);
> goto again;
> }
>
> and wonder what happen sif old_parent == new_parent. Is that trying to
> avoid an ABBA deadlock?

It will fall in infinite loop if old_parent == new_parent and for the
question, I suppose so. Cornelia, right?

> Normally you'd do it by ordering the locks, or by
> taking a third lock to guarantee serialization at a higher level (ie the
> "s_vfs_rename_mutex" on the VFS layer)

sysfs currently doesn't depend on VFS locking. VFS locking is done just
to keep VFS layer happy. sysfs_dirent hierarchy is protected by
sysfs_mutex and renaming/moving are protected by sysfs_rename_mutex. As
both ops are under rename_mutex, I think the above code just can grab
both mutexes in any order. It's probably a remnant of the days when
sysfs used VFS locking to protect internal structures.

s390 was the only user of the move interface till now and through all
the recent sysfs change, it didn't receive enough attention other than
Cornelia's testing. Eventually, I think sysfs_rename_dir() and
sysfs_move_dir() should be merged into sysfs_move() but for the current
two users, I don't see anything wrong with the locking.

Thanks.

--
tejun

2008-01-16 08:02:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

Hello.

Al Viro wrote:
> No ACK is coming until we get something resembling analysis of locking
> scheme. Which won't happen until we at least get the "what callers are
> allowed to do" written down, damnit.

I agree that sysfs needs further clean up. As I wrote in the earlier
thread, sysfs has been under constant flux of cleanups and updates
although it has slowed down recently due to the hazy number of libata
bugs. For example, several months ago with buggy dentry / inode
reclamation, sysfs could trigger pretty cryptic oopses under memory
pressure and locking was more awkward and buggy.

The two posted patches are bug fixes for apparent bugs which can be
triggered by the current two users of the interface. AFAICS, locking
there is weird but correct for the current two users. If you can find
any problem there, please lemme know. We shouldn't hold this type of
fixes for future clean ups.

> As it is, I'm more than inclined
> to propose ripping kobject_move() out, especially since it has only two
> users - something s390-specific and rfcomm, with its shitloads of problems
> beyond just sysfs interaction.

Can you please elaborate? All sysfs problems discovered by the rfcomm
are fixed by the posted patches. Dave Young has a patch waiting for
verification by the tester. Furthermore, even if we rip out
kobject_move() in the future, I don't think -rc7 is the right time to do it.

I posted some patches a while back which did sysfs locking
reorganization, separation from and proper layering with kobject /
driver model. There were some disagreements regarding the interface and
I got struck by load of ATA bugs. I'll dig it up and give it another
shot in a few weeks.

Thanks.

--
tejun

2008-01-16 08:20:33

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

On Wed, 16 Jan 2008 15:47:11 +0900,
Tejun Heo <[email protected]> wrote:

> Linus Torvalds wrote:
> >
> > On Wed, 16 Jan 2008, Tejun Heo wrote:
> >> * sysfs_move_dir() has an extra dput() on success path.
> >
> > Are you sure? How did this ever work?
>
> I'm pretty sure. I've seen dentry blowing up due to early release &&
> compared it with older code. It was my mistake during restructuring
> error path. The only user of sysfs_move_dir() was S390 Cornelia works
> on (cc'd). Cornelia is usually very good at spotting and debugging
> sysfs bugs. Dunno how it got slipped this time.

Hm, don't know either. It never blew for my tests (and I did extensive
moving around of ccw devices...)

>
> > Also, looking at this, I think the "how did this ever work" question is
> > answered by "it didn't",
>
> Before dput() bug was introduced, it worked although error handling path
> was broken.
>
> > but I also think there are still serious problems
> > there. Look at
> >
> > again:
> > mutex_lock(&old_parent->d_inode->i_mutex);
> > if (!mutex_trylock(&new_parent->d_inode->i_mutex)) {
> > mutex_unlock(&old_parent->d_inode->i_mutex);
> > goto again;
> > }
> >
> > and wonder what happen sif old_parent == new_parent. Is that trying to
> > avoid an ABBA deadlock?
>
> It will fall in infinite loop if old_parent == new_parent and for the
> question, I suppose so. Cornelia, right?

Yes. But we jump out early if old_parent == new_parent.

>
> > Normally you'd do it by ordering the locks, or by
> > taking a third lock to guarantee serialization at a higher level (ie the
> > "s_vfs_rename_mutex" on the VFS layer)
>
> sysfs currently doesn't depend on VFS locking. VFS locking is done just
> to keep VFS layer happy. sysfs_dirent hierarchy is protected by
> sysfs_mutex and renaming/moving are protected by sysfs_rename_mutex. As
> both ops are under rename_mutex, I think the above code just can grab
> both mutexes in any order. It's probably a remnant of the days when
> sysfs used VFS locking to protect internal structures.

Yes, I think so.

>
> s390 was the only user of the move interface till now and through all
> the recent sysfs change, it didn't receive enough attention other than
> Cornelia's testing.

I just wonder why I never hit it since I regularily do some testing of
the moving stuff (on Greg's tree and on mainline). But bluetooth has
been using this interface for some time as well (and they were the ones
with the problems triggering this patch).

> Eventually, I think sysfs_rename_dir() and
> sysfs_move_dir() should be merged into sysfs_move() but for the current
> two users, I don't see anything wrong with the locking.

Me too. But maybe I'm just too familiar with the code...

2008-01-17 04:22:29

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

On Wed, Jan 16, 2008 at 04:23:13PM +0900, Tejun Heo wrote:

> The two posted patches are bug fixes for apparent bugs which can be
> triggered by the current two users of the interface. AFAICS, locking
> there is weird but correct for the current two users. If you can find
> any problem there, please lemme know.

How about "what happens after that move-to-NULL if you have a cwd inside
the subtree", for starters?

> We shouldn't hold this type of
> fixes for future clean ups.

No, but I'd rather see the rules for callers of sysfs/kobject primitives
spelled out - before cleanups or review become even possible.

> > As it is, I'm more than inclined
> > to propose ripping kobject_move() out, especially since it has only two
> > users - something s390-specific and rfcomm, with its shitloads of problems
> > beyond just sysfs interaction.
>
> Can you please elaborate? All sysfs problems discovered by the rfcomm
> are fixed by the posted patches. Dave Young has a patch waiting for
> verification by the tester.

Umm... IIRC, there'd been a lot of fun with tty and procfs sides of that;
will check.

> Furthermore, even if we rip out
> kobject_move() in the future, I don't think -rc7 is the right time to do it.

OK... You do have a point, but at this stage I'm not convinced that this
thing is safe and usable. I agree that patches do not make things worse,
but I suspect that the real problem with kobject_move() is that it's a
fundamentally broken interface.