2004-03-29 23:17:33

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Mon, Mar 29, 2004 at 01:25:51PM -0800, Andrew Morton wrote:
> Greg KH <[email protected]> wrote:
> >
> > On Sun, Mar 28, 2004 at 12:38:57PM -0800, Andrew Morton wrote:
> > > Alan Stern <[email protected]> wrote:
> > > >
> > > > However, a very noticeable and IMO unacceptable delay occurs when there's
> > > > a reference to a kobject caused by a negative dentry that won't get
> > > > recycled until the system decides it's good and ready. That's the real
> > > > problem I wanted to call to people's attention.
> > >
> > > Have you verified that this actually happens?
> >
> > Yes, I've verified this, and it looks like Maneesh also agrees with
> > this.
> >
> > > If so, what are its effects? rmmod hangs for half an hour? Cannot reload
> > > the module?
> >
> > Well, before the patch that I submitted for the USB core, we would hang
> > waiting for the release function to return as there was still a
> > reference to the kobject pending.
> >
> > For other subsystems (and USB), this might cause nasty oopses when the
> > kobject is finally released and yet the module has been unloaded already
> > (as the owner of the reference did not cause the module reference count
> > to increment.) This is bad.
> >
>
> The module should remain in memory, "unhashed", until the final kobject
> reference falls to zero. Destruction of that kobject causes the refcount
> on the module to fall to zero which causes the entire module to be
> released.
>
> (hmm, the existence of a kobject doesn't appear to contribute to its
> module's refcount. Why not?)

It does, if a file for that kobject is opened. In this case, there was
no file opened, so the module refcount isn't incremented.

> But the module system is that smart, so what we do instead is to block
> rmmod until the module refcount falls to zero, and rmmod then does the
> final destruction. We could have punted the module destruction up to some
> reaper thread but for some reason did not do so.

Well, as the module refcount was never incremented, it does not do this.
I just verified this (accidentally) by removing my lp modules and then
doing a bk pull. kswapd forced the stale dentry out, which decremented
the kobject reference count, which then tried to call the release
function in the (now gone) lp module. My machine then spit up a lovely
oops, and was reduced to a unusable sludge as there was no more kswapd
running :(

> So as far as I can tell, the only problem we have is that rmmod will hang
> around until memory pressure, yes?

Nope, oopses will happen. You can verify this yourself by doing much of
what I just did.

This needs to get fixed, as it's not just a USB issue (so I've added
lkml on the cc list.)

> Maybe a shrink_dcache_parent(dentry) on entry to simple_rmdir() would
> suffice?

Will that get rid of the references properly nwhen we remove the
kobject?

thanks,

greg k-h


2004-03-29 23:29:22

by Andrew Morton

[permalink] [raw]
Subject: Re: Unregistering interfaces

Greg KH <[email protected]> wrote:
>
> > The module should remain in memory, "unhashed", until the final kobject
> > reference falls to zero. Destruction of that kobject causes the refcount
> > on the module to fall to zero which causes the entire module to be
> > released.
> >
> > (hmm, the existence of a kobject doesn't appear to contribute to its
> > module's refcount. Why not?)
>
> It does, if a file for that kobject is opened. In this case, there was
> no file opened, so the module refcount isn't incremented.

hm, surprised. Shouldn't the existence of a kobject contribute to its
module's refcount?

> > Maybe a shrink_dcache_parent(dentry) on entry to simple_rmdir() would
> > suffice?
>
> Will that get rid of the references properly nwhen we remove the
> kobject?

That's one the dcache guys could address better, but I was mainly proposing
it as a way of removing any negative dentries. But it appears that we have
problems beyond negative dentries?

2004-03-30 00:02:03

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Mon, Mar 29, 2004 at 03:31:17PM -0800, Andrew Morton wrote:
> Greg KH <[email protected]> wrote:
> >
> > > The module should remain in memory, "unhashed", until the final kobject
> > > reference falls to zero. Destruction of that kobject causes the refcount
> > > on the module to fall to zero which causes the entire module to be
> > > released.
> > >
> > > (hmm, the existence of a kobject doesn't appear to contribute to its
> > > module's refcount. Why not?)
> >
> > It does, if a file for that kobject is opened. In this case, there was
> > no file opened, so the module refcount isn't incremented.
>
> hm, surprised. Shouldn't the existence of a kobject contribute to its
> module's refcount?

No, a kobject by itself knows nothing about a module. Only the
attribute files do (and they are the things that contain the struct
module *), as they are what user space can grab references to.

I never thought that the kobject would care, as it is only a directory,
and I didn't think that anything could grab directory references on
their own. But then Maneesh's patch wasn't in the kernel at that time
:)

thanks,

greg k-h

2004-03-30 05:47:38

by Maneesh Soni

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Mon, Mar 29, 2004 at 03:31:17PM -0800, Andrew Morton wrote:
> Greg KH <[email protected]> wrote:
> >
> > > The module should remain in memory, "unhashed", until the final kobject
> > > reference falls to zero. Destruction of that kobject causes the refcount
> > > on the module to fall to zero which causes the entire module to be
> > > released.
> > >
> > > (hmm, the existence of a kobject doesn't appear to contribute to its
> > > module's refcount. Why not?)
> >
> > It does, if a file for that kobject is opened. In this case, there was
> > no file opened, so the module refcount isn't incremented.
>
> hm, surprised. Shouldn't the existence of a kobject contribute to its
> module's refcount?
>
> > > Maybe a shrink_dcache_parent(dentry) on entry to simple_rmdir() would
> > > suffice?
> >
> > Will that get rid of the references properly nwhen we remove the
> > kobject?
>
> That's one the dcache guys could address better, but I was mainly proposing
> it as a way of removing any negative dentries. But it appears that we have
> problems beyond negative dentries?

shrink_dcache_parent() will only free up the zero ref. counted dentries,
positive or negatvie doesnot matter. But we may have some faulty user space
app holding a sysfs file, sysfs directory, corresponding kobject, module etc.
Refer http://bugme.osdl.org/show_bug.cgi?id=1884

(Greg, I am including the reply for other thread also here)
http://marc.theaimsgroup.com/?l=linux-kernel&m=108059485726012&w=2

Fix for these problems will depend upon the life time rule for
modules Vs kobjects Vs dentries. When and what should die or live?

I am not very clear about how the first two behave. Still I can think
of a solution within sysfs like this as Alen suggested. But again I am not
very sure if this can be done properly without any races. But anyway I am
trying.

1) backout my patch sysfs-pin-kobject.patch
2) handle NULL d_fsdata while trying to access the corresponding kobject
for an attribute file.

Maneesh
--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696

2004-03-30 07:34:13

by Maneesh Soni

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Mon, Mar 29, 2004 at 04:01:29PM -0800, Greg KH wrote:
> On Mon, Mar 29, 2004 at 03:31:17PM -0800, Andrew Morton wrote:
> > Greg KH <[email protected]> wrote:
> > >
> > > > The module should remain in memory, "unhashed", until the final kobject
> > > > reference falls to zero. Destruction of that kobject causes the refcount
> > > > on the module to fall to zero which causes the entire module to be
> > > > released.
> > > >
> > > > (hmm, the existence of a kobject doesn't appear to contribute to its
> > > > module's refcount. Why not?)
> > >
> > > It does, if a file for that kobject is opened. In this case, there was
> > > no file opened, so the module refcount isn't incremented.
> >
> > hm, surprised. Shouldn't the existence of a kobject contribute to its
> > module's refcount?
>
> No, a kobject by itself knows nothing about a module. Only the
> attribute files do (and they are the things that contain the struct
> module *), as they are what user space can grab references to.
>
> I never thought that the kobject would care, as it is only a directory,
> and I didn't think that anything could grab directory references on
> their own. But then Maneesh's patch wasn't in the kernel at that time
> :)
>

I don't how fisible is it to move the owner field to the kobject and keep the
module and kobject both alive as long as there are sysfs directories referring
to them.

Maneesh


--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696

2004-03-30 15:40:02

by Alan Stern

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Mon, 29 Mar 2004, Greg KH wrote:

> On Mon, Mar 29, 2004 at 03:31:17PM -0800, Andrew Morton wrote:
> > Greg KH <[email protected]> wrote:
> > >
> > > > The module should remain in memory, "unhashed", until the final kobject
> > > > reference falls to zero. Destruction of that kobject causes the refcount
> > > > on the module to fall to zero which causes the entire module to be
> > > > released.
> > > >
> > > > (hmm, the existence of a kobject doesn't appear to contribute to its
> > > > module's refcount. Why not?)
> > >
> > > It does, if a file for that kobject is opened. In this case, there was
> > > no file opened, so the module refcount isn't incremented.
> >
> > hm, surprised. Shouldn't the existence of a kobject contribute to its
> > module's refcount?
>
> No, a kobject by itself knows nothing about a module. Only the
> attribute files do (and they are the things that contain the struct
> module *), as they are what user space can grab references to.

There's another reason. Practically every module has kobjects registered
all the time; they are not unregistered until the module's unload
procedure runs. If these kobjects contributed to the modules refcount
then it would be impossible ever to rmmod the module without using the -f
flag.

Furthermore, all the excess usage counts from those kobjects would swamp
and hide the genuine usage counts (the ones that are displayed now). The
user would have no way to know whether rmmod -f would work or would hang.

Alan Stern


2004-03-30 23:08:47

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, Mar 30, 2004 at 11:21:35AM +0530, Maneesh Soni wrote:
>
> I am not very clear about how the first two behave. Still I can think
> of a solution within sysfs like this as Alen suggested. But again I am not
> very sure if this can be done properly without any races. But anyway I am
> trying.
>
> 1) backout my patch sysfs-pin-kobject.patch

I think we need to do this now, as it is not a correct fix, and causes
more problems than good at this time. I suggest you try to fix the oops
you were seeing in either another way, or in a way that does not break
other things :)

I'll send a patch off to Linus to do this in a little bit...

thanks,

greg k-h

2004-03-30 23:14:47

by Andrew Morton

[permalink] [raw]
Subject: Re: Unregistering interfaces

Greg KH <[email protected]> wrote:
>
> On Tue, Mar 30, 2004 at 11:21:35AM +0530, Maneesh Soni wrote:
> >
> > I am not very clear about how the first two behave. Still I can think
> > of a solution within sysfs like this as Alen suggested. But again I am not
> > very sure if this can be done properly without any races. But anyway I am
> > trying.
> >
> > 1) backout my patch sysfs-pin-kobject.patch
>
> I think we need to do this now, as it is not a correct fix, and causes
> more problems than good at this time.

But the patch was correct. sysfs retains a pointer to the kobject, it
should take a ref on it?

> I suggest you try to fix the oops
> you were seeing in either another way, or in a way that does not break
> other things :)

Didn't we demonstrate that the code which broke was already broken? And
that it has other problems regardless of the kobject pinning fix, such as the
userpace-holding-a-file-open-wedges-khubd problem?

Worried that this is all heading in the wrong direction...

2004-03-30 23:32:05

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, Mar 30, 2004 at 03:16:37PM -0800, Andrew Morton wrote:
> Greg KH <[email protected]> wrote:
> >
> > On Tue, Mar 30, 2004 at 11:21:35AM +0530, Maneesh Soni wrote:
> > >
> > > I am not very clear about how the first two behave. Still I can think
> > > of a solution within sysfs like this as Alen suggested. But again I am not
> > > very sure if this can be done properly without any races. But anyway I am
> > > trying.
> > >
> > > 1) backout my patch sysfs-pin-kobject.patch
> >
> > I think we need to do this now, as it is not a correct fix, and causes
> > more problems than good at this time.
>
> But the patch was correct. sysfs retains a pointer to the kobject, it
> should take a ref on it?

Yes, but it was taking references for files that are not present in
sysfs. That's not good :)

> > I suggest you try to fix the oops
> > you were seeing in either another way, or in a way that does not break
> > other things :)
>
> Didn't we demonstrate that the code which broke was already broken? And
> that it has other problems regardless of the kobject pinning fix, such as the
> userpace-holding-a-file-open-wedges-khubd problem?

No. We fixed the wedge-khubd issue. That was my fault for allowing
that change to go in the first place. Sorry about that.

Let me run some tests with Maneesh's patch pulled out to see if that
solves the oopses I can generate...

> Worried that this is all heading in the wrong direction...

I'm worried about that too.

thanks,

greg k-h

2004-03-30 23:56:40

by Greg KH

[permalink] [raw]
Subject: [PATCH] back out sysfs reference count change

Hi,

The patch below backs out Maneesh's sysfs patch that was recently added
to the kernel. In its defense, the original patch did solve some fixes
that could be duplicated on SMP machines, but the side affect of the
patch caused lots of problems. Basically it caused kobjects to get
their references incremented when files that are not present in the
kobject are asked for (udev can easily trigger this when it looks for
files call "dev" in directories that do not have that file). This can
cause easy oopses when the VFS later ages out those old dentries and the
kobject has its reference finally released (usually after the module
that the kobject lived in was removed.)

I will continue to work with Maneesh to try to solve the original bug,
but for now, this patch needs to be applied.

thanks,

greg k-h

--- 1.10/fs/sysfs/dir.c Fri Mar 19 17:46:51 2004
+++ edited/fs/sysfs/dir.c Tue Mar 30 15:23:21 2004
@@ -20,18 +20,6 @@
return 0;
}

-static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
-{
- struct kobject * kobj = dentry->d_fsdata;
-
- if (kobj)
- kobject_put(kobj);
- iput(inode);
-}
-
-static struct dentry_operations sysfs_dentry_operations = {
- .d_iput = &sysfs_d_iput,
-};

static int create_dir(struct kobject * k, struct dentry * p,
const char * n, struct dentry ** d)
@@ -45,8 +33,7 @@
S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO,
init_dir);
if (!error) {
- (*d)->d_op = &sysfs_dentry_operations;
- (*d)->d_fsdata = kobject_get(k);
+ (*d)->d_fsdata = k;
p->d_inode->i_nlink++;
}
dput(*d);

2004-03-30 23:56:30

by Alan Stern

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, 30 Mar 2004, Andrew Morton wrote:

> Greg KH <[email protected]> wrote:
> >
> > I think we need to do this now, as it is not a correct fix, and causes
> > more problems than good at this time.
>
> But the patch was correct. sysfs retains a pointer to the kobject, it
> should take a ref on it?
>
> > I suggest you try to fix the oops
> > you were seeing in either another way, or in a way that does not break
> > other things :)
>
> Didn't we demonstrate that the code which broke was already broken? And
> that it has other problems regardless of the kobject pinning fix, such as the
> userpace-holding-a-file-open-wedges-khubd problem?
>
> Worried that this is all heading in the wrong direction...

There are two problems to consider:

(1) sysfs retains pointers to kobjects long after they have been
unregistered because of the negative dentrys.

(2) khubd blocks when removing configurations.

Maneesh's change caused (1) because it grabbed a reference to protect an
existing pointer. The way to repair the damage is to delete the pointer
ASAP so the reference can be dropped. That will require changing several
other sysfs routines that assume the pointer is valid.

Such a change is needed because otherwise module unloading can be delayed
indefinitely, until the system decides it needs to reuse the negative
dentry.


(2) has been repaired temporarily. We're still discussing the right way
to fix it permanently, though. The underlying cause for the reason that
khubd blocked is an odd feature of sysfs: it's willing to delete
directories that contain subdirectories, while leaving the subdirectories
in existence. (The connection to khubd is slightly obscure but it can be
traced.)

Greg has proposed using an awkward scheme for repeatedly parsing and
creating dynamic data structures from USB configuration descriptors to fix
(2). I'm in favor of changing the behavior of sysfs, so that either it
refuses to delete directories that contain subdirectories or else it
recursively deletes the subdirectories first. At this point nothing has
been settled.

Alan Stern

2004-03-30 23:57:53

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, Mar 30, 2004 at 03:30:01PM -0800, Greg KH wrote:
>
> Let me run some tests with Maneesh's patch pulled out to see if that
> solves the oopses I can generate...

With the patch removed, I can't oops the kernel. With it in, I can
easily oops it. So I sent the backout patch to Linus.

I'll work with Maneesh to try to solve the original problem that he saw
and tried to fix in such a way that does not cause other problems.

thanks,

greg k-h

2004-03-31 00:09:46

by David Brownell

[permalink] [raw]
Subject: Re: Unregistering interfaces

Alan Stern wrote:

> I'm in favor of changing the behavior of sysfs, so that either it
> refuses to delete directories that contain subdirectories or else it
> recursively deletes the subdirectories first. At this point nothing has
> been settled.

Hmm, certainly I agree khubd should be deleting things bottom-up.

Are you say it isn't doing that already? Or that it's trying to,
but something's preventing that from working?

- Dave




2004-03-31 00:35:38

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, Mar 30, 2004 at 06:56:27PM -0500, Alan Stern wrote:
> On Tue, 30 Mar 2004, Andrew Morton wrote:
>
> > Greg KH <[email protected]> wrote:
> > >
> > > I think we need to do this now, as it is not a correct fix, and causes
> > > more problems than good at this time.
> >
> > But the patch was correct. sysfs retains a pointer to the kobject, it
> > should take a ref on it?
> >
> > > I suggest you try to fix the oops
> > > you were seeing in either another way, or in a way that does not break
> > > other things :)
> >
> > Didn't we demonstrate that the code which broke was already broken? And
> > that it has other problems regardless of the kobject pinning fix, such as the
> > userpace-holding-a-file-open-wedges-khubd problem?
> >
> > Worried that this is all heading in the wrong direction...
>
> There are two problems to consider:
>
> (1) sysfs retains pointers to kobjects long after they have been
> unregistered because of the negative dentrys.

That is now taken care of with the patch I just sent to Linus by taking
out this patch.

> (2) khubd blocks when removing configurations.

That was fixed by the patch from you, undoing your previous patch which
blocked.

So everyone is happy now, right? :)

thanks,

greg k-h

2004-03-31 00:35:29

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, Mar 30, 2004 at 04:08:53PM -0800, David Brownell wrote:
> Alan Stern wrote:
>
> > I'm in favor of changing the behavior of sysfs, so that either it
> >refuses to delete directories that contain subdirectories or else it
> >recursively deletes the subdirectories first. At this point nothing has
> >been settled.
>
> Hmm, certainly I agree khubd should be deleting things bottom-up.
>
> Are you say it isn't doing that already? Or that it's trying to,
> but something's preventing that from working?

Look at everything that hangs off of usb devices in the sysfs tree that
the usb core knows nothing about (scsi hosts, tty devices, etc.)

Anyway, it's working properly now, and is why I added that kobject
parent patch a long time ago (which Alan still seems to think is
incorrect...)

thanks,

greg k-h

2004-03-31 02:12:25

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Wed, 2004-03-31 at 09:55, Greg KH wrote:
> Hi,
>
> The patch below backs out Maneesh's sysfs patch that was recently added
> to the kernel. In its defense, the original patch did solve some fixes
> that could be duplicated on SMP machines, but the side affect of the
> patch caused lots of problems. Basically it caused kobjects to get
> their references incremented when files that are not present in the
> kobject are asked for (udev can easily trigger this when it looks for
> files call "dev" in directories that do not have that file). This can
> cause easy oopses when the VFS later ages out those old dentries and the
> kobject has its reference finally released (usually after the module
> that the kobject lived in was removed.)

I think that the bug in the first place is to have an existing
kobject that didn't bump the module ref count.

If a kobject exists that have a pointer to the module code (the
release function), it _MUST_ have bumped the module ref count,
that's the whole point of the module reference count.

If rmmod blocks forever because that kobject has a stale reference,
that's a different problem, but khubd should not be involved in
that process and should definitely not be blocked and not wait for
the kobject to go away.

Ben.


2004-03-31 02:19:43

by Andrew Morton

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

Benjamin Herrenschmidt <[email protected]> wrote:
>
> I think that the bug in the first place is to have an existing
> kobject that didn't bump the module ref count.
>
> If a kobject exists that have a pointer to the module code (the
> release function), it _MUST_ have bumped the module ref count,
> that's the whole point of the module reference count.
>
> If rmmod blocks forever because that kobject has a stale reference,
> that's a different problem, but khubd should not be involved in
> that process and should definitely not be blocked and not wait for
> the kobject to go away.

Amen, Brother Ben. I think the hangup here is that lsmod would show module
refcounts of 2,417. So for cosmetic reasons, the kobject should take a ref
against an intermediate kref, which has a single ref on the module.

But it looks like that's all in a faraway perfect world, and Greg is going
to fix stuff up somehow ;)

2004-03-31 09:22:13

by Maneesh Soni

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Tue, Mar 30, 2004 at 06:19:15PM -0800, Andrew Morton wrote:
>
> But it looks like that's all in a faraway perfect world, and Greg is going
> to fix stuff up somehow ;)

For convenience I will explain the race here..

cpu 0 cpu 1
kobject_unregister() sysfs_open_file()
kobject_del() check_perm()
sysfs_remove_dir() :
(dentry remains alive due to ref. taken :
on the way to sysfs_open_file) :
kobject_put() :
kobject_cleanup() kobject_get(->d_fsdata)

cpu 1 could end up referring to a freed kobject through dentry->d_fsdata or
starts spitting Badness in kobject_get at lib/kobject.c:429. For triggering
this race try running these two loops simultaneously on SMP

# while true; do insmod drivers/net/dummy.ko; rmmod dummy; done
# while true; do find /sys/class/net | xargs cat; done

Probably it can be solved by making sure that when sysfs file is
opened/read/written some _race_ free check is done and fail if kobject if gone.

Maneesh


--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696

2004-03-31 15:11:41

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Wed, 31 Mar 2004, Maneesh Soni wrote:

> For convenience I will explain the race here..
>
> cpu 0 cpu 1
> kobject_unregister() sysfs_open_file()
> kobject_del() check_perm()
> sysfs_remove_dir() :
> (dentry remains alive due to ref. taken :
> on the way to sysfs_open_file) :
> kobject_put() :
> kobject_cleanup() kobject_get(->d_fsdata)
>
> cpu 1 could end up referring to a freed kobject through dentry->d_fsdata or
> starts spitting Badness in kobject_get at lib/kobject.c:429. For triggering
> this race try running these two loops simultaneously on SMP
>
> # while true; do insmod drivers/net/dummy.ko; rmmod dummy; done
> # while true; do find /sys/class/net | xargs cat; done
>
> Probably it can be solved by making sure that when sysfs file is
> opened/read/written some _race_ free check is done and fail if kobject if gone.
>
> Maneesh

Here's a suggestion. At the start of check_perm() grab the dentry
semaphore, then check whether d_fsdata is NULL, if it isn't then do the
kobject_get(), then unlock the semaphore.

Alan Stern

2004-03-31 15:32:29

by Alan Stern

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, 30 Mar 2004, David Brownell wrote:

> Alan Stern wrote:
>
> > I'm in favor of changing the behavior of sysfs, so that either it
> > refuses to delete directories that contain subdirectories or else it
> > recursively deletes the subdirectories first. At this point nothing has
> > been settled.
>
> Hmm, certainly I agree khubd should be deleting things bottom-up.
>
> Are you say it isn't doing that already? Or that it's trying to,
> but something's preventing that from working?

No, no -- as far as I know khudb gets rid of things in a perfectly correct
manner. But there are plenty of other parts in the Linux kernel, and I
don't know that they all delete things bottom-up. Greg implies that
usb-serial and the kernel's tty subsystem don't do this; I haven't had a
chance to look into it.

Alan Stern

2004-03-31 15:54:20

by Alan Stern

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Tue, 30 Mar 2004, Greg KH wrote:

> On Tue, Mar 30, 2004 at 06:56:27PM -0500, Alan Stern wrote:
> >
> > There are two problems to consider:
> >
> > (1) sysfs retains pointers to kobjects long after they have been
> > unregistered because of the negative dentrys.
>
> That is now taken care of with the patch I just sent to Linus by taking
> out this patch.

Yes, and Maneesh is working to replace it with a more suitable approach.

> > (2) khubd blocks when removing configurations.
>
> That was fixed by the patch from you, undoing your previous patch which
> blocked.
>
> So everyone is happy now, right? :)

That change was just a temporary fix, never intended as a permanent
solution.

Here's a suggestion for a correct solution. It's a little bit awkward,
but less so than other ideas I've heard.

Define struct usb_interface_cache to be a subset of struct usb_interface.
All it needs to contain is the altsetting pointer and num_altsetting.
Within the usb_host_configuration structure add an array of
usb_interface_cache pointers in addition to the existing array of
usb_interface pointers.

When config.c parses the configuration descriptors, it will create the
usb_interface_cache structures instead of creating the usb_interfaces.
Things like configuration selection and /proc/bus/usb/devices can get all
the information they need from this.

When a configuration is installed, usb_set_configuration() will
dynamically perform a deep copy of the usb_interface_cache array and store
the entries in the usb_interface array. These will be "live" entries,
containing the struct device and everything else for use by drivers
(exactly the same as they do now).

When a configuration is uninstalled, the usb_interface array can be erased
and the usual kobject cleanup mechanism will delete the deep-copy
array elements whenever it wants to. There will be no problems with
reinstalling the configuration because when that happens, new interfaces
will be allocated dynamically. Hence there will be no need for khubd to
block, no need for changes to sysfs, and then everyone really will be
happy! :-)

Although there is extra overhead in doing all the copying, it only happens
when configurations are changed. Furthermore, since we will know in
advance the sizes of all the structures to be copied, we will be able to
allocate a single memory region to hold all the altsetting and endpoint
structures for each interface.

If you think this sounds good, I will start working on it.

Alan Stern

2004-03-31 22:18:55

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Wed, Mar 31, 2004 at 12:11:30PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2004-03-31 at 09:55, Greg KH wrote:
> > Hi,
> >
> > The patch below backs out Maneesh's sysfs patch that was recently added
> > to the kernel. In its defense, the original patch did solve some fixes
> > that could be duplicated on SMP machines, but the side affect of the
> > patch caused lots of problems. Basically it caused kobjects to get
> > their references incremented when files that are not present in the
> > kobject are asked for (udev can easily trigger this when it looks for
> > files call "dev" in directories that do not have that file). This can
> > cause easy oopses when the VFS later ages out those old dentries and the
> > kobject has its reference finally released (usually after the module
> > that the kobject lived in was removed.)
>
> I think that the bug in the first place is to have an existing
> kobject that didn't bump the module ref count.
>
> If a kobject exists that have a pointer to the module code (the
> release function), it _MUST_ have bumped the module ref count,
> that's the whole point of the module reference count.

But that is impossible as has already been pointed out by Alan Stern.
If a module creates a kobject, how can the module_exit() function ever
be called if that kobject incremented the module reference count?

So what we do is any reference to the kobject grabbed by userspace
causes the module reference count to go up. That fixes the issue for
the most part (with the exception of the race that Maneesh has
documented.)

thanks,

greg k-h

2004-04-01 01:58:51

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change


> But that is impossible as has already been pointed out by Alan Stern.
> If a module creates a kobject, how can the module_exit() function ever
> be called if that kobject incremented the module reference count?

I just had a loooong discussion with Rusty on that subject, it's
indeed a nasty one. The problem is that the real solution is to
change the module unload semantics. Regardless of the count, module
exit should be called, and the actual unload (and eventually calling
an additional module "release" function) then should only happen
once the count is down to 0. That means that rmmod would block forever
if the driver is opened, but that is just something that needs to be
known.

But that's not something we'll do for 2.6. For that to work, it also
need various subsystem unregister_* (netdev etc...) functions to not
error when the device is opened, just prevent new opens, and operate
asynchronously (freeing data structures on kobject release) etc...

> So what we do is any reference to the kobject grabbed by userspace
> causes the module reference count to go up. That fixes the issue for
> the most part (with the exception of the race that Maneesh has
> documented.)

Well.... Maybe, I still think it's dodgy, but probably fine for 2.6

Ben.


2004-04-01 03:48:37

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Thu, 1 Apr 2004, Benjamin Herrenschmidt wrote:

> > But that is impossible as has already been pointed out by Alan Stern.
> > If a module creates a kobject, how can the module_exit() function ever
> > be called if that kobject incremented the module reference count?
>
> I just had a loooong discussion with Rusty on that subject, it's
> indeed a nasty one. The problem is that the real solution is to
> change the module unload semantics. Regardless of the count, module
> exit should be called, and the actual unload (and eventually calling
> an additional module "release" function) then should only happen
> once the count is down to 0. That means that rmmod would block forever
> if the driver is opened, but that is just something that needs to be
> known.
>
> But that's not something we'll do for 2.6. For that to work, it also
> need various subsystem unregister_* (netdev etc...) functions to not
> error when the device is opened, just prevent new opens, and operate
> asynchronously (freeing data structures on kobject release) etc...

There was another lengthy discussion about this back in January. Read
these threads:

http://marc.theaimsgroup.com/?t=107487731800002&r=1&w=2
http://marc.theaimsgroup.com/?t=107509479200002&r=1&w=2

Lots of back-and-forth, but Linus was basically against the idea. For
now at least.

Alan Stern

2004-04-01 05:13:16

by Maneesh Soni

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Wed, Mar 31, 2004 at 10:11:37AM -0500, Alan Stern wrote:
> On Wed, 31 Mar 2004, Maneesh Soni wrote:
>
> > For convenience I will explain the race here..
> >
> > cpu 0 cpu 1
> > kobject_unregister() sysfs_open_file()
> > kobject_del() check_perm()
> > sysfs_remove_dir() :
> > (dentry remains alive due to ref. taken :
> > on the way to sysfs_open_file) :
> > kobject_put() :
> > kobject_cleanup() kobject_get(->d_fsdata)
> >
> > cpu 1 could end up referring to a freed kobject through dentry->d_fsdata or
> > starts spitting Badness in kobject_get at lib/kobject.c:429. For triggering
> > this race try running these two loops simultaneously on SMP
> >
> > # while true; do insmod drivers/net/dummy.ko; rmmod dummy; done
> > # while true; do find /sys/class/net | xargs cat; done
> >
> > Probably it can be solved by making sure that when sysfs file is
> > opened/read/written some _race_ free check is done and fail if kobject if gone.
> >
> > Maneesh
>
> Here's a suggestion. At the start of check_perm() grab the dentry
> semaphore, then check whether d_fsdata is NULL, if it isn't then do the
> kobject_get(), then unlock the semaphore.
>

I have tried this with no luck. I still get
(Badness in kobject_get at lib/kobject.c:42) which means it is not correct fix.

I am out of any more ideas except something like making sysfs single threaded
or requesting people to try my sysfs backing store patch set. It does not
suffer from the negative dentries problem as it does not create any negative
dentries. I have to re-diff the patch set again to take recent changes into
account.

Maneesh



--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696

2004-04-01 07:12:19

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Thu, Apr 01, 2004 at 11:56:09AM +1000, Benjamin Herrenschmidt wrote:
>
> > But that is impossible as has already been pointed out by Alan Stern.
> > If a module creates a kobject, how can the module_exit() function ever
> > be called if that kobject incremented the module reference count?
>
> I just had a loooong discussion with Rusty on that subject, it's
> indeed a nasty one.

Ah, he sucks another person into the "module unload" discussion. My
sympathies :)

> The problem is that the real solution is to
> change the module unload semantics. Regardless of the count, module
> exit should be called, and the actual unload (and eventually calling
> an additional module "release" function) then should only happen
> once the count is down to 0. That means that rmmod would block forever
> if the driver is opened, but that is just something that needs to be
> known.
>
> But that's not something we'll do for 2.6. For that to work, it also
> need various subsystem unregister_* (netdev etc...) functions to not
> error when the device is opened, just prevent new opens, and operate
> asynchronously (freeing data structures on kobject release) etc...

I agree, it's a difficult problem to solve, and something we aren't
going to do for 2.6. That's one reason why module unload is its own
config option :)

Anyway, Rusty's proposal of "never unload, just mark not used" and then
load a new copy into memory that he did during the OLS timeframe last
year sounds like the only sane way to get this completely correct, with
the trade off of never releasing memory.

thanks,

greg k-h

2004-04-01 07:13:58

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change


> > I just had a loooong discussion with Rusty on that subject, it's
> > indeed a nasty one.
>
> Ah, he sucks another person into the "module unload" discussion. My
> sympathies :)

Nah, I started it, since he is about 2 meters from me in the
office, that isn't difficult :)

> I agree, it's a difficult problem to solve, and something we aren't
> going to do for 2.6. That's one reason why module unload is its own
> config option :)
>
> Anyway, Rusty's proposal of "never unload, just mark not used" and then
> load a new copy into memory that he did during the OLS timeframe last
> year sounds like the only sane way to get this completely correct, with
> the trade off of never releasing memory.



2004-04-01 07:26:59

by Greg KH

[permalink] [raw]
Subject: Re: Unregistering interfaces

On Wed, Mar 31, 2004 at 10:54:17AM -0500, Alan Stern wrote:
> Here's a suggestion for a correct solution. It's a little bit awkward,
> but less so than other ideas I've heard.

<snip>

> If you think this sounds good, I will start working on it.

Yes, this sounds like an excellent way of doing this, and is what I was
mulling over today in between doing real work :)

It would be great if you want to work on this, and have the time to.
Feel free to post incremental changes, and if you need any other help
with this.

thanks for volunteering.

greg k-h

2004-04-01 07:29:33

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Thu, Apr 01, 2004 at 10:47:40AM +0530, Maneesh Soni wrote:
> I am out of any more ideas except something like making sysfs single threaded
> or requesting people to try my sysfs backing store patch set. It does not
> suffer from the negative dentries problem as it does not create any negative
> dentries. I have to re-diff the patch set again to take recent changes into
> account.

Hm, well that is the best reason so far that I have heard to accept your
patch set.

So let's revisit this after Al audits the patches, ok?

thanks,

greg k-h

2004-04-01 14:56:33

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Thu, 1 Apr 2004, Maneesh Soni wrote:

> On Wed, Mar 31, 2004 at 10:11:37AM -0500, Alan Stern wrote:
> >
> > Here's a suggestion. At the start of check_perm() grab the dentry
> > semaphore, then check whether d_fsdata is NULL, if it isn't then do the
> > kobject_get(), then unlock the semaphore.
> >
>
> I have tried this with no luck. I still get
> (Badness in kobject_get at lib/kobject.c:42) which means it is not correct fix.

Did you remember to set dentry->d_fsdata to NULL? This has to be done in
sysfs_remove_dir() sometime during the period when dentry->d_inode->i_sem
is held. Right now the pointer to the kobject never gets erased. That
Badness message is an indication that you are using a valid pointer to a
kobject whose refcount is 0.

Alan Stern

2004-04-02 04:34:18

by Maneesh Soni

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Thu, Apr 01, 2004 at 09:56:16AM -0500, Alan Stern wrote:
> On Thu, 1 Apr 2004, Maneesh Soni wrote:
>
> > On Wed, Mar 31, 2004 at 10:11:37AM -0500, Alan Stern wrote:
> > >
> > > Here's a suggestion. At the start of check_perm() grab the dentry
> > > semaphore, then check whether d_fsdata is NULL, if it isn't then do the
> > > kobject_get(), then unlock the semaphore.
> > >
> >
> > I have tried this with no luck. I still get
> > (Badness in kobject_get at lib/kobject.c:42) which means it is not correct fix.
>
> Did you remember to set dentry->d_fsdata to NULL? This has to be done in
> sysfs_remove_dir() sometime during the period when dentry->d_inode->i_sem
> is held. Right now the pointer to the kobject never gets erased. That
> Badness message is an indication that you are using a valid pointer to a
> kobject whose refcount is 0.
>

Yes, see the patch below. Probably the race window has become smaller but
Badness message is also an indication that somewhere kobject_cleanup has
started. I have not yet seen why race is still there.

There is one more bigger problem in this approach. It may miss kobject_put() in
sysfs_release() call and result will be memory leak, ->release() is not called,
rmmod hang, etc etc. So using ->d_fsdata for this purpose is not the proper
solution, IMO.

cpu 0 kobj->refcount cpu 1
kobject_unregister() (1) sysfs_open_file()
kobject_del() check_perm()
sysfs_remove_dir() (2) kobject_get(->d_fsdata)
(->d_fsdata = NULL)
kobject_put() (1)
sysfs_release()
kobj == NULL


diff -urN linux-2.6.5-rc2/fs/sysfs/dir.c linux-2.6.5-rc2-race-fix/fs/sysfs/dir.c
--- linux-2.6.5-rc2/fs/sysfs/dir.c 2004-03-20 05:41:05.000000000 +0530
+++ linux-2.6.5-rc2-race-fix/fs/sysfs/dir.c 2004-04-02 09:36:53.174323584 +0530
@@ -132,6 +132,11 @@
pr_debug("sysfs %s: removing dir\n",dentry->d_name.name);
down(&dentry->d_inode->i_sem);

+ /* try to avoid further references to kobject even if dentry
+ * is alive after remove dir.
+ */
+ dentry->d_fsdata = NULL;
+
spin_lock(&dcache_lock);
restart:
node = dentry->d_subdirs.next;
diff -urN linux-2.6.5-rc2/fs/sysfs/file.c linux-2.6.5-rc2-race-fix/fs/sysfs/file.c
--- linux-2.6.5-rc2/fs/sysfs/file.c 2004-03-20 05:41:01.000000000 +0530
+++ linux-2.6.5-rc2-race-fix/fs/sysfs/file.c 2004-04-02 09:51:37.409899344 +0530
@@ -78,11 +78,22 @@
static int fill_read_buffer(struct file * file, struct sysfs_buffer * buffer)
{
struct attribute * attr = file->f_dentry->d_fsdata;
- struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
+ struct dentry * kobj_dir = file->f_dentry->d_parent;
+ struct kobject * kobj = NULL;
struct sysfs_ops * ops = buffer->ops;
int ret = 0;
ssize_t count;

+ down(&kobj_dir->d_inode->i_sem);
+ /* No need to take a ref. on kobject, as it has been taken
+ * during ->open(), but ->d_fsdata can become NULL
+ */
+ if (kobj_dir->d_fsdata)
+ kobj = kobj_dir->d_fsdata;
+ up(&kobj_dir->d_inode->i_sem);
+
+ if (!kobj)
+ return -EINVAL;
if (!buffer->page)
buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
if (!buffer->page)
@@ -199,9 +210,21 @@
flush_write_buffer(struct file * file, struct sysfs_buffer * buffer, size_t count)
{
struct attribute * attr = file->f_dentry->d_fsdata;
- struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
+ struct dentry * kobj_dir = file->f_dentry->d_parent;
+ struct kobject * kobj = NULL;
struct sysfs_ops * ops = buffer->ops;

+ down(&kobj_dir->d_inode->i_sem);
+ /* No need to take a ref. on kobject, as it has been taken
+ * during ->open(), but ->d_fsdata can become NULL
+ */
+ if (kobj_dir->d_fsdata)
+ kobj = kobj_dir->d_fsdata;
+ up(&kobj_dir->d_inode->i_sem);
+
+ if (!kobj)
+ return -EINVAL;
+
return ops->store(kobj,attr,buffer->page,count);
}

@@ -238,12 +261,18 @@

static int check_perm(struct inode * inode, struct file * file)
{
- struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct dentry * kobj_dir = file->f_dentry->d_parent;
+ struct kobject * kobj = NULL;
struct attribute * attr = file->f_dentry->d_fsdata;
struct sysfs_buffer * buffer;
struct sysfs_ops * ops = NULL;
int error = 0;

+ down(&kobj_dir->d_inode->i_sem);
+ if (kobj_dir->d_fsdata)
+ kobj = kobject_get(kobj_dir->d_fsdata);
+ up(&kobj_dir->d_inode->i_sem);
+
if (!kobj || !attr)
goto Einval;

@@ -320,10 +349,19 @@

static int sysfs_release(struct inode * inode, struct file * filp)
{
- struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata;
+ struct dentry * kobj_dir = file->f_dentry->d_parent;
+ struct kobject * kobj = NULL;
struct attribute * attr = filp->f_dentry->d_fsdata;
struct sysfs_buffer * buffer = filp->private_data;

+ down(&kobj_dir->d_inode->i_sem);
+ /* No need to take a ref. on kobject, as it has been taken
+ * during ->open(), but ->d_fsdata can become NULL
+ */
+ if (kobj_dir->d_fsdata)
+ kobj = kobj_dir->d_fsdata;
+ up(&kobj_dir->d_inode->i_sem);
+
if (kobj)
kobject_put(kobj);
module_put(attr->owner);
--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696

2004-04-02 21:41:30

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

[CC: list pruned on the assumption that most recipients aren't interested]

On Fri, 2 Apr 2004, Maneesh Soni wrote:

> Yes, see the patch below. Probably the race window has become smaller but
> Badness message is also an indication that somewhere kobject_cleanup has
> started. I have not yet seen why race is still there.

Yes. Although the problem can't be due to a race if it involves code
that's protected by a semaphore.

Did this Badness message come from within check_perm()? I don't see how
that could happen with this patch. You're guaranteed that the refcount is
positive until after sysfs_remove_dir() returns. It's _supposed_ to be
positive, anyway; maybe it isn't. You could try checking for that, at the
end of sysf_remove_dir().

> There is one more bigger problem in this approach. It may miss kobject_put() in
> sysfs_release() call and result will be memory leak, ->release() is not called,
> rmmod hang, etc etc. So using ->d_fsdata for this purpose is not the proper
> solution, IMO.

You're quite correct. d_fsdata must remain set so that previous
references can be dropped. Some other part of the dentry or the inode has
to be used instead. I don't know what would be appropriate, perhaps one
of the flags bits.

Alan Stern

2004-04-06 10:09:05

by Maneesh Soni

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Fri, Apr 02, 2004 at 04:41:16PM -0500, Alan Stern wrote:
> [CC: list pruned on the assumption that most recipients aren't interested]
>
> On Fri, 2 Apr 2004, Maneesh Soni wrote:
>
> > Yes, see the patch below. Probably the race window has become smaller but
> > Badness message is also an indication that somewhere kobject_cleanup has
> > started. I have not yet seen why race is still there.
>
> Yes. Although the problem can't be due to a race if it involves code
> that's protected by a semaphore.
>
> Did this Badness message come from within check_perm()? I don't see how
> that could happen with this patch. You're guaranteed that the refcount is
> positive until after sysfs_remove_dir() returns. It's _supposed_ to be
> positive, anyway; maybe it isn't. You could try checking for that, at the
> end of sysf_remove_dir().
>

Yes, it came from check_perm(). I think I found the reason for that. The
attribute group subdirectory's dentry also has a pointer to the same kobject
as the corresponding directory's dentry. The kobject directory dentry was
taken care of but the attribute group subdirectory was still pointing to the
kobject. And that badness message was coming while opening a file under
attribute subdir.

I am using dentry->d_flags and a new flag value DCACHE_SYSFS_CONNECTED to
indicate that the dentry is connected to a vaild kobject. I could run my
stress test of insmod/rmmod for more than 3 hours without any badness message.

I am copying to the maintainers also and hope to get their comments for
this patch.

Thanks
Maneesh



o The following patch fixes the race involved between unregistering a kobject
and simultaneously opeing a corresponding attribute file in sysfs.

Ideally sysfs should take a ref. to the kobject as long as it has dentries
referring to the kobject, but because of current limitations in
module/kobject ref counting, sysfs's pinning of kobject leads to
hang/delays in rmmod of certain modules. The patch uses a new flag to mark
dentries as disconnected from a valid kobject in the process of unregistering
the kobject. The ->open in sysfs is failed if it finds a disconnected dentry.
Marking and checking for the flag is done under dentry->d_inode->i_sem.


fs/sysfs/bin.c | 8 +++++++-
fs/sysfs/dir.c | 6 ++++++
fs/sysfs/file.c | 11 ++++++++++-
fs/sysfs/group.c | 19 +++++++++++++------
include/linux/dcache.h | 1 +
5 files changed, 37 insertions(+), 8 deletions(-)

diff -puN include/linux/dcache.h~sysfs-d_fsdata-race-fix include/linux/dcache.h
--- linux-2.6.5-mm1/include/linux/dcache.h~sysfs-d_fsdata-race-fix 2004-04-06 15:04:08.000000000 +0530
+++ linux-2.6.5-mm1-maneesh/include/linux/dcache.h 2004-04-06 15:04:39.000000000 +0530
@@ -153,6 +153,7 @@ d_iput: no no yes

#define DCACHE_REFERENCED 0x0008 /* Recently used, don't discard. */
#define DCACHE_UNHASHED 0x0010
+#define DCACHE_SYSFS_CONNECTED 0x0020 /* dentry connected to vaild kobject */

extern spinlock_t dcache_lock;

diff -puN fs/sysfs/dir.c~sysfs-d_fsdata-race-fix fs/sysfs/dir.c
--- linux-2.6.5-mm1/fs/sysfs/dir.c~sysfs-d_fsdata-race-fix 2004-04-06 15:04:14.000000000 +0530
+++ linux-2.6.5-mm1-maneesh/fs/sysfs/dir.c 2004-04-06 15:04:39.000000000 +0530
@@ -34,6 +34,7 @@ static int create_dir(struct kobject * k
init_dir);
if (!error) {
(*d)->d_fsdata = k;
+ (*d)->d_flags = DCACHE_SYSFS_CONNECTED;
p->d_inode->i_nlink++;
}
dput(*d);
@@ -119,6 +120,11 @@ void sysfs_remove_dir(struct kobject * k
pr_debug("sysfs %s: removing dir\n",dentry->d_name.name);
down(&dentry->d_inode->i_sem);

+ /* avoid further references to kobject even if dentry
+ * is alive after remove dir.
+ */
+ dentry->d_flags &= ~DCACHE_SYSFS_CONNECTED;
+
spin_lock(&dcache_lock);
restart:
node = dentry->d_subdirs.next;
diff -puN fs/sysfs/file.c~sysfs-d_fsdata-race-fix fs/sysfs/file.c
--- linux-2.6.5-mm1/fs/sysfs/file.c~sysfs-d_fsdata-race-fix 2004-04-06 15:04:18.000000000 +0530
+++ linux-2.6.5-mm1-maneesh/fs/sysfs/file.c 2004-04-06 15:04:39.000000000 +0530
@@ -134,6 +134,9 @@ static int flush_read_buffer(struct sysf
* is in the file's ->d_fsdata. The target object is in the directory's
* ->d_fsdata.
*
+ * It is safe to use ->d_parent->d_fsdata as both dentry and the kobject
+ * are pinned in ->open().
+ *
* We call fill_read_buffer() to allocate and fill the buffer from the
* object's show() method exactly once (if the read is happening from
* the beginning of the file). That should fill the entire buffer with
@@ -238,12 +241,18 @@ sysfs_write_file(struct file *file, cons

static int check_perm(struct inode * inode, struct file * file)
{
- struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct dentry * kobj_dir = file->f_dentry->d_parent;
+ struct kobject * kobj = NULL;
struct attribute * attr = file->f_dentry->d_fsdata;
struct sysfs_buffer * buffer;
struct sysfs_ops * ops = NULL;
int error = 0;

+ down(&kobj_dir->d_inode->i_sem);
+ if (kobj_dir->d_flags & DCACHE_SYSFS_CONNECTED)
+ kobj = kobject_get(kobj_dir->d_fsdata);
+ up(&kobj_dir->d_inode->i_sem);
+
if (!kobj || !attr)
goto Einval;

diff -puN fs/sysfs/bin.c~sysfs-d_fsdata-race-fix fs/sysfs/bin.c
--- linux-2.6.5-mm1/fs/sysfs/bin.c~sysfs-d_fsdata-race-fix 2004-04-06 15:04:21.000000000 +0530
+++ linux-2.6.5-mm1-maneesh/fs/sysfs/bin.c 2004-04-06 15:04:39.000000000 +0530
@@ -94,9 +94,15 @@ static ssize_t write(struct file * file,

static int open(struct inode * inode, struct file * file)
{
- struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct dentry * kobj_dir = file->f_dentry->d_parent;
+ struct kobject * kobj = NULL;
struct bin_attribute * attr = file->f_dentry->d_fsdata;
int error = -EINVAL;
+
+ down(&kobj_dir->d_inode->i_sem);
+ if (kobj_dir->d_flags & DCACHE_SYSFS_CONNECTED)
+ kobj = kobject_get(kobj_dir->d_fsdata);
+ up(&kobj_dir->d_inode->i_sem);

if (!kobj || !attr)
goto Done;
diff -puN fs/sysfs/group.c~sysfs-d_fsdata-race-fix fs/sysfs/group.c
--- linux-2.6.5-mm1/fs/sysfs/group.c~sysfs-d_fsdata-race-fix 2004-04-06 15:04:26.000000000 +0530
+++ linux-2.6.5-mm1-maneesh/fs/sysfs/group.c 2004-04-06 15:04:39.000000000 +0530
@@ -10,7 +10,7 @@

#include <linux/kobject.h>
#include <linux/module.h>
-#include <linux/dcache.h>
+#include <linux/fs.h>
#include <linux/err.h>
#include "sysfs.h"

@@ -70,11 +70,18 @@ void sysfs_remove_group(struct kobject *
else
dir = dget(kobj->dentry);

- remove_files(dir,grp);
- if (grp->name)
- sysfs_remove_subdir(dir);
- /* release the ref. taken in this routine */
- dput(dir);
+ if (!IS_ERR(dir)) {
+ if (dir && dir->d_inode) {
+ down(&dir->d_inode->i_sem);
+ dir->d_flags &= ~DCACHE_SYSFS_CONNECTED;
+ up(&dir->d_inode->i_sem);
+ remove_files(dir,grp);
+ if (grp->name)
+ sysfs_remove_subdir(dir);
+ /* release the ref. taken in this routine */
+ dput(dir);
+ }
+ }
}



_

--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696

2004-04-06 17:03:50

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Tue, 6 Apr 2004, Maneesh Soni wrote:

> Yes, it came from check_perm(). I think I found the reason for that. The
> attribute group subdirectory's dentry also has a pointer to the same kobject
> as the corresponding directory's dentry. The kobject directory dentry was
> taken care of but the attribute group subdirectory was still pointing to the
> kobject. And that badness message was coming while opening a file under
> attribute subdir.
>
> I am using dentry->d_flags and a new flag value DCACHE_SYSFS_CONNECTED to
> indicate that the dentry is connected to a vaild kobject. I could run my
> stress test of insmod/rmmod for more than 3 hours without any badness message.
>
> I am copying to the maintainers also and hope to get their comments for
> this patch.
>
> Thanks
> Maneesh

I like your patch a lot. It suggests an additional possibility. Maybe
your DCACHE_SYSFS_CONNECTED flag could be used to indicate that an
attribute is registered. That way, after the file is unregistered (even
if the kobject still exists) reads and writes of the attribute would
return an error. This isn't really necessary since opening an attribute
takes a reference to the module owning the attribute, but it is
conceptually attractive.

Alan Stern


2004-04-14 13:16:12

by Maneesh Soni

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Tue, Apr 06, 2004 at 03:43:20PM +0530, Maneesh Soni wrote:
[..]
>
> o The following patch fixes the race involved between unregistering a kobject
> and simultaneously opeing a corresponding attribute file in sysfs.
>
> Ideally sysfs should take a ref. to the kobject as long as it has dentries
> referring to the kobject, but because of current limitations in
> module/kobject ref counting, sysfs's pinning of kobject leads to
> hang/delays in rmmod of certain modules. The patch uses a new flag to mark
> dentries as disconnected from a valid kobject in the process of unregistering
> the kobject. The ->open in sysfs is failed if it finds a disconnected dentry.
> Marking and checking for the flag is done under dentry->d_inode->i_sem.
>
[..]


Hi Greg / Andrew,

This problem can be solved without using the d_flags as I did previously.
Viro suggested to just check for DCACHE_UNHASHED in check_perm() and this also
solves the race. Please see the following patch instead of the previous one.


Thanks
Maneesh

PS: As you might have seen sysfs-symlinks-fix.patch the sysfs_get_kobject() is
repeated in this patch also. So, once you have decided about the
sysfs-syminks-fix.patch and this one, I can re-submit a combined one.

----------------------------------------------------------------------------

o The following patch fixes the race involved between unregistering a kobject
and simultaneously opeing a corresponding attribute file in sysfs.

o Ideally sysfs should take a ref. to the kobject as long as it has dentries
referring to the kobjects, but because of current limitations in
module/kobject ref counting, sysfs's pinning of kobject leads to
hang/delays in rmmod of certain modules. The patch checks for unhashed
dentries in check_perm() while opening a sysfs file. If the dentry is
still hashed then it goes ahead and takes the ref to kobject. This done
under the per dentry lock. It does this in the inline routine
sysfs_get_kobject(dentry).


fs/sysfs/bin.c | 2 +-
fs/sysfs/file.c | 2 +-
fs/sysfs/sysfs.h | 13 +++++++++++++
3 files changed, 15 insertions(+), 2 deletions(-)

diff -puN fs/sysfs/sysfs.h~sysfs-d_fsdata-race-fix-2 fs/sysfs/sysfs.h
--- linux-2.6.5-mm5/fs/sysfs/sysfs.h~sysfs-d_fsdata-race-fix-2 2004-04-14 14:55:26.000000000 +0530
+++ linux-2.6.5-mm5-maneesh/fs/sysfs/sysfs.h 2004-04-14 15:29:51.000000000 +0530
@@ -11,3 +11,16 @@ extern void sysfs_hash_and_remove(struct

extern int sysfs_create_subdir(struct kobject *, const char *, struct dentry **);
extern void sysfs_remove_subdir(struct dentry *);
+
+
+static inline struct kobject * sysfs_get_kobject(struct dentry * dentry)
+{
+ struct kobject * kobj = NULL;
+
+ spin_lock(&dentry->d_lock);
+ if (!d_unhashed(dentry))
+ kobj = kobject_get(dentry->d_fsdata);
+ spin_unlock(&dentry->d_lock);
+
+ return kobj;
+}
diff -puN fs/sysfs/file.c~sysfs-d_fsdata-race-fix-2 fs/sysfs/file.c
--- linux-2.6.5-mm5/fs/sysfs/file.c~sysfs-d_fsdata-race-fix-2 2004-04-14 14:55:30.000000000 +0530
+++ linux-2.6.5-mm5-maneesh/fs/sysfs/file.c 2004-04-14 14:56:17.000000000 +0530
@@ -238,7 +238,7 @@ sysfs_write_file(struct file *file, cons

static int check_perm(struct inode * inode, struct file * file)
{
- struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct kobject * kobj = sysfs_get_kobject(file->f_dentry->d_parent);
struct attribute * attr = file->f_dentry->d_fsdata;
struct sysfs_buffer * buffer;
struct sysfs_ops * ops = NULL;
diff -puN fs/sysfs/bin.c~sysfs-d_fsdata-race-fix-2 fs/sysfs/bin.c
--- linux-2.6.5-mm5/fs/sysfs/bin.c~sysfs-d_fsdata-race-fix-2 2004-04-14 14:55:32.000000000 +0530
+++ linux-2.6.5-mm5-maneesh/fs/sysfs/bin.c 2004-04-14 14:56:17.000000000 +0530
@@ -94,7 +94,7 @@ static ssize_t write(struct file * file,

static int open(struct inode * inode, struct file * file)
{
- struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct kobject * kobj = sysfs_get_kobject(file->f_dentry->d_parent);
struct bin_attribute * attr = file->f_dentry->d_fsdata;
int error = -EINVAL;


_
--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696

2004-04-15 21:54:38

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Wed, Apr 14, 2004 at 06:50:15PM +0530, Maneesh Soni wrote:
> On Tue, Apr 06, 2004 at 03:43:20PM +0530, Maneesh Soni wrote:
> [..]
> >
> > o The following patch fixes the race involved between unregistering a kobject
> > and simultaneously opeing a corresponding attribute file in sysfs.
> >
> > Ideally sysfs should take a ref. to the kobject as long as it has dentries
> > referring to the kobject, but because of current limitations in
> > module/kobject ref counting, sysfs's pinning of kobject leads to
> > hang/delays in rmmod of certain modules. The patch uses a new flag to mark
> > dentries as disconnected from a valid kobject in the process of unregistering
> > the kobject. The ->open in sysfs is failed if it finds a disconnected dentry.
> > Marking and checking for the flag is done under dentry->d_inode->i_sem.
> >
> [..]
>
>
> Hi Greg / Andrew,
>
> This problem can be solved without using the d_flags as I did previously.
> Viro suggested to just check for DCACHE_UNHASHED in check_perm() and this also
> solves the race. Please see the following patch instead of the previous one.

This patch looks sane, Andrew, can you let it sit in your -mm tree for a
while to see if anything breaks with it?

thanks,

greg k-h

----------------------------------------------------------------------------

o The following patch fixes the race involved between unregistering a kobject
and simultaneously opeing a corresponding attribute file in sysfs.

o Ideally sysfs should take a ref. to the kobject as long as it has dentries
referring to the kobjects, but because of current limitations in
module/kobject ref counting, sysfs's pinning of kobject leads to
hang/delays in rmmod of certain modules. The patch checks for unhashed
dentries in check_perm() while opening a sysfs file. If the dentry is
still hashed then it goes ahead and takes the ref to kobject. This done
under the per dentry lock. It does this in the inline routine
sysfs_get_kobject(dentry).


fs/sysfs/bin.c | 2 +-
fs/sysfs/file.c | 2 +-
fs/sysfs/sysfs.h | 13 +++++++++++++
3 files changed, 15 insertions(+), 2 deletions(-)

diff -puN fs/sysfs/sysfs.h~sysfs-d_fsdata-race-fix-2 fs/sysfs/sysfs.h
--- linux-2.6.5-mm5/fs/sysfs/sysfs.h~sysfs-d_fsdata-race-fix-2 2004-04-14 14:55:26.000000000 +0530
+++ linux-2.6.5-mm5-maneesh/fs/sysfs/sysfs.h 2004-04-14 15:29:51.000000000 +0530
@@ -11,3 +11,16 @@ extern void sysfs_hash_and_remove(struct

extern int sysfs_create_subdir(struct kobject *, const char *, struct dentry **);
extern void sysfs_remove_subdir(struct dentry *);
+
+
+static inline struct kobject * sysfs_get_kobject(struct dentry * dentry)
+{
+ struct kobject * kobj = NULL;
+
+ spin_lock(&dentry->d_lock);
+ if (!d_unhashed(dentry))
+ kobj = kobject_get(dentry->d_fsdata);
+ spin_unlock(&dentry->d_lock);
+
+ return kobj;
+}
diff -puN fs/sysfs/file.c~sysfs-d_fsdata-race-fix-2 fs/sysfs/file.c
--- linux-2.6.5-mm5/fs/sysfs/file.c~sysfs-d_fsdata-race-fix-2 2004-04-14 14:55:30.000000000 +0530
+++ linux-2.6.5-mm5-maneesh/fs/sysfs/file.c 2004-04-14 14:56:17.000000000 +0530
@@ -238,7 +238,7 @@ sysfs_write_file(struct file *file, cons

static int check_perm(struct inode * inode, struct file * file)
{
- struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct kobject * kobj = sysfs_get_kobject(file->f_dentry->d_parent);
struct attribute * attr = file->f_dentry->d_fsdata;
struct sysfs_buffer * buffer;
struct sysfs_ops * ops = NULL;
diff -puN fs/sysfs/bin.c~sysfs-d_fsdata-race-fix-2 fs/sysfs/bin.c
--- linux-2.6.5-mm5/fs/sysfs/bin.c~sysfs-d_fsdata-race-fix-2 2004-04-14 14:55:32.000000000 +0530
+++ linux-2.6.5-mm5-maneesh/fs/sysfs/bin.c 2004-04-14 14:56:17.000000000 +0530
@@ -94,7 +94,7 @@ static ssize_t write(struct file * file,

static int open(struct inode * inode, struct file * file)
{
- struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct kobject * kobj = sysfs_get_kobject(file->f_dentry->d_parent);
struct bin_attribute * attr = file->f_dentry->d_fsdata;
int error = -EINVAL;


2004-04-15 22:08:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

Greg KH <[email protected]> wrote:
>
> This patch looks sane, Andrew, can you let it sit in your -mm tree for a
> while to see if anything breaks with it?

It needs work to make it live alongside ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5/2.6.5-mm6/broken-out/sysfs-d_fsdata-race-fix-2.patch

What are we doing with sysfs-d_fsdata-race-fix-2 btw?


2004-04-16 08:38:23

by Maneesh Soni

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] back out sysfs reference count change

On Thu, Apr 15, 2004 at 03:10:59PM -0700, Andrew Morton wrote:
> Greg KH <[email protected]> wrote:
> >
> > This patch looks sane, Andrew, can you let it sit in your -mm tree for a
> > while to see if anything breaks with it?
>
> It needs work to make it live alongside ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5/2.6.5-mm6/broken-out/sysfs-d_fsdata-race-fix-2.patch
>
> What are we doing with sysfs-d_fsdata-race-fix-2 btw?

Hi Andrew,
Greg, is talking for the same patch, which is included in -mm6 now.

Maneesh

--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696