The following appears to be an inconsistency in the way sysfs behaves.
Tell me what you think...
When a user process parks its CWD in a kobject's sysfs directory and then
the kobject is unregistered, of course the directory is forced to remain
in existence (albeit unlinked) because of the reference held by the
process. But it does not in turn hold a reference to the kobject; the
kobject will be deleted immediately if nothing else refers to it.
On the other hand, if a user process opens a sysfs attribute file and then
sysfs_remove_file() is called, again the file is forced to remain in
existence (albeit unlinked) because of the reference held by the process.
But now it _does_ hold a reference to the kobject; if the kobject is
unregistered it will not be deleted until the user process closes the
attribute file.
Why this non-parallel behavior?
Alan Stern
Note, Pat's email address has changed, I've changed in the CC:
On Wed, Jan 07, 2004 at 10:48:44AM -0500, Alan Stern wrote:
> The following appears to be an inconsistency in the way sysfs behaves.
> Tell me what you think...
>
> When a user process parks its CWD in a kobject's sysfs directory and then
> the kobject is unregistered, of course the directory is forced to remain
> in existence (albeit unlinked) because of the reference held by the
> process. But it does not in turn hold a reference to the kobject; the
> kobject will be deleted immediately if nothing else refers to it.
>
> On the other hand, if a user process opens a sysfs attribute file and then
> sysfs_remove_file() is called, again the file is forced to remain in
> existence (albeit unlinked) because of the reference held by the process.
> But now it _does_ hold a reference to the kobject; if the kobject is
> unregistered it will not be deleted until the user process closes the
> attribute file.
>
> Why this non-parallel behavior?
Because it is very difficult to determine when a user goes into a
directory because we are using the ramfs/libfs code. It also does not
cause any errors if the kobject is removed, as the vfs cleans up
properly.
Only when a file is opened does a kobject need to be pinned, due to
possible errors that could happen.
Hope this helps,
greg k-h
On Wed, Jan 07, 2004 at 04:50:24PM -0500, Alan Stern wrote:
> On Wed, 7 Jan 2004, Greg KH wrote:
>
> > Because it is very difficult to determine when a user goes into a
> > directory because we are using the ramfs/libfs code. It also does not
> > cause any errors if the kobject is removed, as the vfs cleans up
> > properly.
> >
> > Only when a file is opened does a kobject need to be pinned, due to
> > possible errors that could happen.
>
> I had in mind approaching this the opposite way. Instead of trying to
> make open directories also pin a kobject, why not make open attribute
> files not pin them?
>
> It shouldn't be hard to avoid any errors; in fact I had a patch from some
> time ago that would do the trick (although in a hacked-up kind of way).
> The main idea is to return -ENXIO instead of calling the show()/store()
> routines once the attribute has been removed.
And you can do this without adding another lock, race free?
greg k-h
On Wed, 7 Jan 2004, Greg KH wrote:
> Because it is very difficult to determine when a user goes into a
> directory because we are using the ramfs/libfs code. It also does not
> cause any errors if the kobject is removed, as the vfs cleans up
> properly.
>
> Only when a file is opened does a kobject need to be pinned, due to
> possible errors that could happen.
I had in mind approaching this the opposite way. Instead of trying to
make open directories also pin a kobject, why not make open attribute
files not pin them?
It shouldn't be hard to avoid any errors; in fact I had a patch from some
time ago that would do the trick (although in a hacked-up kind of way).
The main idea is to return -ENXIO instead of calling the show()/store()
routines once the attribute has been removed.
Alan Stern
On Wed, 7 Jan 2004, Greg KH wrote:
> On Wed, Jan 07, 2004 at 04:50:24PM -0500, Alan Stern wrote:
> >
> > I had in mind approaching this the opposite way. Instead of trying to
> > make open directories also pin a kobject, why not make open attribute
> > files not pin them?
> >
> > It shouldn't be hard to avoid any errors; in fact I had a patch from some
> > time ago that would do the trick (although in a hacked-up kind of way).
> > The main idea is to return -ENXIO instead of calling the show()/store()
> > routines once the attribute has been removed.
>
> And you can do this without adding another lock, race free?
I used dentry->d_inode->i_sem. While I'm not very familiar with the ins
and outs of the filesystem code, that ought to be safe enough.
The real problem was finding a way to indicate that the file was
disconnected from its kobject. I did that by setting
dentry->d_inode->i_mode to 0. (I didn't want to erase dentry->d_fsdata
for fear that it might be needed somewhere else.) That's definitely not a
good way; it was intended only for my proof-of-principle. No doubt
someone else could do a much better job.
Alan Stern
On Wed, Jan 07, 2004 at 05:24:08PM -0500, Alan Stern wrote:
> The real problem was finding a way to indicate that the file was
> disconnected from its kobject. I did that by setting
> dentry->d_inode->i_mode to 0. (I didn't want to erase dentry->d_fsdata
> for fear that it might be needed somewhere else.) That's definitely not a
> good way; it was intended only for my proof-of-principle. No doubt
> someone else could do a much better job.
Because this is the hardest part, it's easier just to have the behavior
we currently have :)
thanks,
greg k-h
On Wed, Jan 07, 2004 at 05:24:54PM +0000, Greg KH wrote:
> Note, Pat's email address has changed, I've changed in the CC:
>
> On Wed, Jan 07, 2004 at 10:48:44AM -0500, Alan Stern wrote:
> > The following appears to be an inconsistency in the way sysfs behaves.
> > Tell me what you think...
> >
> > When a user process parks its CWD in a kobject's sysfs directory and then
> > the kobject is unregistered, of course the directory is forced to remain
> > in existence (albeit unlinked) because of the reference held by the
> > process. But it does not in turn hold a reference to the kobject; the
> > kobject will be deleted immediately if nothing else refers to it.
> >
> > On the other hand, if a user process opens a sysfs attribute file and then
> > sysfs_remove_file() is called, again the file is forced to remain in
> > existence (albeit unlinked) because of the reference held by the process.
> > But now it _does_ hold a reference to the kobject; if the kobject is
> > unregistered it will not be deleted until the user process closes the
> > attribute file.
> >
> > Why this non-parallel behavior?
>
> Because it is very difficult to determine when a user goes into a
> directory because we are using the ramfs/libfs code. It also does not
> cause any errors if the kobject is removed, as the vfs cleans up
> properly.
>
I have faced similar situation while working on sysfs backing store for
leaf dentries.
http://marc.theaimsgroup.com/?l=linux-kernel&m=107269078726254&w=2
The problem is that we have live references to the kobject dentry but
kobject is gone. Problems can occur if kobject is accessed
through dentry->d_fsdata field. The fix I did was to take a ref. to the
kobject while linking the dentry with the kobject in create_dir(). This
ref. can be released when dentry ref. count goes to zero, that is when
dentry is being freed, through dentry->d_op->d_iput() call. With this
patch we can have a kobject alive during the life time of the corresponding
dentry.
Please comment.
For testing one has to run the following test on a SMP box:
1) Do insmod/rmmod "dummy.o" network driver in a forever loop.
2) Parallely do "find /sys/class/net | xargs cat" also in a forever loop.
Please build kernel with this sysfs patch from -mm1 tree before running
this test:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.1-rc2/2.6.1-rc2-mm1/broken-out/sysfs_remove_dir-vs-dcache_readdir-race-fix.patch
In -rc2 you get following while running the above two things after some time.
Badness in kobject_get at lib/kobject.c:439
Call Trace:
[<c024060d>] kobject_get+0x4d/0x4f
[<c0195937>] check_perm+0x20/0x17d
[<c015d833>] get_empty_filp+0x75/0xf8
[<c0195a94>] sysfs_open_file+0x0/0x5
[<c015b9b5>] dentry_open+0x13b/0x1e1
[<c015b878>] filp_open+0x67/0x69
[<c015bde9>] sys_open+0x5b/0x8b
[<c010af3d>] sysenter_past_esp+0x52/0x71
I think things can go worse, like in check_perm()
static int check_perm(struct inode * inode, struct file * file)
{
struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
....
d_fsdata could be pointing to a already freed kobject.
-------------------------------------------------------------------------------
o The following pins the kobject when sysfs assigns dentry and inode to
the kobject. This ensures that kobject is alive during the life time of
the dentry and inode, and people holding ref. to the dentry can access the
kobject without any problems.
o The ref. taken for the kobject is released through dentry->d_op->d_iput()
call when the dentry ref. count drops to zero and it is being freed. For
this sysfs_dentry_operations is introduced.
fs/sysfs/dir.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletion(-)
diff -puN fs/sysfs/dir.c~sysfs-pin-kobject fs/sysfs/dir.c
--- linux-2.6.1-rc2/fs/sysfs/dir.c~sysfs-pin-kobject 2004-01-08 11:54:37.000000000 +0530
+++ linux-2.6.1-rc2-maneesh/fs/sysfs/dir.c 2004-01-08 15:29:17.000000000 +0530
@@ -20,6 +20,18 @@ static int init_dir(struct inode * inode
return 0;
}
+static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
+{
+ struct kobject * kobj = dentry->d_fsdata;
+
+ if (kobj)
+ kobject_put(kobj);
+ iput(inode);
+}
+
+static struct dentry_operations sysfs_dentry_operations = {
+ .d_iput = &sysfs_d_iput,
+};
static int create_dir(struct kobject * k, struct dentry * p,
const char * n, struct dentry ** d)
@@ -33,7 +45,8 @@ static int create_dir(struct kobject * k
S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO,
init_dir);
if (!error) {
- (*d)->d_fsdata = k;
+ (*d)->d_op = &sysfs_dentry_operations;
+ (*d)->d_fsdata = kobject_get(k);
p->d_inode->i_nlink++;
}
dput(*d);
_
--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-5044999 Fax: 91-80-5268553
T/L : 9243696
On Thu, Jan 08, 2004 at 04:02:42PM +0530, Maneesh Soni wrote:
>
> The problem is that we have live references to the kobject dentry but
> kobject is gone. Problems can occur if kobject is accessed
> through dentry->d_fsdata field. The fix I did was to take a ref. to the
> kobject while linking the dentry with the kobject in create_dir(). This
> ref. can be released when dentry ref. count goes to zero, that is when
> dentry is being freed, through dentry->d_op->d_iput() call. With this
> patch we can have a kobject alive during the life time of the corresponding
> dentry.
>
> Please comment.
This is the patch already in the -mm tree, right? I think it's already
in Pat's pending queue of patches to send off (we're a bit hampered by
the weather right now in this part of the world...)
thanks,
greg k-h
On Thu, Jan 08, 2004 at 12:19:20PM -0800, Greg KH wrote:
> On Thu, Jan 08, 2004 at 04:02:42PM +0530, Maneesh Soni wrote:
> >
> > The problem is that we have live references to the kobject dentry but
> > kobject is gone. Problems can occur if kobject is accessed
> > through dentry->d_fsdata field. The fix I did was to take a ref. to the
> > kobject while linking the dentry with the kobject in create_dir(). This
> > ref. can be released when dentry ref. count goes to zero, that is when
> > dentry is being freed, through dentry->d_op->d_iput() call. With this
> > patch we can have a kobject alive during the life time of the corresponding
> > dentry.
> >
> > Please comment.
>
> This is the patch already in the -mm tree, right? I think it's already
> in Pat's pending queue of patches to send off (we're a bit hampered by
> the weather right now in this part of the world...)
>
No, it is not in the -mm tree. IIRC, I have not seen similar patch earlier.
If patch looks good, I request Andrew to include. It applies to -rc2-mm1 also.
Re-attached for Andrew.
Thanks
Maneesh
o The following pins the kobject when sysfs assigns dentry and inode to
the kobject. This ensures that kobject is alive during the life time of
the dentry and inode, and people holding ref. to the dentry can access the
kobject without any problems.
o The ref. taken for the kobject is released through dentry->d_op->d_iput()
call when the dentry ref. count drops to zero and it is being freed. For
this sysfs_dentry_operations is introduced.
fs/sysfs/dir.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletion(-)
diff -puN fs/sysfs/dir.c~sysfs-pin-kobject fs/sysfs/dir.c
--- linux-2.6.1-rc2/fs/sysfs/dir.c~sysfs-pin-kobject 2004-01-08 11:54:37.000000000 +0530
+++ linux-2.6.1-rc2-maneesh/fs/sysfs/dir.c 2004-01-08 15:29:17.000000000 +0530
@@ -20,6 +20,18 @@ static int init_dir(struct inode * inode
return 0;
}
+static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
+{
+ struct kobject * kobj = dentry->d_fsdata;
+
+ if (kobj)
+ kobject_put(kobj);
+ iput(inode);
+}
+
+static struct dentry_operations sysfs_dentry_operations = {
+ .d_iput = &sysfs_d_iput,
+};
static int create_dir(struct kobject * k, struct dentry * p,
const char * n, struct dentry ** d)
@@ -33,7 +45,8 @@ static int create_dir(struct kobject * k
S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO,
init_dir);
if (!error) {
- (*d)->d_fsdata = k;
+ (*d)->d_op = &sysfs_dentry_operations;
+ (*d)->d_fsdata = kobject_get(k);
p->d_inode->i_nlink++;
}
dput(*d);
_
--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: [email protected]
Phone: 91-80-5044999 Fax: 91-80-5268553
T/L : 9243696