2009-11-03 11:54:07

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 0/13] sysfs lazification.


The sysfs code updates the vfs caches immediately when the sysfs data
structures change causing a lot of unnecessary complications. The
following patchset untangles that beast. Allowing for simpler
more straight forward code, the removal of a hack from the vfs
to support sysfs, and human comprehensible locking on sysfs.

Most of these patches have already been reviewed and acked from the
last time I had time to work on sysfs.

In net the patches look like:

fs/namei.c | 22 ---
fs/sysfs/dir.c | 388 ++++++++++++++++---------------------------------
fs/sysfs/file.c | 41 +----
fs/sysfs/inode.c | 178 ++++++++++++++---------
fs/sysfs/symlink.c | 11 +-
fs/sysfs/sysfs.h | 9 +-
include/linux/namei.h | 1 -
7 files changed, 256 insertions(+), 394 deletions(-)

Eric


2009-11-03 11:57:18

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 01/13] sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex

From: Eric W. Biederman <[email protected]>

The sysfs_mutex is required to ensure updates are and will remain
atomic with respect to other inode iattr updates, that do not happen
through the filesystem.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 41 +++++++++++++++++++++++++++++------------
1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index e28cecf..8a08250 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -122,23 +122,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;
}

+static int sysfs_sd_setsecdata(struct sysfs_dirent *sd, void **secdata, u32 *secdata_len)
+{
+ struct sysfs_inode_attrs *iattrs;
+ void *old_secdata;
+ size_t old_secdata_len;
+
+ iattrs = sd->s_iattr;
+ if (!iattrs)
+ iattrs = sysfs_init_inode_attrs(sd);
+ if (!iattrs)
+ return -ENOMEM;
+
+ old_secdata = iattrs->ia_secdata;
+ old_secdata_len = iattrs->ia_secdata_len;
+
+ iattrs->ia_secdata = *secdata;
+ iattrs->ia_secdata_len = *secdata_len;
+
+ *secdata = old_secdata;
+ *secdata_len = old_secdata_len;
+ return 0;
+}
+
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags)
{
struct sysfs_dirent *sd = dentry->d_fsdata;
- struct sysfs_inode_attrs *iattrs;
void *secdata;
int error;
u32 secdata_len = 0;

if (!sd)
return -EINVAL;
- if (!sd->s_iattr)
- sd->s_iattr = sysfs_init_inode_attrs(sd);
- if (!sd->s_iattr)
- return -ENOMEM;
-
- iattrs = sd->s_iattr;

if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
@@ -150,12 +166,13 @@ int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
&secdata, &secdata_len);
if (error)
goto out;
- if (iattrs->ia_secdata)
- security_release_secctx(iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- iattrs->ia_secdata = secdata;
- iattrs->ia_secdata_len = secdata_len;

+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setsecdata(sd, &secdata, &secdata_len);
+ mutex_unlock(&sysfs_mutex);
+
+ if (secdata)
+ security_release_secctx(secdata, secdata_len);
} else
return -EINVAL;
out:
--
1.6.5.2.143.g8cc62

2009-11-03 11:57:20

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 02/13] sysfs: Rename sysfs_d_iput to sysfs_dentry_iput

From: Eric W. Biederman <[email protected]>

Using dentry instead of d in the function name is what
several other filesystems are doing and it seems to be
a more readable convention.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index e020183..130dfc3 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,7 +298,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

-static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
+static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;

@@ -307,7 +307,7 @@ static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
}

static const struct dentry_operations sysfs_dentry_ops = {
- .d_iput = sysfs_d_iput,
+ .d_iput = sysfs_dentry_iput,
};

struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
--
1.6.5.2.143.g8cc62

2009-11-03 11:59:10

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 03/13] sysfs: Use dentry_ops instead of directly playing with the dcache

From: Eric W. Biederman <[email protected]>

Calling d_drop unconditionally when a sysfs_dirent is deleted has
the potential to leak mounts, so instead implement dentry delete
and revalidate operations that cause sysfs dentries to be removed
at the appropriate time.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 73 +++++++++++++++++++++++++++++++++++--------------------
1 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 130dfc3..b5e8499 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,6 +298,46 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

+static int sysfs_dentry_delete(struct dentry *dentry)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ return !!(sd->s_flags & SYSFS_FLAG_REMOVED);
+}
+
+static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ int is_dir;
+
+ mutex_lock(&sysfs_mutex);
+
+ /* The sysfs dirent has been deleted */
+ if (sd->s_flags & SYSFS_FLAG_REMOVED)
+ goto out_bad;
+
+ mutex_unlock(&sysfs_mutex);
+out_valid:
+ return 1;
+out_bad:
+ /* Remove the dentry from the dcache hashes.
+ * If this is a deleted dentry we use d_drop instead of d_delete
+ * so sysfs doesn't need to cope with negative dentries.
+ */
+ is_dir = (sysfs_type(sd) == SYSFS_DIR);
+ mutex_unlock(&sysfs_mutex);
+ if (is_dir) {
+ /* If we have submounts we must allow the vfs caches
+ * to lie about the state of the filesystem to prevent
+ * leaks and other nasty things.
+ */
+ if (have_submounts(dentry))
+ goto out_valid;
+ shrink_dcache_parent(dentry);
+ }
+ d_drop(dentry);
+ return 0;
+}
+
static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;
@@ -307,6 +347,8 @@ static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
}

static const struct dentry_operations sysfs_dentry_ops = {
+ .d_revalidate = sysfs_dentry_revalidate,
+ .d_delete = sysfs_dentry_delete,
.d_iput = sysfs_dentry_iput,
};

@@ -527,44 +569,21 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
}

/**
- * sysfs_drop_dentry - drop dentry for the specified sysfs_dirent
+ * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
* @sd: target sysfs_dirent
*
- * Drop dentry for @sd. @sd must have been unlinked from its
+ * Decrement nlink for @sd. @sd must have been unlinked from its
* parent on entry to this function such that it can't be looked
* up anymore.
*/
-static void sysfs_drop_dentry(struct sysfs_dirent *sd)
+static void sysfs_dec_nlink(struct sysfs_dirent *sd)
{
struct inode *inode;
- struct dentry *dentry;

inode = ilookup(sysfs_sb, sd->s_ino);
if (!inode)
return;

- /* Drop any existing dentries associated with sd.
- *
- * For the dentry to be properly freed we need to grab a
- * reference to the dentry under the dcache lock, unhash it,
- * and then put it. The playing with the dentry count allows
- * dput to immediately free the dentry if it is not in use.
- */
-repeat:
- spin_lock(&dcache_lock);
- list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
- if (d_unhashed(dentry))
- continue;
- dget_locked(dentry);
- spin_lock(&dentry->d_lock);
- __d_drop(dentry);
- spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
- dput(dentry);
- goto repeat;
- }
- spin_unlock(&dcache_lock);
-
/* adjust nlink and update timestamp */
mutex_lock(&inode->i_mutex);

@@ -611,7 +630,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_drop_dentry(sd);
+ sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
--
1.6.5.2.143.g8cc62

2009-11-03 11:59:46

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 04/13] sysfs: Simplify sysfs_chmod_file semantics

From: Eric W. Biederman <[email protected]>

Currently every caller of sysfs_chmod_file happens at either
file creation time to set a non-default mode or in response
to a specific user requested space change in policy. Making
timestamps of when the chmod happens and notification of
a file changing mode uninteresting.

Remove the unnecessary time stamp and filesystem change
notification, and removes the last of the explicit inotify
and donitfy support from sysfs.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/file.c | 10 +---------
1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index f5ea468..faa1a80 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -604,17 +604,9 @@ int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
mutex_lock(&inode->i_mutex);

newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
- newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- newattrs.ia_ctime = current_fs_time(inode->i_sb);
+ newattrs.ia_valid = ATTR_MODE;
rc = sysfs_setattr(victim, &newattrs);

- if (rc == 0) {
- fsnotify_change(victim, newattrs.ia_valid);
- mutex_lock(&sysfs_mutex);
- victim_sd->s_mode = newattrs.ia_mode;
- mutex_unlock(&sysfs_mutex);
- }
-
mutex_unlock(&inode->i_mutex);
out:
dput(victim);
--
1.6.5.2.143.g8cc62

2009-11-03 11:57:25

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 05/13] sysfs: Simplify iattr time assignments

From: Eric W. Biederman <[email protected]>

The granularity of sysfs time when we keep it is 1 ns. Which
when passed to timestamp_trunc results in a nop. So remove
the unnecessary function call making sysfs_setattr slightly
easier to read.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 8a08250..fed7a74 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -103,14 +103,11 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
if (ia_valid & ATTR_GID)
iattrs->ia_gid = iattr->ia_gid;
if (ia_valid & ATTR_ATIME)
- iattrs->ia_atime = timespec_trunc(iattr->ia_atime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_atime = iattr->ia_atime;
if (ia_valid & ATTR_MTIME)
- iattrs->ia_mtime = timespec_trunc(iattr->ia_mtime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_mtime = iattr->ia_mtime;
if (ia_valid & ATTR_CTIME)
- iattrs->ia_ctime = timespec_trunc(iattr->ia_ctime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;

--
1.6.5.2.143.g8cc62

2009-11-03 11:57:24

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 06/13] sysfs: Fix locking and factor out sysfs_sd_setattr

From: Eric W. Biederman <[email protected]>

Cleanly separate the work that is specific to setting the
attributes of a sysfs_dirent from what is needed to update
the attributes of a vfs inode.

Additionally grab the sysfs_mutex to keep any nasties from
surprising us when updating the sysfs_dirent.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
fs/sysfs/sysfs.h | 1 +
2 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fed7a74..fccfb55 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)

return attrs;
}
-int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+
+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
{
- struct inode * inode = dentry->d_inode;
- struct sysfs_dirent * sd = dentry->d_fsdata;
struct sysfs_inode_attrs *sd_attrs;
struct iattr *iattrs;
unsigned int ia_valid = iattr->ia_valid;
- int error;
-
- if (!sd)
- return -EINVAL;

sd_attrs = sd->s_iattr;

- error = inode_change_ok(inode, iattr);
- if (error)
- return error;
-
- iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
-
- error = inode_setattr(inode, iattr);
- if (error)
- return error;
-
if (!sd_attrs) {
/* setting attributes for the first time, allocate now */
sd_attrs = sysfs_init_inode_attrs(sd);
@@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;
-
- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
- mode &= ~S_ISGID;
iattrs->ia_mode = sd->s_mode = mode;
}
}
+ return 0;
+}
+
+int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+{
+ struct inode * inode = dentry->d_inode;
+ struct sysfs_dirent * sd = dentry->d_fsdata;
+ int error;
+
+ if (!sd)
+ return -EINVAL;
+
+ error = inode_change_ok(inode, iattr);
+ if (error)
+ return error;
+
+ iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
+ if (iattr->ia_valid & ATTR_MODE) {
+ if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+ iattr->ia_mode &= ~S_ISGID;
+ }
+
+ error = inode_setattr(inode, iattr);
+ if (error)
+ return error;
+
+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setattr(sd, iattr);
+ mutex_unlock(&sysfs_mutex);
+
return error;
}

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index af4c4e7..a96d967 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -155,6 +155,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
*/
struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);
--
1.6.5.2.143.g8cc62

2009-11-03 11:59:12

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 07/13] sysfs: Update s_iattr on link and unlink.

From: Eric W. Biederman <[email protected]>

Currently sysfs updates the timestamps on the vfs directory
inode when we create or remove a directory entry but doesn't
update the cached copy on the sysfs_dirent, fix that oversight.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b5e8499..fa37126 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -464,6 +464,8 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
*/
int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
{
+ struct sysfs_inode_attrs *ps_iattr;
+
if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
return -EEXIST;

@@ -476,6 +478,13 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sysfs_link_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+
return 0;
}

@@ -554,10 +563,19 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
*/
void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
{
+ struct sysfs_inode_attrs *ps_iattr;
+
BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);

sysfs_unlink_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+
sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
--
1.6.5.2.143.g8cc62

2009-11-03 11:58:51

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 08/13] sysfs: Nicely indent sysfs_symlink_inode_operations

From: Eric W. Biederman <[email protected]>

Lining up the functions in sysfs_symlink_inode_operations
follows the pattern in the rest of sysfs and makes things
slightly more readable.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/symlink.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index c5081ad..1137418 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -210,10 +210,10 @@ static void sysfs_put_link(struct dentry *dentry, struct nameidata *nd, void *co
}

const struct inode_operations sysfs_symlink_inode_operations = {
- .setxattr = sysfs_setxattr,
- .readlink = generic_readlink,
- .follow_link = sysfs_follow_link,
- .put_link = sysfs_put_link,
+ .setxattr = sysfs_setxattr,
+ .readlink = generic_readlink,
+ .follow_link = sysfs_follow_link,
+ .put_link = sysfs_put_link,
};


--
1.6.5.2.143.g8cc62

2009-11-03 11:57:26

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 09/13] sysfs: Implement sysfs_getattr & sysfs_permission

From: Eric W. Biederman <[email protected]>

With the implementation of sysfs_getattr and sysfs_permission
sysfs becomes able to lazily propogate inode attribute changes
from the sysfs_dirents to the vfs inodes. This paves the way
for deleting significant chunks of now unnecessary code.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 2 +
fs/sysfs/inode.c | 64 ++++++++++++++++++++++++++++++++++++++-------------
fs/sysfs/symlink.c | 3 ++
fs/sysfs/sysfs.h | 2 +
4 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index fa37126..25d052a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -800,7 +800,9 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,

const struct inode_operations sysfs_dir_inode_operations = {
.lookup = sysfs_lookup,
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fccfb55..2dcafe8 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -37,7 +37,9 @@ static struct backing_dev_info sysfs_backing_dev_info = {
};

static const struct inode_operations sysfs_inode_operations ={
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

@@ -196,7 +198,6 @@ static inline void set_default_inode_attr(struct inode * inode, mode_t mode)

static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
{
- inode->i_mode = iattr->ia_mode;
inode->i_uid = iattr->ia_uid;
inode->i_gid = iattr->ia_gid;
inode->i_atime = iattr->ia_atime;
@@ -227,38 +228,56 @@ static int sysfs_count_nlink(struct sysfs_dirent *sd)
return nr + 2;
}

+static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
+{
+ struct sysfs_inode_attrs *iattrs = sd->s_iattr;
+
+ inode->i_mode = sd->s_mode;
+ if (iattrs) {
+ /* sysfs_dirent has non-default attributes
+ * get them from persistent copy in sysfs_dirent
+ */
+ set_inode_attr(inode, &iattrs->ia_iattr);
+ security_inode_notifysecctx(inode,
+ iattrs->ia_secdata,
+ iattrs->ia_secdata_len);
+ }
+
+ if (sysfs_type(sd) == SYSFS_DIR)
+ inode->i_nlink = sysfs_count_nlink(sd);
+}
+
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ struct inode *inode = dentry->d_inode;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ generic_fillattr(inode, stat);
+ return 0;
+}
+
static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
{
struct bin_attribute *bin_attr;
- struct sysfs_inode_attrs *iattrs;

inode->i_private = sysfs_get(sd);
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
- inode->i_ino = sd->s_ino;
lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

- iattrs = sd->s_iattr;
- if (iattrs) {
- /* sysfs_dirent has non-default attributes
- * get them for the new inode from persistent copy
- * in sysfs_dirent
- */
- set_inode_attr(inode, &iattrs->ia_iattr);
- if (iattrs->ia_secdata)
- security_inode_notifysecctx(inode,
- iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- } else
- set_default_inode_attr(inode, sd->s_mode);
+ set_default_inode_attr(inode, sd->s_mode);
+ sysfs_refresh_inode(sd, inode);

/* initialize inode according to type */
switch (sysfs_type(sd)) {
case SYSFS_DIR:
inode->i_op = &sysfs_dir_inode_operations;
inode->i_fop = &sysfs_dir_operations;
- inode->i_nlink = sysfs_count_nlink(sd);
break;
case SYSFS_KOBJ_ATTR:
inode->i_size = PAGE_SIZE;
@@ -341,3 +360,14 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
else
return -ENOENT;
}
+
+int sysfs_permission(struct inode *inode, int mask)
+{
+ struct sysfs_dirent *sd = inode->i_private;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ return generic_permission(inode, mask, NULL);
+}
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 1137418..c5eff49 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = sysfs_follow_link,
.put_link = sysfs_put_link,
+ .setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
+ .permission = sysfs_permission,
};


diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index a96d967..12ccc07 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -156,7 +156,9 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
+int sysfs_permission(struct inode *inode, int mask);
int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);
int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
--
1.6.5.2.143.g8cc62

2009-11-03 11:57:47

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 10/13] sysfs: In sysfs_chmod_file lazily propagate the mode change.

From: Eric W. Biederman <[email protected]>

Now that sysfs_getattr and sysfs_permission refresh the vfs
inode there is no need to immediatly push the mode change
into the vfs cache. Reducing the amount of work needed and
simplifying the locking.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/file.c | 31 ++++++++-----------------------
1 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index faa1a80..dc30d9e 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -579,38 +579,23 @@ EXPORT_SYMBOL_GPL(sysfs_add_file_to_group);
*/
int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
{
- struct sysfs_dirent *victim_sd = NULL;
- struct dentry *victim = NULL;
- struct inode * inode;
+ struct sysfs_dirent *sd;
struct iattr newattrs;
int rc;

- rc = -ENOENT;
- victim_sd = sysfs_get_dirent(kobj->sd, attr->name);
- if (!victim_sd)
- goto out;
+ mutex_lock(&sysfs_mutex);

- mutex_lock(&sysfs_rename_mutex);
- victim = sysfs_get_dentry(victim_sd);
- mutex_unlock(&sysfs_rename_mutex);
- if (IS_ERR(victim)) {
- rc = PTR_ERR(victim);
- victim = NULL;
+ rc = -ENOENT;
+ sd = sysfs_find_dirent(kobj->sd, attr->name);
+ if (!sd)
goto out;
- }
-
- inode = victim->d_inode;

- mutex_lock(&inode->i_mutex);
-
- newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
+ newattrs.ia_mode = (mode & S_IALLUGO) | (sd->s_mode & ~S_IALLUGO);
newattrs.ia_valid = ATTR_MODE;
- rc = sysfs_setattr(victim, &newattrs);
+ rc = sysfs_sd_setattr(sd, &newattrs);

- mutex_unlock(&inode->i_mutex);
out:
- dput(victim);
- sysfs_put(victim_sd);
+ mutex_unlock(&sysfs_mutex);
return rc;
}
EXPORT_SYMBOL_GPL(sysfs_chmod_file);
--
1.6.5.2.143.g8cc62

2009-11-03 11:58:36

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 11/13] sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish

From: Eric W. Biederman <[email protected]>

With lazy inode updates and dentry operations bringing everything
into sync on demand there is no longer any need to immediately
update the vfs or grab i_mutex to protect those updates as we
make changes to sysfs.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 91 ++---------------------------------------------------
fs/sysfs/sysfs.h | 2 -
2 files changed, 4 insertions(+), 89 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 25d052a..a05b027 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -386,12 +386,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
return NULL;
}

-static int sysfs_ilookup_test(struct inode *inode, void *arg)
-{
- struct sysfs_dirent *sd = arg;
- return inode->i_ino == sd->s_ino;
-}
-
/**
* sysfs_addrm_start - prepare for sysfs_dirent add/remove
* @acxt: pointer to sysfs_addrm_cxt to be used
@@ -399,47 +393,20 @@ static int sysfs_ilookup_test(struct inode *inode, void *arg)
*
* This function is called when the caller is about to add or
* remove sysfs_dirent under @parent_sd. This function acquires
- * sysfs_mutex, grabs inode for @parent_sd if available and lock
- * i_mutex of it. @acxt is used to keep and pass context to
+ * sysfs_mutex. @acxt is used to keep and pass context to
* other addrm functions.
*
* LOCKING:
* Kernel thread context (may sleep). sysfs_mutex is locked on
- * return. i_mutex of parent inode is locked on return if
- * available.
+ * return.
*/
void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
struct sysfs_dirent *parent_sd)
{
- struct inode *inode;
-
memset(acxt, 0, sizeof(*acxt));
acxt->parent_sd = parent_sd;

- /* Lookup parent inode. inode initialization is protected by
- * sysfs_mutex, so inode existence can be determined by
- * looking up inode while holding sysfs_mutex.
- */
mutex_lock(&sysfs_mutex);
-
- inode = ilookup5(sysfs_sb, parent_sd->s_ino, sysfs_ilookup_test,
- parent_sd);
- if (inode) {
- WARN_ON(inode->i_state & I_NEW);
-
- /* parent inode available */
- acxt->parent_inode = inode;
-
- /* sysfs_mutex is below i_mutex in lock hierarchy.
- * First, trylock i_mutex. If fails, unlock
- * sysfs_mutex and lock them in order.
- */
- if (!mutex_trylock(&inode->i_mutex)) {
- mutex_unlock(&sysfs_mutex);
- mutex_lock(&inode->i_mutex);
- mutex_lock(&sysfs_mutex);
- }
- }
}

/**
@@ -471,11 +438,6 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sd->s_parent = sysfs_get(acxt->parent_sd);

- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- inc_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-
sysfs_link_sibling(sd);

/* Update timestamps on the parent */
@@ -579,40 +541,6 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
-
- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- drop_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-}
-
-/**
- * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
- * @sd: target sysfs_dirent
- *
- * Decrement nlink for @sd. @sd must have been unlinked from its
- * parent on entry to this function such that it can't be looked
- * up anymore.
- */
-static void sysfs_dec_nlink(struct sysfs_dirent *sd)
-{
- struct inode *inode;
-
- inode = ilookup(sysfs_sb, sd->s_ino);
- if (!inode)
- return;
-
- /* adjust nlink and update timestamp */
- mutex_lock(&inode->i_mutex);
-
- inode->i_ctime = CURRENT_TIME;
- drop_nlink(inode);
- if (sysfs_type(sd) == SYSFS_DIR)
- drop_nlink(inode);
-
- mutex_unlock(&inode->i_mutex);
-
- iput(inode);
}

/**
@@ -621,25 +549,15 @@ static void sysfs_dec_nlink(struct sysfs_dirent *sd)
*
* Finish up sysfs_dirent add/remove. Resources acquired by
* sysfs_addrm_start() are released and removed sysfs_dirents are
- * cleaned up. Timestamps on the parent inode are updated.
+ * cleaned up.
*
* LOCKING:
- * All mutexes acquired by sysfs_addrm_start() are released.
+ * sysfs_mutex is released.
*/
void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
{
/* release resources acquired by sysfs_addrm_start() */
mutex_unlock(&sysfs_mutex);
- if (acxt->parent_inode) {
- struct inode *inode = acxt->parent_inode;
-
- /* if added/removed, update timestamps on the parent */
- if (acxt->cnt)
- inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-
- mutex_unlock(&inode->i_mutex);
- iput(inode);
- }

/* kill removed sysfs_dirents */
while (acxt->removed) {
@@ -648,7 +566,6 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 12ccc07..90b3501 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -89,9 +89,7 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
*/
struct sysfs_addrm_cxt {
struct sysfs_dirent *parent_sd;
- struct inode *parent_inode;
struct sysfs_dirent *removed;
- int cnt;
};

/*
--
1.6.5.2.143.g8cc62

2009-11-03 11:57:51

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

From: Eric W. Biederman <[email protected]>

By teaching sysfs_revalidate to hide a dentry for
a sysfs_dirent if the sysfs_dirent has been renamed,
and by teaching sysfs_lookup to return the original
dentry if the sysfs dirent has been renamed. I can
show the results of renames correctly without having to
update the dcache during the directory rename.

This massively simplifies the rename logic allowing a lot
of weird sysfs special cases to be removed along with
a lot of now unnecesary helper code.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/namei.c | 22 -------
fs/sysfs/dir.c | 158 ++++++++++---------------------------------------
fs/sysfs/inode.c | 12 ----
fs/sysfs/sysfs.h | 1 -
include/linux/namei.h | 1 -
5 files changed, 32 insertions(+), 162 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d11f404..d3c190c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1279,28 +1279,6 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
return __lookup_hash(&this, base, NULL);
}

-/**
- * lookup_one_noperm - bad hack for sysfs
- * @name: pathname component to lookup
- * @base: base directory to lookup from
- *
- * This is a variant of lookup_one_len that doesn't perform any permission
- * checks. It's a horrible hack to work around the braindead sysfs
- * architecture and should not be used anywhere else.
- *
- * DON'T USE THIS FUNCTION EVER, thanks.
- */
-struct dentry *lookup_one_noperm(const char *name, struct dentry *base)
-{
- int err;
- struct qstr this;
-
- err = __lookup_one_len(name, &this, base, strlen(name));
- if (err)
- return ERR_PTR(err);
- return __lookup_hash(&this, base, NULL);
-}
-
int user_path_at(int dfd, const char __user *name, unsigned flags,
struct path *path)
{
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index a05b027..0b60212 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -25,7 +25,6 @@
#include "sysfs.h"

DEFINE_MUTEX(sysfs_mutex);
-DEFINE_MUTEX(sysfs_rename_mutex);
DEFINE_SPINLOCK(sysfs_assoc_lock);

static DEFINE_SPINLOCK(sysfs_ino_lock);
@@ -85,46 +84,6 @@ static void sysfs_unlink_sibling(struct sysfs_dirent *sd)
}

/**
- * sysfs_get_dentry - get dentry for the given sysfs_dirent
- * @sd: sysfs_dirent of interest
- *
- * Get dentry for @sd. Dentry is looked up if currently not
- * present. This function descends from the root looking up
- * dentry for each step.
- *
- * LOCKING:
- * mutex_lock(sysfs_rename_mutex)
- *
- * RETURNS:
- * Pointer to found dentry on success, ERR_PTR() value on error.
- */
-struct dentry *sysfs_get_dentry(struct sysfs_dirent *sd)
-{
- struct dentry *dentry = dget(sysfs_sb->s_root);
-
- while (dentry->d_fsdata != sd) {
- struct sysfs_dirent *cur;
- struct dentry *parent;
-
- /* find the first ancestor which hasn't been looked up */
- cur = sd;
- while (cur->s_parent != dentry->d_fsdata)
- cur = cur->s_parent;
-
- /* look it up */
- parent = dentry;
- mutex_lock(&parent->d_inode->i_mutex);
- dentry = lookup_one_noperm(cur->s_name, parent);
- mutex_unlock(&parent->d_inode->i_mutex);
- dput(parent);
-
- if (IS_ERR(dentry))
- break;
- }
- return dentry;
-}
-
-/**
* sysfs_get_active - get an active reference to sysfs_dirent
* @sd: sysfs_dirent to get an active reference to
*
@@ -315,6 +274,14 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
if (sd->s_flags & SYSFS_FLAG_REMOVED)
goto out_bad;

+ /* The sysfs dirent has been moved? */
+ if (dentry->d_parent->d_fsdata != sd->s_parent)
+ goto out_bad;
+
+ /* The sysfs dirent has been renamed */
+ if (strcmp(dentry->d_name.name, sd->s_name) != 0)
+ goto out_bad;
+
mutex_unlock(&sysfs_mutex);
out_valid:
return 1;
@@ -322,6 +289,12 @@ out_bad:
/* Remove the dentry from the dcache hashes.
* If this is a deleted dentry we use d_drop instead of d_delete
* so sysfs doesn't need to cope with negative dentries.
+ *
+ * If this is a dentry that has simply been renamed we
+ * use d_drop to remove it from the dcache lookup on its
+ * old parent. If this dentry persists later when a lookup
+ * is performed at its new name the dentry will be readded
+ * to the dcache hashes.
*/
is_dir = (sysfs_type(sd) == SYSFS_DIR);
mutex_unlock(&sysfs_mutex);
@@ -705,10 +678,15 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
}

/* instantiate and hash dentry */
- dentry->d_op = &sysfs_dentry_ops;
- dentry->d_fsdata = sysfs_get(sd);
- d_instantiate(dentry, inode);
- d_rehash(dentry);
+ ret = d_find_alias(inode);
+ if (!ret) {
+ dentry->d_op = &sysfs_dentry_ops;
+ dentry->d_fsdata = sysfs_get(sd);
+ d_add(dentry, inode);
+ } else {
+ d_move(ret, dentry);
+ iput(inode);
+ }

out_unlock:
mutex_unlock(&sysfs_mutex);
@@ -785,62 +763,32 @@ void sysfs_remove_dir(struct kobject * kobj)
int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
{
struct sysfs_dirent *sd = kobj->sd;
- struct dentry *parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
const char *dup_name = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
+ mutex_lock(&sysfs_mutex);

error = 0;
if (strcmp(sd->s_name, new_name) == 0)
goto out; /* nothing to rename */

- /* get the original dentry */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
-
- parent = old_dentry->d_parent;
-
- /* lock parent and get dentry for new name */
- mutex_lock(&parent->d_inode->i_mutex);
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(sd->s_parent, new_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(parent, new_name);
- if (!new_dentry)
- goto out_unlock;
+ goto out;

/* rename sysfs_dirent */
error = -ENOMEM;
new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
if (!new_name)
- goto out_unlock;
+ goto out;

dup_name = sd->s_name;
sd->s_name = new_name;

- /* rename */
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
-
error = 0;
- out_unlock:
+ out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&parent->d_inode->i_mutex);
kfree(dup_name);
- dput(old_dentry);
- dput(new_dentry);
- out:
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

@@ -848,12 +796,11 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;
- struct dentry *old_parent, *new_parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
BUG_ON(!sd->s_parent);
+
+ mutex_lock(&sysfs_mutex);
new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
new_parent_kobj->sd : &sysfs_root;

@@ -861,61 +808,20 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
if (sd->s_parent == new_parent_sd)
goto out; /* nothing to move */

- /* get dentries */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
- old_parent = old_dentry->d_parent;
-
- new_parent = sysfs_get_dentry(new_parent_sd);
- if (IS_ERR(new_parent)) {
- error = PTR_ERR(new_parent);
- new_parent = NULL;
- goto out;
- }
-
-again:
- mutex_lock(&old_parent->d_inode->i_mutex);
- if (!mutex_trylock(&new_parent->d_inode->i_mutex)) {
- mutex_unlock(&old_parent->d_inode->i_mutex);
- goto again;
- }
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(new_parent, sd->s_name);
- if (!new_dentry)
- goto out_unlock;
-
- error = 0;
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
+ goto out;

/* Remove from old parent's list and insert into new parent's list. */
sysfs_unlink_sibling(sd);
sysfs_get(new_parent_sd);
- drop_nlink(old_parent->d_inode);
sysfs_put(sd->s_parent);
sd->s_parent = new_parent_sd;
- inc_nlink(new_parent->d_inode);
sysfs_link_sibling(sd);

- out_unlock:
+ error = 0;
+out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&new_parent->d_inode->i_mutex);
- mutex_unlock(&old_parent->d_inode->i_mutex);
- out:
- dput(new_parent);
- dput(old_dentry);
- dput(new_dentry);
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 2dcafe8..082a3eb 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -205,17 +205,6 @@ static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
inode->i_ctime = iattr->ia_ctime;
}

-
-/*
- * sysfs has a different i_mutex lock order behavior for i_mutex than other
- * filesystems; sysfs i_mutex is called in many places with subsystem locks
- * held. At the same time, many of the VFS locking rules do not apply to
- * sysfs at all (cross directory rename for example). To untangle this mess
- * (which gives false positives in lockdep), we're giving sysfs inodes their
- * own class for i_mutex.
- */
-static struct lock_class_key sysfs_inode_imutex_key;
-
static int sysfs_count_nlink(struct sysfs_dirent *sd)
{
struct sysfs_dirent *child;
@@ -268,7 +257,6 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
- lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

set_default_inode_attr(inode, sd->s_mode);
sysfs_refresh_inode(sd, inode);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 90b3501..98a15bf 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -103,7 +103,6 @@ extern struct kmem_cache *sysfs_dir_cachep;
* dir.c
*/
extern struct mutex sysfs_mutex;
-extern struct mutex sysfs_rename_mutex;
extern spinlock_t sysfs_assoc_lock;

extern const struct file_operations sysfs_dir_operations;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index ec0f607..0289467 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -76,7 +76,6 @@ extern struct file *nameidata_to_filp(struct nameidata *nd, int flags);
extern void release_open_intent(struct nameidata *);

extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
-extern struct dentry *lookup_one_noperm(const char *, struct dentry *);

extern int follow_down(struct path *);
extern int follow_up(struct path *);
--
1.6.5.2.143.g8cc62

2009-11-03 11:57:46

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 13/13] sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir

From: Eric W. Biederman <[email protected]>

These two functions do 90% of the same work and it doesn't significantly
obfuscate the function to allow both the parent dir and the name to change
at the same time. So merge them together to simplify maintenance, and
increase testing.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 62 +++++++++++++++++++++++++----------------------------
fs/sysfs/sysfs.h | 3 ++
2 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 0b60212..e1a86d1 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -760,30 +760,42 @@ void sysfs_remove_dir(struct kobject * kobj)
__sysfs_remove_dir(sd);
}

-int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name)
{
- struct sysfs_dirent *sd = kobj->sd;
const char *dup_name = NULL;
int error;

mutex_lock(&sysfs_mutex);

error = 0;
- if (strcmp(sd->s_name, new_name) == 0)
+ if ((sd->s_parent == new_parent_sd) &&
+ (strcmp(sd->s_name, new_name) == 0))
goto out; /* nothing to rename */

error = -EEXIST;
- if (sysfs_find_dirent(sd->s_parent, new_name))
+ if (sysfs_find_dirent(new_parent_sd, new_name))
goto out;

/* rename sysfs_dirent */
- error = -ENOMEM;
- new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
- if (!new_name)
- goto out;
+ if (strcmp(sd->s_name, new_name) != 0) {
+ error = -ENOMEM;
+ new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
+ if (!new_name)
+ goto out;
+
+ dup_name = sd->s_name;
+ sd->s_name = new_name;
+ }

- dup_name = sd->s_name;
- sd->s_name = new_name;
+ /* Remove from old parent's list and insert into new parent's list. */
+ if (sd->s_parent != new_parent_sd) {
+ sysfs_unlink_sibling(sd);
+ sysfs_get(new_parent_sd);
+ sysfs_put(sd->s_parent);
+ sd->s_parent = new_parent_sd;
+ sysfs_link_sibling(sd);
+ }

error = 0;
out:
@@ -792,37 +804,21 @@ int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
return error;
}

+int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+{
+ return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
+}
+
int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;
- int error;

BUG_ON(!sd->s_parent);
-
- mutex_lock(&sysfs_mutex);
- new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
+ new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
new_parent_kobj->sd : &sysfs_root;

- error = 0;
- if (sd->s_parent == new_parent_sd)
- goto out; /* nothing to move */
-
- error = -EEXIST;
- if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out;
-
- /* Remove from old parent's list and insert into new parent's list. */
- sysfs_unlink_sibling(sd);
- sysfs_get(new_parent_sd);
- sysfs_put(sd->s_parent);
- sd->s_parent = new_parent_sd;
- sysfs_link_sibling(sd);
-
- error = 0;
-out:
- mutex_unlock(&sysfs_mutex);
- return error;
+ return sysfs_rename(sd, new_parent_sd, sd->s_name);
}

/* Relationship between s_mode and the DT_xxx types */
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 98a15bf..ca52e7b 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -130,6 +130,9 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
struct sysfs_dirent **p_sd);
void sysfs_remove_subdir(struct sysfs_dirent *sd);

+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name);
+
static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
{
if (sd) {
--
1.6.5.2.143.g8cc62

2009-11-03 13:48:43

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 01/13] sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> The sysfs_mutex is required to ensure updates are and will remain
> atomic with respect to other inode iattr updates, that do not happen
> through the filesystem.
>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/sysfs/inode.c | 41 +++++++++++++++++++++++++++++------------
> 1 files changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index e28cecf..8a08250 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -122,23 +122,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> return error;
> }
>
> +static int sysfs_sd_setsecdata(struct sysfs_dirent *sd, void **secdata, u32 *secdata_len)
> +{
> + struct sysfs_inode_attrs *iattrs;
> + void *old_secdata;
> + size_t old_secdata_len;
> +
> + iattrs = sd->s_iattr;
> + if (!iattrs)
> + iattrs = sysfs_init_inode_attrs(sd);
> + if (!iattrs)
> + return -ENOMEM;
> +
> + old_secdata = iattrs->ia_secdata;
> + old_secdata_len = iattrs->ia_secdata_len;
> +
> + iattrs->ia_secdata = *secdata;
> + iattrs->ia_secdata_len = *secdata_len;
> +
> + *secdata = old_secdata;
> + *secdata_len = old_secdata_len;
> + return 0;
> +}
> +
> int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> size_t size, int flags)
> {
> struct sysfs_dirent *sd = dentry->d_fsdata;
> - struct sysfs_inode_attrs *iattrs;
> void *secdata;
> int error;
> u32 secdata_len = 0;
>
> if (!sd)
> return -EINVAL;
> - if (!sd->s_iattr)
> - sd->s_iattr = sysfs_init_inode_attrs(sd);
> - if (!sd->s_iattr)
> - return -ENOMEM;
> -
> - iattrs = sd->s_iattr;
>
> if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
> const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
> @@ -150,12 +166,13 @@ int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> &secdata, &secdata_len);
> if (error)
> goto out;
> - if (iattrs->ia_secdata)
> - security_release_secctx(iattrs->ia_secdata,
> - iattrs->ia_secdata_len);
> - iattrs->ia_secdata = secdata;
> - iattrs->ia_secdata_len = secdata_len;
>
> + mutex_lock(&sysfs_mutex);
> + error = sysfs_sd_setsecdata(sd, &secdata, &secdata_len);

You're ignoring the potential -ENOMEM return value here?

Worse, if -ENOMEM, then secdata was never set, so you call
security_release_secctx() on a random value left on the stack...

> + mutex_unlock(&sysfs_mutex);
> +
> + if (secdata)
> + security_release_secctx(secdata, secdata_len);
> } else
> return -EINVAL;
> out:
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-03 21:09:15

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 01/13] sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> From: Eric W. Biederman <[email protected]>
>>
>> The sysfs_mutex is required to ensure updates are and will remain
>> atomic with respect to other inode iattr updates, that do not happen
>> through the filesystem.
>>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>> ---
>> fs/sysfs/inode.c | 41 +++++++++++++++++++++++++++++------------
>> 1 files changed, 29 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
>> index e28cecf..8a08250 100644
>> --- a/fs/sysfs/inode.c
>> +++ b/fs/sysfs/inode.c
>> @@ -122,23 +122,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
>> return error;
>> }
>>
>> +static int sysfs_sd_setsecdata(struct sysfs_dirent *sd, void **secdata, u32 *secdata_len)
>> +{
>> + struct sysfs_inode_attrs *iattrs;
>> + void *old_secdata;
>> + size_t old_secdata_len;
>> +
>> + iattrs = sd->s_iattr;
>> + if (!iattrs)
>> + iattrs = sysfs_init_inode_attrs(sd);
>> + if (!iattrs)
>> + return -ENOMEM;
>> +
>> + old_secdata = iattrs->ia_secdata;
>> + old_secdata_len = iattrs->ia_secdata_len;
>> +
>> + iattrs->ia_secdata = *secdata;
>> + iattrs->ia_secdata_len = *secdata_len;
>> +
>> + *secdata = old_secdata;
>> + *secdata_len = old_secdata_len;
>> + return 0;
>> +}
>> +
>> int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
>> size_t size, int flags)
>> {
>> struct sysfs_dirent *sd = dentry->d_fsdata;
>> - struct sysfs_inode_attrs *iattrs;
>> void *secdata;
>> int error;
>> u32 secdata_len = 0;
>>
>> if (!sd)
>> return -EINVAL;
>> - if (!sd->s_iattr)
>> - sd->s_iattr = sysfs_init_inode_attrs(sd);
>> - if (!sd->s_iattr)
>> - return -ENOMEM;
>> -
>> - iattrs = sd->s_iattr;
>>
>> if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
>> const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
>> @@ -150,12 +166,13 @@ int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
>> &secdata, &secdata_len);
>> if (error)
>> goto out;
>> - if (iattrs->ia_secdata)
>> - security_release_secctx(iattrs->ia_secdata,
>> - iattrs->ia_secdata_len);
>> - iattrs->ia_secdata = secdata;
>> - iattrs->ia_secdata_len = secdata_len;
>>
>> + mutex_lock(&sysfs_mutex);
>> + error = sysfs_sd_setsecdata(sd, &secdata, &secdata_len);
>
> You're ignoring the potential -ENOMEM return value here?
>
> Worse, if -ENOMEM, then secdata was never set, so you call
> security_release_secctx() on a random value left on the stack...

No. It is more elegant than that.
If sysfs_sd_setsecdata fails secdata holds the freshly allocated secdata value.
If sysfs_sd_setsecdata succeeds secdata holds the old value of secdata.

Eric

2009-11-03 21:31:39

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 01/13] sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Eric W. Biederman ([email protected]):
> >> From: Eric W. Biederman <[email protected]>
> >>
> >> The sysfs_mutex is required to ensure updates are and will remain
> >> atomic with respect to other inode iattr updates, that do not happen
> >> through the filesystem.
> >>
> >> Signed-off-by: Eric W. Biederman <[email protected]>
> >> ---
> >> fs/sysfs/inode.c | 41 +++++++++++++++++++++++++++++------------
> >> 1 files changed, 29 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> >> index e28cecf..8a08250 100644
> >> --- a/fs/sysfs/inode.c
> >> +++ b/fs/sysfs/inode.c
> >> @@ -122,23 +122,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> >> return error;
> >> }
> >>
> >> +static int sysfs_sd_setsecdata(struct sysfs_dirent *sd, void **secdata, u32 *secdata_len)
> >> +{
> >> + struct sysfs_inode_attrs *iattrs;
> >> + void *old_secdata;
> >> + size_t old_secdata_len;
> >> +
> >> + iattrs = sd->s_iattr;
> >> + if (!iattrs)
> >> + iattrs = sysfs_init_inode_attrs(sd);
> >> + if (!iattrs)
> >> + return -ENOMEM;
> >> +
> >> + old_secdata = iattrs->ia_secdata;
> >> + old_secdata_len = iattrs->ia_secdata_len;
> >> +
> >> + iattrs->ia_secdata = *secdata;
> >> + iattrs->ia_secdata_len = *secdata_len;
> >> +
> >> + *secdata = old_secdata;
> >> + *secdata_len = old_secdata_len;
> >> + return 0;
> >> +}
> >> +
> >> int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> >> size_t size, int flags)
> >> {
> >> struct sysfs_dirent *sd = dentry->d_fsdata;
> >> - struct sysfs_inode_attrs *iattrs;
> >> void *secdata;
> >> int error;
> >> u32 secdata_len = 0;
> >>
> >> if (!sd)
> >> return -EINVAL;
> >> - if (!sd->s_iattr)
> >> - sd->s_iattr = sysfs_init_inode_attrs(sd);
> >> - if (!sd->s_iattr)
> >> - return -ENOMEM;
> >> -
> >> - iattrs = sd->s_iattr;
> >>
> >> if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
> >> const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
> >> @@ -150,12 +166,13 @@ int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> >> &secdata, &secdata_len);
> >> if (error)
> >> goto out;
> >> - if (iattrs->ia_secdata)
> >> - security_release_secctx(iattrs->ia_secdata,
> >> - iattrs->ia_secdata_len);
> >> - iattrs->ia_secdata = secdata;
> >> - iattrs->ia_secdata_len = secdata_len;
> >>
> >> + mutex_lock(&sysfs_mutex);
> >> + error = sysfs_sd_setsecdata(sd, &secdata, &secdata_len);
> >
> > You're ignoring the potential -ENOMEM return value here?
> >
> > Worse, if -ENOMEM, then secdata was never set, so you call
> > security_release_secctx() on a random value left on the stack...
>
> No. It is more elegant than that.
> If sysfs_sd_setsecdata fails secdata holds the freshly allocated secdata value.
> If sysfs_sd_setsecdata succeeds secdata holds the old value of secdata.

Gah, sorry.

Acked-by: Serge Hallyn <[email protected]>

(as an aside, I can't see any reason to not just return if strncmp(name,
XATTR_SECURITY_PREFIX,) above to avoid a level of indent and needless gotos)

thanks,
-serge

2009-11-03 22:14:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 01/13] sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex

"Serge E. Hallyn" <[email protected]> writes:
>
> Gah, sorry.

No prob.

> Acked-by: Serge Hallyn <[email protected]>
>
> (as an aside, I can't see any reason to not just return if strncmp(name,
> XATTR_SECURITY_PREFIX,) above to avoid a level of indent and needless gotos)

Sounds like it is worth a patch. For now I'm concentrating on getting
this patchset merged, and extraneous cleanups have a way of dragging
the process out beyond what time I have to give.

Eric

2009-11-03 22:27:40

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 03/13] sysfs: Use dentry_ops instead of directly playing with the dcache

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Calling d_drop unconditionally when a sysfs_dirent is deleted has
> the potential to leak mounts, so instead implement dentry delete
> and revalidate operations that cause sysfs dentries to be removed
> at the appropriate time.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

I like where it's heading (and spent some time staring at the
related code, looks kosher).

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/dir.c | 73 +++++++++++++++++++++++++++++++++++--------------------
> 1 files changed, 46 insertions(+), 27 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 130dfc3..b5e8499 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -298,6 +298,46 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
> goto repeat;
> }
>
> +static int sysfs_dentry_delete(struct dentry *dentry)
> +{
> + struct sysfs_dirent *sd = dentry->d_fsdata;
> + return !!(sd->s_flags & SYSFS_FLAG_REMOVED);
> +}
> +
> +static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
> +{
> + struct sysfs_dirent *sd = dentry->d_fsdata;
> + int is_dir;
> +
> + mutex_lock(&sysfs_mutex);
> +
> + /* The sysfs dirent has been deleted */
> + if (sd->s_flags & SYSFS_FLAG_REMOVED)
> + goto out_bad;
> +
> + mutex_unlock(&sysfs_mutex);
> +out_valid:
> + return 1;
> +out_bad:
> + /* Remove the dentry from the dcache hashes.
> + * If this is a deleted dentry we use d_drop instead of d_delete
> + * so sysfs doesn't need to cope with negative dentries.
> + */
> + is_dir = (sysfs_type(sd) == SYSFS_DIR);
> + mutex_unlock(&sysfs_mutex);
> + if (is_dir) {
> + /* If we have submounts we must allow the vfs caches
> + * to lie about the state of the filesystem to prevent
> + * leaks and other nasty things.
> + */
> + if (have_submounts(dentry))
> + goto out_valid;
> + shrink_dcache_parent(dentry);
> + }
> + d_drop(dentry);
> + return 0;
> +}
> +
> static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
> {
> struct sysfs_dirent * sd = dentry->d_fsdata;
> @@ -307,6 +347,8 @@ static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
> }
>
> static const struct dentry_operations sysfs_dentry_ops = {
> + .d_revalidate = sysfs_dentry_revalidate,
> + .d_delete = sysfs_dentry_delete,
> .d_iput = sysfs_dentry_iput,
> };
>
> @@ -527,44 +569,21 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
> }
>
> /**
> - * sysfs_drop_dentry - drop dentry for the specified sysfs_dirent
> + * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
> * @sd: target sysfs_dirent
> *
> - * Drop dentry for @sd. @sd must have been unlinked from its
> + * Decrement nlink for @sd. @sd must have been unlinked from its
> * parent on entry to this function such that it can't be looked
> * up anymore.
> */
> -static void sysfs_drop_dentry(struct sysfs_dirent *sd)
> +static void sysfs_dec_nlink(struct sysfs_dirent *sd)
> {
> struct inode *inode;
> - struct dentry *dentry;
>
> inode = ilookup(sysfs_sb, sd->s_ino);
> if (!inode)
> return;
>
> - /* Drop any existing dentries associated with sd.
> - *
> - * For the dentry to be properly freed we need to grab a
> - * reference to the dentry under the dcache lock, unhash it,
> - * and then put it. The playing with the dentry count allows
> - * dput to immediately free the dentry if it is not in use.
> - */
> -repeat:
> - spin_lock(&dcache_lock);
> - list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
> - if (d_unhashed(dentry))
> - continue;
> - dget_locked(dentry);
> - spin_lock(&dentry->d_lock);
> - __d_drop(dentry);
> - spin_unlock(&dentry->d_lock);
> - spin_unlock(&dcache_lock);
> - dput(dentry);
> - goto repeat;
> - }
> - spin_unlock(&dcache_lock);
> -
> /* adjust nlink and update timestamp */
> mutex_lock(&inode->i_mutex);
>
> @@ -611,7 +630,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
> acxt->removed = sd->s_sibling;
> sd->s_sibling = NULL;
>
> - sysfs_drop_dentry(sd);
> + sysfs_dec_nlink(sd);
> sysfs_deactivate(sd);
> unmap_bin_file(sd);
> sysfs_put(sd);
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 02:43:33

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 04/13] sysfs: Simplify sysfs_chmod_file semantics

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Currently every caller of sysfs_chmod_file happens at either
> file creation time to set a non-default mode or in response
> to a specific user requested space change in policy. Making
> timestamps of when the chmod happens and notification of
> a file changing mode uninteresting.

But these changes can occur by togging values in sysfs files
(i.e. f71805f.c), right? Is this (specifically not doing inotify)
definately uncontroversial?

I can't exactly picture an admin sitting there watching
nautilus for a sysfs file to become writeable, but could
imagine some site's automation getting hung... Or am I way
off base?

> Remove the unnecessary time stamp and filesystem change
> notification, and removes the last of the explicit inotify
> and donitfy support from sysfs.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/sysfs/file.c | 10 +---------
> 1 files changed, 1 insertions(+), 9 deletions(-)
>
> diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> index f5ea468..faa1a80 100644
> --- a/fs/sysfs/file.c
> +++ b/fs/sysfs/file.c
> @@ -604,17 +604,9 @@ int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
> mutex_lock(&inode->i_mutex);
>
> newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
> - newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
> - newattrs.ia_ctime = current_fs_time(inode->i_sb);
> + newattrs.ia_valid = ATTR_MODE;
> rc = sysfs_setattr(victim, &newattrs);
>
> - if (rc == 0) {
> - fsnotify_change(victim, newattrs.ia_valid);
> - mutex_lock(&sysfs_mutex);
> - victim_sd->s_mode = newattrs.ia_mode;
> - mutex_unlock(&sysfs_mutex);
> - }
> -
> mutex_unlock(&inode->i_mutex);
> out:
> dput(victim);
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 02:57:17

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 05/13] sysfs: Simplify iattr time assignments

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> The granularity of sysfs time when we keep it is 1 ns. Which
> when passed to timestamp_trunc results in a nop. So remove
> the unnecessary function call making sysfs_setattr slightly
> easier to read.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/inode.c | 9 +++------
> 1 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index 8a08250..fed7a74 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -103,14 +103,11 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> if (ia_valid & ATTR_GID)
> iattrs->ia_gid = iattr->ia_gid;
> if (ia_valid & ATTR_ATIME)
> - iattrs->ia_atime = timespec_trunc(iattr->ia_atime,
> - inode->i_sb->s_time_gran);
> + iattrs->ia_atime = iattr->ia_atime;
> if (ia_valid & ATTR_MTIME)
> - iattrs->ia_mtime = timespec_trunc(iattr->ia_mtime,
> - inode->i_sb->s_time_gran);
> + iattrs->ia_mtime = iattr->ia_mtime;
> if (ia_valid & ATTR_CTIME)
> - iattrs->ia_ctime = timespec_trunc(iattr->ia_ctime,
> - inode->i_sb->s_time_gran);
> + iattrs->ia_ctime = iattr->ia_ctime;
> if (ia_valid & ATTR_MODE) {
> umode_t mode = iattr->ia_mode;
>
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 03:16:32

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 06/13] sysfs: Fix locking and factor out sysfs_sd_setattr

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Cleanly separate the work that is specific to setting the
> attributes of a sysfs_dirent from what is needed to update
> the attributes of a vfs inode.
>
> Additionally grab the sysfs_mutex to keep any nasties from
> surprising us when updating the sysfs_dirent.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
> fs/sysfs/sysfs.h | 1 +
> 2 files changed, 33 insertions(+), 20 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index fed7a74..fccfb55 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)
>
> return attrs;
> }
> -int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> +
> +int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
> {
> - struct inode * inode = dentry->d_inode;
> - struct sysfs_dirent * sd = dentry->d_fsdata;
> struct sysfs_inode_attrs *sd_attrs;
> struct iattr *iattrs;
> unsigned int ia_valid = iattr->ia_valid;
> - int error;
> -
> - if (!sd)
> - return -EINVAL;
>
> sd_attrs = sd->s_iattr;
>
> - error = inode_change_ok(inode, iattr);
> - if (error)
> - return error;
> -
> - iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
> -
> - error = inode_setattr(inode, iattr);
> - if (error)
> - return error;
> -
> if (!sd_attrs) {
> /* setting attributes for the first time, allocate now */
> sd_attrs = sysfs_init_inode_attrs(sd);
> @@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> iattrs->ia_ctime = iattr->ia_ctime;
> if (ia_valid & ATTR_MODE) {
> umode_t mode = iattr->ia_mode;
> -
> - if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> - mode &= ~S_ISGID;
> iattrs->ia_mode = sd->s_mode = mode;
> }
> }
> + return 0;
> +}
> +
> +int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> +{
> + struct inode * inode = dentry->d_inode;
> + struct sysfs_dirent * sd = dentry->d_fsdata;
> + int error;
> +
> + if (!sd)
> + return -EINVAL;
> +
> + error = inode_change_ok(inode, iattr);
> + if (error)
> + return error;
> +
> + iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
> + if (iattr->ia_valid & ATTR_MODE) {
> + if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> + iattr->ia_mode &= ~S_ISGID;
> + }

Was it a bug that before this patch this wasn't cleared before the
actual inode_setattr()?

Since the S_ISGID will be set for the *new* gid, that is,
iattr->ia_gid, shouldn't the user be required to be
in_group_p(iattr->i_gid)? Note you haven't done the
inode_setattr() yet.

> +
> + error = inode_setattr(inode, iattr);
> + if (error)
> + return error;
> +
> + mutex_lock(&sysfs_mutex);
> + error = sysfs_sd_setattr(sd, iattr);
> + mutex_unlock(&sysfs_mutex);
> +
> return error;
> }
>
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index af4c4e7..a96d967 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -155,6 +155,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
> */
> struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
> void sysfs_delete_inode(struct inode *inode);
> +int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
> int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
> int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> size_t size, int flags);
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 03:42:50

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 04/13] sysfs: Simplify sysfs_chmod_file semantics

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> From: Eric W. Biederman <[email protected]>
>>
>> Currently every caller of sysfs_chmod_file happens at either
>> file creation time to set a non-default mode or in response
>> to a specific user requested space change in policy. Making
>> timestamps of when the chmod happens and notification of
>> a file changing mode uninteresting.
>
> But these changes can occur by togging values in sysfs files
> (i.e. f71805f.c), right? Is this (specifically not doing inotify)
> definately uncontroversial?

The fs_notify_change was not introduced to deliberately support
a feature but as a side effect of other cleanups. So there
is no indication that anyone cares about inotify support.

> I can't exactly picture an admin sitting there watching
> nautilus for a sysfs file to become writeable, but could
> imagine some site's automation getting hung... Or am I way
> off base?

I would be stunned if the shell script in the automation that writes
to a sysfs file to make things writeable doesn't on it's next line
kick off whatever needs it to be writable.

With no benefit to using inotify and with only a handful of sysfs
files affected I don't expect this change to break anything in
userspace and I have been happily running with it for a year or so on
all of our machines at work with no one problems.

The reason I am making the change is that the goal of this patchset is
to get sysfs to act like any other distributed filesystem in linux,
and to use the same interfaces in roughly the same ways as other
distributed filesystems. Unfortunately there is not a good interface
for distributed filesystems to support inotify or I would use it.

Eric

2009-11-04 03:49:59

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 06/13] sysfs: Fix locking and factor out sysfs_sd_setattr

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> From: Eric W. Biederman <[email protected]>
>>
>> Cleanly separate the work that is specific to setting the
>> attributes of a sysfs_dirent from what is needed to update
>> the attributes of a vfs inode.
>>
>> Additionally grab the sysfs_mutex to keep any nasties from
>> surprising us when updating the sysfs_dirent.
>>
>> Acked-by: Tejun Heo <[email protected]>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>> ---
>> fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
>> fs/sysfs/sysfs.h | 1 +
>> 2 files changed, 33 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
>> index fed7a74..fccfb55 100644
>> --- a/fs/sysfs/inode.c
>> +++ b/fs/sysfs/inode.c
>> @@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)
>>
>> return attrs;
>> }
>> -int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
>> +
>> +int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
>> {
>> - struct inode * inode = dentry->d_inode;
>> - struct sysfs_dirent * sd = dentry->d_fsdata;
>> struct sysfs_inode_attrs *sd_attrs;
>> struct iattr *iattrs;
>> unsigned int ia_valid = iattr->ia_valid;
>> - int error;
>> -
>> - if (!sd)
>> - return -EINVAL;
>>
>> sd_attrs = sd->s_iattr;
>>
>> - error = inode_change_ok(inode, iattr);
>> - if (error)
>> - return error;
>> -
>> - iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
>> -
>> - error = inode_setattr(inode, iattr);
>> - if (error)
>> - return error;
>> -
>> if (!sd_attrs) {
>> /* setting attributes for the first time, allocate now */
>> sd_attrs = sysfs_init_inode_attrs(sd);
>> @@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
>> iattrs->ia_ctime = iattr->ia_ctime;
>> if (ia_valid & ATTR_MODE) {
>> umode_t mode = iattr->ia_mode;
>> -
>> - if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
>> - mode &= ~S_ISGID;
>> iattrs->ia_mode = sd->s_mode = mode;
>> }
>> }
>> + return 0;
>> +}
>> +
>> +int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
>> +{
>> + struct inode * inode = dentry->d_inode;
>> + struct sysfs_dirent * sd = dentry->d_fsdata;
>> + int error;
>> +
>> + if (!sd)
>> + return -EINVAL;
>> +
>> + error = inode_change_ok(inode, iattr);
>> + if (error)
>> + return error;
>> +
>> + iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
>> + if (iattr->ia_valid & ATTR_MODE) {
>> + if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
>> + iattr->ia_mode &= ~S_ISGID;
>> + }
>
> Was it a bug that before this patch this wasn't cleared before the
> actual inode_setattr()?

Not strictly as inode_setattr performs the exact same check.

> Since the S_ISGID will be set for the *new* gid, that is,
> iattr->ia_gid, shouldn't the user be required to be
> in_group_p(iattr->i_gid)? Note you haven't done the
> inode_setattr() yet.

Interesting point, yes I am potentially testing the wrong gid
there. The dangers of moving code around.

Thank you for catching that. !#@#!#

Eric

2009-11-04 03:54:12

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 07/13] sysfs: Update s_iattr on link and unlink.

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Currently sysfs updates the timestamps on the vfs directory
> inode when we create or remove a directory entry but doesn't
> update the cached copy on the sysfs_dirent, fix that oversight.

confused... why not do this in sysfs_addrm_finish()?

I guess you'd have to do at it at top before dropping sysfs_mutex
so it wouldn't be as pretty as I was thinking, but at least you
could just do it once.

>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/sysfs/dir.c | 18 ++++++++++++++++++
> 1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index b5e8499..fa37126 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -464,6 +464,8 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
> */
> int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
> {
> + struct sysfs_inode_attrs *ps_iattr;
> +
> if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
> return -EEXIST;
>
> @@ -476,6 +478,13 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
>
> sysfs_link_sibling(sd);
>
> + /* Update timestamps on the parent */
> + ps_iattr = acxt->parent_sd->s_iattr;
> + if (ps_iattr) {
> + struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
> + ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
> + }
> +
> return 0;
> }
>
> @@ -554,10 +563,19 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
> */
> void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
> {
> + struct sysfs_inode_attrs *ps_iattr;
> +
> BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);
>
> sysfs_unlink_sibling(sd);
>
> + /* Update timestamps on the parent */
> + ps_iattr = acxt->parent_sd->s_iattr;
> + if (ps_iattr) {
> + struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
> + ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
> + }
> +
> sd->s_flags |= SYSFS_FLAG_REMOVED;
> sd->s_sibling = acxt->removed;
> acxt->removed = sd;
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 04:11:58

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 07/13] sysfs: Update s_iattr on link and unlink.

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> From: Eric W. Biederman <[email protected]>
>>
>> Currently sysfs updates the timestamps on the vfs directory
>> inode when we create or remove a directory entry but doesn't
>> update the cached copy on the sysfs_dirent, fix that oversight.
>
> confused... why not do this in sysfs_addrm_finish()?
>
> I guess you'd have to do at it at top before dropping sysfs_mutex
> so it wouldn't be as pretty as I was thinking, but at least you
> could just do it once.

Well sysfs_addrm_finish doesn't really know if you did anything.

Beyond that my ultimate goal is to kill sysfs_addrm_start and
sysfs_addrm_finish. Of course that requires fixing all of the
sysfs users that depend on the impossible to get right recursive
directory removal in sysfs, so it is not the subject of this patchset.

Eric

2009-11-04 04:20:37

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 09/13] sysfs: Implement sysfs_getattr & sysfs_permission

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> With the implementation of sysfs_getattr and sysfs_permission
> sysfs becomes able to lazily propogate inode attribute changes
> from the sysfs_dirents to the vfs inodes. This paves the way
> for deleting significant chunks of now unnecessary code.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/sysfs/dir.c | 2 +
> fs/sysfs/inode.c | 64 ++++++++++++++++++++++++++++++++++++++-------------
> fs/sysfs/symlink.c | 3 ++
> fs/sysfs/sysfs.h | 2 +
> 4 files changed, 54 insertions(+), 17 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index fa37126..25d052a 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -800,7 +800,9 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
>
> const struct inode_operations sysfs_dir_inode_operations = {
> .lookup = sysfs_lookup,
> + .permission = sysfs_permission,
> .setattr = sysfs_setattr,
> + .getattr = sysfs_getattr,
> .setxattr = sysfs_setxattr,
> };
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index fccfb55..2dcafe8 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -37,7 +37,9 @@ static struct backing_dev_info sysfs_backing_dev_info = {
> };
>
> static const struct inode_operations sysfs_inode_operations ={
> + .permission = sysfs_permission,
> .setattr = sysfs_setattr,
> + .getattr = sysfs_getattr,
> .setxattr = sysfs_setxattr,
> };
>
> @@ -196,7 +198,6 @@ static inline void set_default_inode_attr(struct inode * inode, mode_t mode)
>
> static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
> {
> - inode->i_mode = iattr->ia_mode;
> inode->i_uid = iattr->ia_uid;
> inode->i_gid = iattr->ia_gid;
> inode->i_atime = iattr->ia_atime;
> @@ -227,38 +228,56 @@ static int sysfs_count_nlink(struct sysfs_dirent *sd)
> return nr + 2;
> }
>
> +static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
> +{
> + struct sysfs_inode_attrs *iattrs = sd->s_iattr;
> +
> + inode->i_mode = sd->s_mode;
> + if (iattrs) {
> + /* sysfs_dirent has non-default attributes
> + * get them from persistent copy in sysfs_dirent
> + */
> + set_inode_attr(inode, &iattrs->ia_iattr);
> + security_inode_notifysecctx(inode,
> + iattrs->ia_secdata,
> + iattrs->ia_secdata_len);
> + }
> +
> + if (sysfs_type(sd) == SYSFS_DIR)
> + inode->i_nlink = sysfs_count_nlink(sd);
> +}
> +
> +int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> +{
> + struct sysfs_dirent *sd = dentry->d_fsdata;
> + struct inode *inode = dentry->d_inode;
> +
> + mutex_lock(&sysfs_mutex);
> + sysfs_refresh_inode(sd, inode);
> + mutex_unlock(&sysfs_mutex);

So the inode->i_mutex is not needed?

> +
> + generic_fillattr(inode, stat);
> + return 0;
> +}
> +
> static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
> {
> struct bin_attribute *bin_attr;
> - struct sysfs_inode_attrs *iattrs;
>
> inode->i_private = sysfs_get(sd);
> inode->i_mapping->a_ops = &sysfs_aops;
> inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
> inode->i_op = &sysfs_inode_operations;
> - inode->i_ino = sd->s_ino;
> lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);
>
> - iattrs = sd->s_iattr;
> - if (iattrs) {
> - /* sysfs_dirent has non-default attributes
> - * get them for the new inode from persistent copy
> - * in sysfs_dirent
> - */
> - set_inode_attr(inode, &iattrs->ia_iattr);
> - if (iattrs->ia_secdata)
> - security_inode_notifysecctx(inode,
> - iattrs->ia_secdata,
> - iattrs->ia_secdata_len);
> - } else
> - set_default_inode_attr(inode, sd->s_mode);
> + set_default_inode_attr(inode, sd->s_mode);
> + sysfs_refresh_inode(sd, inode);
>
> /* initialize inode according to type */
> switch (sysfs_type(sd)) {
> case SYSFS_DIR:
> inode->i_op = &sysfs_dir_inode_operations;
> inode->i_fop = &sysfs_dir_operations;
> - inode->i_nlink = sysfs_count_nlink(sd);
> break;
> case SYSFS_KOBJ_ATTR:
> inode->i_size = PAGE_SIZE;
> @@ -341,3 +360,14 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
> else
> return -ENOENT;
> }
> +
> +int sysfs_permission(struct inode *inode, int mask)
> +{
> + struct sysfs_dirent *sd = inode->i_private;
> +
> + mutex_lock(&sysfs_mutex);
> + sysfs_refresh_inode(sd, inode);
> + mutex_unlock(&sysfs_mutex);
> +
> + return generic_permission(inode, mask, NULL);
> +}
> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
> index 1137418..c5eff49 100644
> --- a/fs/sysfs/symlink.c
> +++ b/fs/sysfs/symlink.c
> @@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
> .readlink = generic_readlink,
> .follow_link = sysfs_follow_link,
> .put_link = sysfs_put_link,
> + .setattr = sysfs_setattr,
> + .getattr = sysfs_getattr,
> + .permission = sysfs_permission,
> };
>
>
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index a96d967..12ccc07 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -156,7 +156,9 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
> struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
> void sysfs_delete_inode(struct inode *inode);
> int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
> +int sysfs_permission(struct inode *inode, int mask);
> int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
> +int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
> int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> size_t size, int flags);
> int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 04:23:44

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 10/13] sysfs: In sysfs_chmod_file lazily propagate the mode change.

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Now that sysfs_getattr and sysfs_permission refresh the vfs
> inode there is no need to immediatly push the mode change
> into the vfs cache. Reducing the amount of work needed and
> simplifying the locking.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Nice.

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/file.c | 31 ++++++++-----------------------
> 1 files changed, 8 insertions(+), 23 deletions(-)
>
> diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> index faa1a80..dc30d9e 100644
> --- a/fs/sysfs/file.c
> +++ b/fs/sysfs/file.c
> @@ -579,38 +579,23 @@ EXPORT_SYMBOL_GPL(sysfs_add_file_to_group);
> */
> int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
> {
> - struct sysfs_dirent *victim_sd = NULL;
> - struct dentry *victim = NULL;
> - struct inode * inode;
> + struct sysfs_dirent *sd;
> struct iattr newattrs;
> int rc;
>
> - rc = -ENOENT;
> - victim_sd = sysfs_get_dirent(kobj->sd, attr->name);
> - if (!victim_sd)
> - goto out;
> + mutex_lock(&sysfs_mutex);
>
> - mutex_lock(&sysfs_rename_mutex);
> - victim = sysfs_get_dentry(victim_sd);
> - mutex_unlock(&sysfs_rename_mutex);
> - if (IS_ERR(victim)) {
> - rc = PTR_ERR(victim);
> - victim = NULL;
> + rc = -ENOENT;
> + sd = sysfs_find_dirent(kobj->sd, attr->name);
> + if (!sd)
> goto out;
> - }
> -
> - inode = victim->d_inode;
>
> - mutex_lock(&inode->i_mutex);
> -
> - newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
> + newattrs.ia_mode = (mode & S_IALLUGO) | (sd->s_mode & ~S_IALLUGO);
> newattrs.ia_valid = ATTR_MODE;
> - rc = sysfs_setattr(victim, &newattrs);
> + rc = sysfs_sd_setattr(sd, &newattrs);
>
> - mutex_unlock(&inode->i_mutex);
> out:
> - dput(victim);
> - sysfs_put(victim_sd);
> + mutex_unlock(&sysfs_mutex);
> return rc;
> }
> EXPORT_SYMBOL_GPL(sysfs_chmod_file);
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 04:32:50

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 11/13] sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> With lazy inode updates and dentry operations bringing everything
> into sync on demand there is no longer any need to immediately
> update the vfs or grab i_mutex to protect those updates as we
> make changes to sysfs.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/dir.c | 91 ++---------------------------------------------------
> fs/sysfs/sysfs.h | 2 -
> 2 files changed, 4 insertions(+), 89 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 25d052a..a05b027 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -386,12 +386,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
> return NULL;
> }
>
> -static int sysfs_ilookup_test(struct inode *inode, void *arg)
> -{
> - struct sysfs_dirent *sd = arg;
> - return inode->i_ino == sd->s_ino;
> -}
> -
> /**
> * sysfs_addrm_start - prepare for sysfs_dirent add/remove
> * @acxt: pointer to sysfs_addrm_cxt to be used
> @@ -399,47 +393,20 @@ static int sysfs_ilookup_test(struct inode *inode, void *arg)
> *
> * This function is called when the caller is about to add or
> * remove sysfs_dirent under @parent_sd. This function acquires
> - * sysfs_mutex, grabs inode for @parent_sd if available and lock
> - * i_mutex of it. @acxt is used to keep and pass context to
> + * sysfs_mutex. @acxt is used to keep and pass context to
> * other addrm functions.
> *
> * LOCKING:
> * Kernel thread context (may sleep). sysfs_mutex is locked on
> - * return. i_mutex of parent inode is locked on return if
> - * available.
> + * return.
> */
> void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
> struct sysfs_dirent *parent_sd)
> {
> - struct inode *inode;
> -
> memset(acxt, 0, sizeof(*acxt));
> acxt->parent_sd = parent_sd;
>
> - /* Lookup parent inode. inode initialization is protected by
> - * sysfs_mutex, so inode existence can be determined by
> - * looking up inode while holding sysfs_mutex.
> - */
> mutex_lock(&sysfs_mutex);
> -
> - inode = ilookup5(sysfs_sb, parent_sd->s_ino, sysfs_ilookup_test,
> - parent_sd);
> - if (inode) {
> - WARN_ON(inode->i_state & I_NEW);
> -
> - /* parent inode available */
> - acxt->parent_inode = inode;
> -
> - /* sysfs_mutex is below i_mutex in lock hierarchy.
> - * First, trylock i_mutex. If fails, unlock
> - * sysfs_mutex and lock them in order.
> - */
> - if (!mutex_trylock(&inode->i_mutex)) {
> - mutex_unlock(&sysfs_mutex);
> - mutex_lock(&inode->i_mutex);
> - mutex_lock(&sysfs_mutex);
> - }
> - }
> }
>
> /**
> @@ -471,11 +438,6 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
>
> sd->s_parent = sysfs_get(acxt->parent_sd);
>
> - if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
> - inc_nlink(acxt->parent_inode);
> -
> - acxt->cnt++;
> -
> sysfs_link_sibling(sd);
>
> /* Update timestamps on the parent */
> @@ -579,40 +541,6 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
> sd->s_flags |= SYSFS_FLAG_REMOVED;
> sd->s_sibling = acxt->removed;
> acxt->removed = sd;
> -
> - if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
> - drop_nlink(acxt->parent_inode);
> -
> - acxt->cnt++;
> -}
> -
> -/**
> - * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
> - * @sd: target sysfs_dirent
> - *
> - * Decrement nlink for @sd. @sd must have been unlinked from its
> - * parent on entry to this function such that it can't be looked
> - * up anymore.
> - */
> -static void sysfs_dec_nlink(struct sysfs_dirent *sd)
> -{
> - struct inode *inode;
> -
> - inode = ilookup(sysfs_sb, sd->s_ino);
> - if (!inode)
> - return;
> -
> - /* adjust nlink and update timestamp */
> - mutex_lock(&inode->i_mutex);
> -
> - inode->i_ctime = CURRENT_TIME;
> - drop_nlink(inode);
> - if (sysfs_type(sd) == SYSFS_DIR)
> - drop_nlink(inode);
> -
> - mutex_unlock(&inode->i_mutex);
> -
> - iput(inode);
> }
>
> /**
> @@ -621,25 +549,15 @@ static void sysfs_dec_nlink(struct sysfs_dirent *sd)
> *
> * Finish up sysfs_dirent add/remove. Resources acquired by
> * sysfs_addrm_start() are released and removed sysfs_dirents are
> - * cleaned up. Timestamps on the parent inode are updated.
> + * cleaned up.
> *
> * LOCKING:
> - * All mutexes acquired by sysfs_addrm_start() are released.
> + * sysfs_mutex is released.
> */
> void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
> {
> /* release resources acquired by sysfs_addrm_start() */
> mutex_unlock(&sysfs_mutex);
> - if (acxt->parent_inode) {
> - struct inode *inode = acxt->parent_inode;
> -
> - /* if added/removed, update timestamps on the parent */
> - if (acxt->cnt)
> - inode->i_ctime = inode->i_mtime = CURRENT_TIME;
> -
> - mutex_unlock(&inode->i_mutex);
> - iput(inode);
> - }
>
> /* kill removed sysfs_dirents */
> while (acxt->removed) {
> @@ -648,7 +566,6 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
> acxt->removed = sd->s_sibling;
> sd->s_sibling = NULL;
>
> - sysfs_dec_nlink(sd);
> sysfs_deactivate(sd);
> unmap_bin_file(sd);
> sysfs_put(sd);
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 12ccc07..90b3501 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -89,9 +89,7 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
> */
> struct sysfs_addrm_cxt {
> struct sysfs_dirent *parent_sd;
> - struct inode *parent_inode;
> struct sysfs_dirent *removed;
> - int cnt;
> };
>
> /*
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 04:33:38

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 02/13] sysfs: Rename sysfs_d_iput to sysfs_dentry_iput

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Using dentry instead of d in the function name is what
> several other filesystems are doing and it seems to be
> a more readable convention.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/dir.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e020183..130dfc3 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -298,7 +298,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
> goto repeat;
> }
>
> -static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
> +static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
> {
> struct sysfs_dirent * sd = dentry->d_fsdata;
>
> @@ -307,7 +307,7 @@ static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
> }
>
> static const struct dentry_operations sysfs_dentry_ops = {
> - .d_iput = sysfs_d_iput,
> + .d_iput = sysfs_dentry_iput,
> };
>
> struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 04:33:52

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 08/13] sysfs: Nicely indent sysfs_symlink_inode_operations

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Lining up the functions in sysfs_symlink_inode_operations
> follows the pattern in the rest of sysfs and makes things
> slightly more readable.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/symlink.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
> index c5081ad..1137418 100644
> --- a/fs/sysfs/symlink.c
> +++ b/fs/sysfs/symlink.c
> @@ -210,10 +210,10 @@ static void sysfs_put_link(struct dentry *dentry, struct nameidata *nd, void *co
> }
>
> const struct inode_operations sysfs_symlink_inode_operations = {
> - .setxattr = sysfs_setxattr,
> - .readlink = generic_readlink,
> - .follow_link = sysfs_follow_link,
> - .put_link = sysfs_put_link,
> + .setxattr = sysfs_setxattr,
> + .readlink = generic_readlink,
> + .follow_link = sysfs_follow_link,
> + .put_link = sysfs_put_link,
> };
>
>
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 04:55:46

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 04/13] sysfs: Simplify sysfs_chmod_file semantics

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Eric W. Biederman ([email protected]):
> >> From: Eric W. Biederman <[email protected]>
> >>
> >> Currently every caller of sysfs_chmod_file happens at either
> >> file creation time to set a non-default mode or in response
> >> to a specific user requested space change in policy. Making
> >> timestamps of when the chmod happens and notification of
> >> a file changing mode uninteresting.
> >
> > But these changes can occur by togging values in sysfs files
> > (i.e. f71805f.c), right? Is this (specifically not doing inotify)
> > definately uncontroversial?
>
> The fs_notify_change was not introduced to deliberately support
> a feature but as a side effect of other cleanups. So there
> is no indication that anyone cares about inotify support.
>
> > I can't exactly picture an admin sitting there watching
> > nautilus for a sysfs file to become writeable, but could
> > imagine some site's automation getting hung... Or am I way
> > off base?
>
> I would be stunned if the shell script in the automation that writes
> to a sysfs file to make things writeable doesn't on it's next line
> kick off whatever needs it to be writable.
>
> With no benefit to using inotify and with only a handful of sysfs
> files affected I don't expect this change to break anything in
> userspace and I have been happily running with it for a year or so on
> all of our machines at work with no one problems.
>
> The reason I am making the change is that the goal of this patchset is
> to get sysfs to act like any other distributed filesystem in linux,
> and to use the same interfaces in roughly the same ways as other
> distributed filesystems. Unfortunately there is not a good interface
> for distributed filesystems to support inotify or I would use it.
>
> Eric

Ok - I personally agree, but I know there are admins out there with
very different mindsets from mine

Acked-by: Serge Hallyn <[email protected]>

-serge

2009-11-04 04:58:20

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 07/13] sysfs: Update s_iattr on link and unlink.

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Eric W. Biederman ([email protected]):
> >> From: Eric W. Biederman <[email protected]>
> >>
> >> Currently sysfs updates the timestamps on the vfs directory
> >> inode when we create or remove a directory entry but doesn't
> >> update the cached copy on the sysfs_dirent, fix that oversight.
> >
> > confused... why not do this in sysfs_addrm_finish()?
> >
> > I guess you'd have to do at it at top before dropping sysfs_mutex
> > so it wouldn't be as pretty as I was thinking, but at least you
> > could just do it once.
>
> Well sysfs_addrm_finish doesn't really know if you did anything.

Oh right - well it used to through cnt right? but not after your
last patch.

> Beyond that my ultimate goal is to kill sysfs_addrm_start and
> sysfs_addrm_finish. Of course that requires fixing all of the
> sysfs users that depend on the impossible to get right recursive
> directory removal in sysfs, so it is not the subject of this patchset.

I didn't see the patch nixing inode->i_mtime (and cnt) changing from
sysfs_addrm_finish() until after responding. Got it now.

Acked-by: Serge Hallyn <[email protected]>

-serge

2009-11-04 05:50:31

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 09/13] sysfs: Implement sysfs_getattr & sysfs_permission

"Serge E. Hallyn" <[email protected]> writes:

> So the inode->i_mutex is not needed?

Good question. Nothing in sysfs needs it. The VFS does not grab the
inode mutex on this path, but the vfs does grab the inode mutex when
writing to the inode.

Since the VFs isn't grabbing the inode_mutex there is probably a race in
here somewhere if someone looks at things just right.

I am too tired tonight to be that person.

Eric

2009-11-04 12:04:11

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 14/13] sysfs: sysfs_setattr remove unnecessary permission check.


inode_change_ok already clears the SGID bit when necessary so there is
no reason for sysfs_setattr to carry code to do the same, and it is
good to kill the extra copy because when I moved the code, I goofed
and in certain corner cases the code will look at the wrong gid.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 72e2e99..e2595a7 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -120,10 +120,6 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;

iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
- if (iattr->ia_valid & ATTR_MODE) {
- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
- iattr->ia_mode &= ~S_ISGID;
- }

error = inode_setattr(inode, iattr);
if (error)
--
1.6.5.2.143.g8cc62

2009-11-04 12:05:09

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 15/13] sysfs: Protect sysfs_refresh_inode with inode mutex.


In general everything that writes to vfs inodes holds the
inode mutex, so hold the inode mutex over sysfs_refresh_inode.
The sysfs data structures don't need this but it looks like the
vfs might.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index e2595a7..ad549f5 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -240,9 +240,11 @@ int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *sta
struct sysfs_dirent *sd = dentry->d_fsdata;
struct inode *inode = dentry->d_inode;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

generic_fillattr(inode, stat);
return 0;
@@ -353,9 +355,11 @@ int sysfs_permission(struct inode *inode, int mask)
{
struct sysfs_dirent *sd = inode->i_private;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

return generic_permission(inode, mask, NULL);
}
--
1.6.5.2.143.g8cc62

2009-11-04 12:57:00

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 06/13] sysfs: Fix locking and factor out sysfs_sd_setattr

[email protected] (Eric W. Biederman) writes:

> "Serge E. Hallyn" <[email protected]> writes:
>
>> Quoting Eric W. Biederman ([email protected]):
>>> From: Eric W. Biederman <[email protected]>
>>>
>>> Cleanly separate the work that is specific to setting the
>>> attributes of a sysfs_dirent from what is needed to update
>>> the attributes of a vfs inode.
>>>
>>> Additionally grab the sysfs_mutex to keep any nasties from
>>> surprising us when updating the sysfs_dirent.
>>>
>>> Acked-by: Tejun Heo <[email protected]>
>>> Signed-off-by: Eric W. Biederman <[email protected]>
>>> ---
>>> fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
>>> fs/sysfs/sysfs.h | 1 +
>>> 2 files changed, 33 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
>>> index fed7a74..fccfb55 100644
>>> --- a/fs/sysfs/inode.c
>>> +++ b/fs/sysfs/inode.c
>>> @@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)
>>>
>>> return attrs;
>>> }
>>> -int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
>>> +
>>> +int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
>>> {
>>> - struct inode * inode = dentry->d_inode;
>>> - struct sysfs_dirent * sd = dentry->d_fsdata;
>>> struct sysfs_inode_attrs *sd_attrs;
>>> struct iattr *iattrs;
>>> unsigned int ia_valid = iattr->ia_valid;
>>> - int error;
>>> -
>>> - if (!sd)
>>> - return -EINVAL;
>>>
>>> sd_attrs = sd->s_iattr;
>>>
>>> - error = inode_change_ok(inode, iattr);
>>> - if (error)
>>> - return error;
>>> -
>>> - iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
>>> -
>>> - error = inode_setattr(inode, iattr);
>>> - if (error)
>>> - return error;
>>> -
>>> if (!sd_attrs) {
>>> /* setting attributes for the first time, allocate now */
>>> sd_attrs = sysfs_init_inode_attrs(sd);
>>> @@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
>>> iattrs->ia_ctime = iattr->ia_ctime;
>>> if (ia_valid & ATTR_MODE) {
>>> umode_t mode = iattr->ia_mode;
>>> -
>>> - if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
>>> - mode &= ~S_ISGID;
>>> iattrs->ia_mode = sd->s_mode = mode;
>>> }
>>> }
>>> + return 0;
>>> +}
>>> +
>>> +int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
>>> +{
>>> + struct inode * inode = dentry->d_inode;
>>> + struct sysfs_dirent * sd = dentry->d_fsdata;
>>> + int error;
>>> +
>>> + if (!sd)
>>> + return -EINVAL;
>>> +
>>> + error = inode_change_ok(inode, iattr);
>>> + if (error)
>>> + return error;
>>> +
>>> + iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
>>> + if (iattr->ia_valid & ATTR_MODE) {
>>> + if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
>>> + iattr->ia_mode &= ~S_ISGID;
>>> + }
>>
>> Was it a bug that before this patch this wasn't cleared before the
>> actual inode_setattr()?
>
> Not strictly as inode_setattr performs the exact same check.
>
>> Since the S_ISGID will be set for the *new* gid, that is,
>> iattr->ia_gid, shouldn't the user be required to be
>> in_group_p(iattr->i_gid)? Note you haven't done the
>> inode_setattr() yet.
>
> Interesting point, yes I am potentially testing the wrong gid
> there. The dangers of moving code around.
>
> Thank you for catching that. !#@#!#

Turns out it wasn't as bad as it looks because inode_change_ok already
does what I am trying to do properly.

Although come to think of it, SGID on a sysfs file is completely bogus.
We don't exec files and we don't allow users to create anything.

I have already sent the follow up patch to remove those few lines of code.

Eric

2009-11-04 14:24:25

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 09/13] sysfs: Implement sysfs_getattr & sysfs_permission

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > So the inode->i_mutex is not needed?
>
> Good question. Nothing in sysfs needs it. The VFS does not grab the
> inode mutex on this path, but the vfs does grab the inode mutex when
> writing to the inode.

All callers of fs/attr.c:notify_change() do seem to take the i_mutex,
though. And Documentation/filesystem/Locking claims that ->setattr()
does need i_mutex. So I assume that setting of inode->i_ctime etc,
which is what you're doing here, needs to be protected by the i_mutex.

> Since the VFs isn't grabbing the inode_mutex there is probably a race in
> here somewhere if someone looks at things just right.
>
> I am too tired tonight to be that person.

The readers take no lock of any sort (i.e. generic_fillattr and its
callers) so IIUC they could get inconsistent data...

-serge

2009-11-04 14:25:37

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 14/13] sysfs: sysfs_setattr remove unnecessary permission check.

Quoting Eric W. Biederman ([email protected]):
>
> inode_change_ok already clears the SGID bit when necessary so there is
> no reason for sysfs_setattr to carry code to do the same, and it is
> good to kill the extra copy because when I moved the code, I goofed
> and in certain corner cases the code will look at the wrong gid.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/inode.c | 4 ----
> 1 files changed, 0 insertions(+), 4 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index 72e2e99..e2595a7 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -120,10 +120,6 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> return error;
>
> iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
> - if (iattr->ia_valid & ATTR_MODE) {
> - if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> - iattr->ia_mode &= ~S_ISGID;
> - }
>
> error = inode_setattr(inode, iattr);
> if (error)
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 14:27:31

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 15/13] sysfs: Protect sysfs_refresh_inode with inode mutex.

Quoting Eric W. Biederman ([email protected]):
>
> In general everything that writes to vfs inodes holds the
> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
> The sysfs data structures don't need this but it looks like the
> vfs might.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Oh right so pls disregard my last reply to patch 9 :)


Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/inode.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index e2595a7..ad549f5 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -240,9 +240,11 @@ int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *sta
> struct sysfs_dirent *sd = dentry->d_fsdata;
> struct inode *inode = dentry->d_inode;
>
> + mutex_lock(&inode->i_mutex);
> mutex_lock(&sysfs_mutex);
> sysfs_refresh_inode(sd, inode);
> mutex_unlock(&sysfs_mutex);
> + mutex_unlock(&inode->i_mutex);
>
> generic_fillattr(inode, stat);
> return 0;
> @@ -353,9 +355,11 @@ int sysfs_permission(struct inode *inode, int mask)
> {
> struct sysfs_dirent *sd = inode->i_private;
>
> + mutex_lock(&inode->i_mutex);
> mutex_lock(&sysfs_mutex);
> sysfs_refresh_inode(sd, inode);
> mutex_unlock(&sysfs_mutex);
> + mutex_unlock(&inode->i_mutex);
>
> return generic_permission(inode, mask, NULL);
> }
> --
> 1.6.5.2.143.g8cc62

2009-11-04 20:51:20

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 15/13] sysfs: Protect sysfs_refresh_inode with inode mutex.

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>>
>> In general everything that writes to vfs inodes holds the
>> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
>> The sysfs data structures don't need this but it looks like the
>> vfs might.
>>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Oh right so pls disregard my last reply to patch 9 :)

I also checked and nfs has the same basic structure and does take the
inode mutex on these paths, inside of nfs_refresh_inode which is
called from __nfs_revalidate_inode.

So it is definitely the consistent thing to do.

Eric

2009-11-04 21:49:54

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> By teaching sysfs_revalidate to hide a dentry for
> a sysfs_dirent if the sysfs_dirent has been renamed,
> and by teaching sysfs_lookup to return the original
> dentry if the sysfs dirent has been renamed. I can
> show the results of renames correctly without having to
> update the dcache during the directory rename.
>
> This massively simplifies the rename logic allowing a lot
> of weird sysfs special cases to be removed along with
> a lot of now unnecesary helper code.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Patch looks *great*, except:

...

> @@ -315,6 +274,14 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
> if (sd->s_flags & SYSFS_FLAG_REMOVED)
> goto out_bad;
>
> + /* The sysfs dirent has been moved? */
> + if (dentry->d_parent->d_fsdata != sd->s_parent)
> + goto out_bad;
> +
> + /* The sysfs dirent has been renamed */
> + if (strcmp(dentry->d_name.name, sd->s_name) != 0)
> + goto out_bad;
> +
> mutex_unlock(&sysfs_mutex);
> out_valid:
> return 1;
> @@ -322,6 +289,12 @@ out_bad:
> /* Remove the dentry from the dcache hashes.
> * If this is a deleted dentry we use d_drop instead of d_delete
> * so sysfs doesn't need to cope with negative dentries.
> + *
> + * If this is a dentry that has simply been renamed we
> + * use d_drop to remove it from the dcache lookup on its
> + * old parent. If this dentry persists later when a lookup
> + * is performed at its new name the dentry will be readded
> + * to the dcache hashes.
> */
> is_dir = (sysfs_type(sd) == SYSFS_DIR);
> mutex_unlock(&sysfs_mutex);

After this, if (is_dir) and (have_submounts(dentry)) then you'll
still goto out_valid and return 1. Is that what you want?

-serge

2009-11-04 21:59:44

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> From: Eric W. Biederman <[email protected]>
>>
>> By teaching sysfs_revalidate to hide a dentry for
>> a sysfs_dirent if the sysfs_dirent has been renamed,
>> and by teaching sysfs_lookup to return the original
>> dentry if the sysfs dirent has been renamed. I can
>> show the results of renames correctly without having to
>> update the dcache during the directory rename.
>>
>> This massively simplifies the rename logic allowing a lot
>> of weird sysfs special cases to be removed along with
>> a lot of now unnecesary helper code.
>>
>> Acked-by: Tejun Heo <[email protected]>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Patch looks *great*, except:
>
> ...
>
>> @@ -315,6 +274,14 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
>> if (sd->s_flags & SYSFS_FLAG_REMOVED)
>> goto out_bad;
>>
>> + /* The sysfs dirent has been moved? */
>> + if (dentry->d_parent->d_fsdata != sd->s_parent)
>> + goto out_bad;
>> +
>> + /* The sysfs dirent has been renamed */
>> + if (strcmp(dentry->d_name.name, sd->s_name) != 0)
>> + goto out_bad;
>> +
>> mutex_unlock(&sysfs_mutex);
>> out_valid:
>> return 1;
>> @@ -322,6 +289,12 @@ out_bad:
>> /* Remove the dentry from the dcache hashes.
>> * If this is a deleted dentry we use d_drop instead of d_delete
>> * so sysfs doesn't need to cope with negative dentries.
>> + *
>> + * If this is a dentry that has simply been renamed we
>> + * use d_drop to remove it from the dcache lookup on its
>> + * old parent. If this dentry persists later when a lookup
>> + * is performed at its new name the dentry will be readded
>> + * to the dcache hashes.
>> */
>> is_dir = (sysfs_type(sd) == SYSFS_DIR);
>> mutex_unlock(&sysfs_mutex);
>
> After this, if (is_dir) and (have_submounts(dentry)) then you'll
> still goto out_valid and return 1. Is that what you want?

It isn't what I want but it is what the VFS requires. If let the vfs
continue on it's delusional state we will leak the vfs mount and
everything mounted on top of it, with no way to remove the mounts.

Eric

2009-11-04 22:19:56

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 13/13] sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> These two functions do 90% of the same work and it doesn't significantly
> obfuscate the function to allow both the parent dir and the name to change
> at the same time. So merge them together to simplify maintenance, and
> increase testing.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Based on just this patchset it seems sysfs_rename() could be static,
but since it isn't static I assume your later patchset will use it
outside fs/sysfs/dir.c?

Acked-by: Serge Hallyn <[email protected]>


> ---
> fs/sysfs/dir.c | 62 +++++++++++++++++++++++++----------------------------
> fs/sysfs/sysfs.h | 3 ++
> 2 files changed, 32 insertions(+), 33 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 0b60212..e1a86d1 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -760,30 +760,42 @@ void sysfs_remove_dir(struct kobject * kobj)
> __sysfs_remove_dir(sd);
> }
>
> -int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
> +int sysfs_rename(struct sysfs_dirent *sd,
> + struct sysfs_dirent *new_parent_sd, const char *new_name)
> {
> - struct sysfs_dirent *sd = kobj->sd;
> const char *dup_name = NULL;
> int error;
>
> mutex_lock(&sysfs_mutex);
>
> error = 0;
> - if (strcmp(sd->s_name, new_name) == 0)
> + if ((sd->s_parent == new_parent_sd) &&
> + (strcmp(sd->s_name, new_name) == 0))
> goto out; /* nothing to rename */
>
> error = -EEXIST;
> - if (sysfs_find_dirent(sd->s_parent, new_name))
> + if (sysfs_find_dirent(new_parent_sd, new_name))
> goto out;
>
> /* rename sysfs_dirent */
> - error = -ENOMEM;
> - new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
> - if (!new_name)
> - goto out;
> + if (strcmp(sd->s_name, new_name) != 0) {
> + error = -ENOMEM;
> + new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
> + if (!new_name)
> + goto out;
> +
> + dup_name = sd->s_name;
> + sd->s_name = new_name;
> + }
>
> - dup_name = sd->s_name;
> - sd->s_name = new_name;
> + /* Remove from old parent's list and insert into new parent's list. */
> + if (sd->s_parent != new_parent_sd) {
> + sysfs_unlink_sibling(sd);
> + sysfs_get(new_parent_sd);
> + sysfs_put(sd->s_parent);
> + sd->s_parent = new_parent_sd;
> + sysfs_link_sibling(sd);
> + }
>
> error = 0;
> out:
> @@ -792,37 +804,21 @@ int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
> return error;
> }
>
> +int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
> +{
> + return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
> +}
> +
> int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
> {
> struct sysfs_dirent *sd = kobj->sd;
> struct sysfs_dirent *new_parent_sd;
> - int error;
>
> BUG_ON(!sd->s_parent);
> -
> - mutex_lock(&sysfs_mutex);
> - new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
> + new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
> new_parent_kobj->sd : &sysfs_root;
>
> - error = 0;
> - if (sd->s_parent == new_parent_sd)
> - goto out; /* nothing to move */
> -
> - error = -EEXIST;
> - if (sysfs_find_dirent(new_parent_sd, sd->s_name))
> - goto out;
> -
> - /* Remove from old parent's list and insert into new parent's list. */
> - sysfs_unlink_sibling(sd);
> - sysfs_get(new_parent_sd);
> - sysfs_put(sd->s_parent);
> - sd->s_parent = new_parent_sd;
> - sysfs_link_sibling(sd);
> -
> - error = 0;
> -out:
> - mutex_unlock(&sysfs_mutex);
> - return error;
> + return sysfs_rename(sd, new_parent_sd, sd->s_name);
> }
>
> /* Relationship between s_mode and the DT_xxx types */
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 98a15bf..ca52e7b 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -130,6 +130,9 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
> struct sysfs_dirent **p_sd);
> void sysfs_remove_subdir(struct sysfs_dirent *sd);
>
> +int sysfs_rename(struct sysfs_dirent *sd,
> + struct sysfs_dirent *new_parent_sd, const char *new_name);
> +
> static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
> {
> if (sd) {
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-11-04 22:23:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 13/13] sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> From: Eric W. Biederman <[email protected]>
>>
>> These two functions do 90% of the same work and it doesn't significantly
>> obfuscate the function to allow both the parent dir and the name to change
>> at the same time. So merge them together to simplify maintenance, and
>> increase testing.
>>
>> Acked-by: Tejun Heo <[email protected]>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Based on just this patchset it seems sysfs_rename() could be static,
> but since it isn't static I assume your later patchset will use it
> outside fs/sysfs/dir.c?

To implement sysfs_rename_link, but that is too much to digest/review/push
all at once.

I have a snapshot of my development tree up on kernel.org if you are
interested.

Eric

2009-11-04 22:37:29

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 13/13] sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Eric W. Biederman ([email protected]):
> >> From: Eric W. Biederman <[email protected]>
> >>
> >> These two functions do 90% of the same work and it doesn't significantly
> >> obfuscate the function to allow both the parent dir and the name to change
> >> at the same time. So merge them together to simplify maintenance, and
> >> increase testing.
> >>
> >> Acked-by: Tejun Heo <[email protected]>
> >> Signed-off-by: Eric W. Biederman <[email protected]>
> >
> > Based on just this patchset it seems sysfs_rename() could be static,
> > but since it isn't static I assume your later patchset will use it
> > outside fs/sysfs/dir.c?
>
> To implement sysfs_rename_link, but that is too much to digest/review/push
> all at once.

Ok, just making sure it'll get used.

To repeat myself,

Acked-by: Serge Hallyn <[email protected]>

> I have a snapshot of my development tree up on kernel.org if you are
> interested.

I think this was an appropriately sized set :) Might take a look this
weekend though.

thanks,
-serge

2009-11-05 04:51:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 13/13] sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> "Serge E. Hallyn" <[email protected]> writes:
>>
>> > Quoting Eric W. Biederman ([email protected]):
>> >> From: Eric W. Biederman <[email protected]>
>> >>
>> >> These two functions do 90% of the same work and it doesn't significantly
>> >> obfuscate the function to allow both the parent dir and the name to change
>> >> at the same time. So merge them together to simplify maintenance, and
>> >> increase testing.
>> >>
>> >> Acked-by: Tejun Heo <[email protected]>
>> >> Signed-off-by: Eric W. Biederman <[email protected]>
>> >
>> > Based on just this patchset it seems sysfs_rename() could be static,
>> > but since it isn't static I assume your later patchset will use it
>> > outside fs/sysfs/dir.c?
>>
>> To implement sysfs_rename_link, but that is too much to digest/review/push
>> all at once.
>
> Ok, just making sure it'll get used.
>
> To repeat myself,
>
> Acked-by: Serge Hallyn <[email protected]>
>
>> I have a snapshot of my development tree up on kernel.org if you are
>> interested.
>
> I think this was an appropriately sized set :) Might take a look this
> weekend though.

Thanks for the review.

Eric

2009-11-06 23:37:25

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/13] sysfs lazification.

On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>
> The sysfs code updates the vfs caches immediately when the sysfs data
> structures change causing a lot of unnecessary complications. The
> following patchset untangles that beast. Allowing for simpler
> more straight forward code, the removal of a hack from the vfs
> to support sysfs, and human comprehensible locking on sysfs.
>
> Most of these patches have already been reviewed and acked from the
> last time I had time to work on sysfs.
>
> In net the patches look like:

Can you resend these based on the review that you just got, with the new
acks and changes so that I can apply them?

thanks,

greg k-h

2009-11-07 02:07:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/13] sysfs: Implement sysfs_getattr & sysfs_permission

Eric W. Biederman wrote:
> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
> index 1137418..c5eff49 100644
> --- a/fs/sysfs/symlink.c
> +++ b/fs/sysfs/symlink.c
> @@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
> .readlink = generic_readlink,
> .follow_link = sysfs_follow_link,
> .put_link = sysfs_put_link,
> + .setattr = sysfs_setattr,

It would be nice either to separate addition of setattr to symlinks
into a separate patch or note it in the description.

Thanks.

--
tejun

2009-11-07 02:08:56

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/13] sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish

Eric W. Biederman wrote:
> From: Eric W. Biederman <[email protected]>
>
> With lazy inode updates and dentry operations bringing everything
> into sync on demand there is no longer any need to immediately
> update the vfs or grab i_mutex to protect those updates as we
> make changes to sysfs.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2009-11-07 02:12:04

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

Hello,

Eric W. Biederman wrote:
> It isn't what I want but it is what the VFS requires. If let the vfs
> continue on it's delusional state we will leak the vfs mount and
> everything mounted on top of it, with no way to remove the mounts.

This is caused by not having any way to prevent deletion on
directories with submounts, right? How does other distributed
filesystems deal with directories with submounts going away underneath
it?

Thanks.

--
tejun

2009-11-07 02:16:38

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

Tejun Heo <[email protected]> writes:

> Hello,
>
> Eric W. Biederman wrote:
>> It isn't what I want but it is what the VFS requires. If let the vfs
>> continue on it's delusional state we will leak the vfs mount and
>> everything mounted on top of it, with no way to remove the mounts.
>
> This is caused by not having any way to prevent deletion on
> directories with submounts, right? How does other distributed
> filesystems deal with directories with submounts going away underneath
> it?

NFS does exactly the same thing I am doing.

Eric

2009-11-07 02:18:43

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 14/13] sysfs: sysfs_setattr remove unnecessary permission check.

Eric W. Biederman wrote:
> inode_change_ok already clears the SGID bit when necessary so there is
> no reason for sysfs_setattr to carry code to do the same, and it is
> good to kill the extra copy because when I moved the code, I goofed
> and in certain corner cases the code will look at the wrong gid.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Tejun Heo <[email protected]>

I would much prefer this one being prior to 06 but if it's too
painful, please at least note it in the description of 06.

Thanks.

--
tejun

2009-11-07 02:20:31

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

Hello,

Eric W. Biederman wrote:
> NFS does exactly the same thing I am doing.

Oh... well, sysfs directories parenting other filesystems are pretty
rare and well defined. Although it's not very pretty, I don't think
we'll see any actual problem there. Thanks for the explanation.

--
tejun

2009-11-07 02:26:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 15/13] sysfs: Protect sysfs_refresh_inode with inode mutex.

Eric W. Biederman wrote:
> In general everything that writes to vfs inodes holds the
> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
> The sysfs data structures don't need this but it looks like the
> vfs might.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Sidenote: Hmmm... Originally, sysfs completely depended on vfs locking
but with sysfs_dirent separation, the tree structure itself and some
attributes went under the protection of sysfs_mutex while leaving more
vfs oriented fields under vfs locking. This patchset makes sysfs
lazier so it can't depend on any vfs layer locking. I think you've
converted all necessary places while removing dependency on
dentry/inode from update operations but it might be a good idea to do
a audit pass over how fields are being protected now.

Thanks for your patience.

--
tejun

2009-11-07 02:27:40

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 01/13] sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex

Eric W. Biederman wrote:
> From: Eric W. Biederman <[email protected]>
>
> The sysfs_mutex is required to ensure updates are and will remain
> atomic with respect to other inode iattr updates, that do not happen
> through the filesystem.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Tejun Heo <[email protected]>

--
tejun

2009-11-07 02:34:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

Tejun Heo <[email protected]> writes:

> Hello,
>
> Eric W. Biederman wrote:
>> NFS does exactly the same thing I am doing.
>
> Oh... well, sysfs directories parenting other filesystems are pretty
> rare and well defined. Although it's not very pretty, I don't think
> we'll see any actual problem there. Thanks for the explanation.

To be perfectly clear, if we hit this ugly case. The internal
sysfs_dirent tree is always what it should be. Only the VFS dentry
cache doesn't get updated.

Eric

2009-11-07 11:13:33

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

On Fri, 06 Nov 2009, [email protected] (Eric W. Biederman wrote:
> Tejun Heo <[email protected]> writes:
>
> > Hello,
> >
> > Eric W. Biederman wrote:
> >> It isn't what I want but it is what the VFS requires. If let the vfs
> >> continue on it's delusional state we will leak the vfs mount and
> >> everything mounted on top of it, with no way to remove the mounts.

"umount -l" on the whole thing will clear any submounts up too.

> >
> > This is caused by not having any way to prevent deletion on
> > directories with submounts, right? How does other distributed
> > filesystems deal with directories with submounts going away underneath
> > it?
>
> NFS does exactly the same thing I am doing.

Yes, this is a problem for NFS too. You cannot tell the NFS server
"this directory is mounted on some client, don't let anything happen
to it!". Basically the remaining choices are:

a) let the old path leading up to the mount still be accessible, even
though it doesn't exist anymore on the server (or has been replaced
with something different)

b) automatically dissolve any submounts if the path disappeard on the
server

I think Al was arguing in favor of b), while Linus said that mounts
must never just disappear, so a) is better. I don't think an
agreement was reached.

Thanks,
Miklos

2009-11-07 11:57:29

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

Miklos Szeredi <[email protected]> writes:

> On Fri, 06 Nov 2009, [email protected] (Eric W. Biederman wrote:
>> Tejun Heo <[email protected]> writes:
>>
>> > Hello,
>> >
>> > Eric W. Biederman wrote:
>> >> It isn't what I want but it is what the VFS requires. If let the vfs
>> >> continue on it's delusional state we will leak the vfs mount and
>> >> everything mounted on top of it, with no way to remove the mounts.
>
> "umount -l" on the whole thing will clear any submounts up too.
>
>> >
>> > This is caused by not having any way to prevent deletion on
>> > directories with submounts, right? How does other distributed
>> > filesystems deal with directories with submounts going away underneath
>> > it?
>>
>> NFS does exactly the same thing I am doing.
>
> Yes, this is a problem for NFS too. You cannot tell the NFS server
> "this directory is mounted on some client, don't let anything happen
> to it!". Basically the remaining choices are:
>
> a) let the old path leading up to the mount still be accessible, even
> though it doesn't exist anymore on the server (or has been replaced
> with something different)
>
> b) automatically dissolve any submounts if the path disappeard on the
> server
>
> I think Al was arguing in favor of b), while Linus said that mounts
> must never just disappear, so a) is better. I don't think an
> agreement was reached.

I haven't seen that conversation. I do know it is non-intutive and if
you attempt to delete what is a mount point in another mount namespace
and it won't go away. (What we do for non-distributed filesystems).
So I would favor mount points dissolving if we had the infrastructure.

Regardless the goal for now is to simply catch up with other distributed
filesystems.

Eric

2009-11-08 06:04:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 09/13] sysfs: Implement sysfs_getattr & sysfs_permission

Tejun Heo <[email protected]> writes:

> Eric W. Biederman wrote:
>> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
>> index 1137418..c5eff49 100644
>> --- a/fs/sysfs/symlink.c
>> +++ b/fs/sysfs/symlink.c
>> @@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
>> .readlink = generic_readlink,
>> .follow_link = sysfs_follow_link,
>> .put_link = sysfs_put_link,
>> + .setattr = sysfs_setattr,
>
> It would be nice either to separate addition of setattr to symlinks
> into a separate patch or note it in the description.

Good point note added.

Eric

2009-11-08 07:04:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 15/13] sysfs: Protect sysfs_refresh_inode with inode mutex.

Tejun Heo <[email protected]> writes:

> Eric W. Biederman wrote:
>> In general everything that writes to vfs inodes holds the
>> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
>> The sysfs data structures don't need this but it looks like the
>> vfs might.
>>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Acked-by: Tejun Heo <[email protected]>
>
> Sidenote: Hmmm... Originally, sysfs completely depended on vfs locking
> but with sysfs_dirent separation, the tree structure itself and some
> attributes went under the protection of sysfs_mutex while leaving more
> vfs oriented fields under vfs locking. This patchset makes sysfs
> lazier so it can't depend on any vfs layer locking. I think you've
> converted all necessary places while removing dependency on
> dentry/inode from update operations but it might be a good idea to do
> a audit pass over how fields are being protected now.

You raised a good point. I took a quick second pass through.
I did not see anything I have missed, and I did not change anything
else on the vfs path.

So at the very least I don't expect there are any locking related
regressions.

Eric

2009-11-08 07:06:18

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/13] sysfs lazification.

Greg KH <[email protected]> writes:

> On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>>
>> The sysfs code updates the vfs caches immediately when the sysfs data
>> structures change causing a lot of unnecessary complications. The
>> following patchset untangles that beast. Allowing for simpler
>> more straight forward code, the removal of a hack from the vfs
>> to support sysfs, and human comprehensible locking on sysfs.
>>
>> Most of these patches have already been reviewed and acked from the
>> last time I had time to work on sysfs.
>>
>> In net the patches look like:
>
> Can you resend these based on the review that you just got, with the new
> acks and changes so that I can apply them?

Coming in just a minute. The fixes are the trailing patches.

Eric

2009-11-08 07:25:17

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 0/15] sysfs lazification final


The sysfs code updates the vfs caches immediately when the sysfs data
structures change causing a lot of unnecessary complications. The
following patchset untangles that beast. Allowing for simpler
more straight forward code, the removal of a hack from the vfs
to support sysfs, and human comprehensible locking on sysfs.

Most of these patches have already been reviewed and acked from the
last time I had time to work on sysfs.

This acks have been folded in and the two small bugs found in the
previous review have been fixed in the trailing patches (they are
minor enough nits that even a bisect that happens to land in the
middle should not see sysfs problems).

In net the patches look like:

fs/namei.c | 22 ---
fs/sysfs/dir.c | 388 ++++++++++++++++---------------------------------
fs/sysfs/file.c | 41 +----
fs/sysfs/inode.c | 178 ++++++++++++++---------
fs/sysfs/symlink.c | 11 +-
fs/sysfs/sysfs.h | 9 +-
include/linux/namei.h | 1 -
7 files changed, 256 insertions(+), 394 deletions(-)

Eric

2009-11-08 07:30:40

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 01/15] sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex

From: Eric W. Biederman <[email protected]>

The sysfs_mutex is required to ensure updates are and will remain
atomic with respect to other inode iattr updates, that do not happen
through the filesystem.

Acked-by: Serge Hallyn <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 41 +++++++++++++++++++++++++++++------------
1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index e28cecf..8a08250 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -122,23 +122,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;
}

+static int sysfs_sd_setsecdata(struct sysfs_dirent *sd, void **secdata, u32 *secdata_len)
+{
+ struct sysfs_inode_attrs *iattrs;
+ void *old_secdata;
+ size_t old_secdata_len;
+
+ iattrs = sd->s_iattr;
+ if (!iattrs)
+ iattrs = sysfs_init_inode_attrs(sd);
+ if (!iattrs)
+ return -ENOMEM;
+
+ old_secdata = iattrs->ia_secdata;
+ old_secdata_len = iattrs->ia_secdata_len;
+
+ iattrs->ia_secdata = *secdata;
+ iattrs->ia_secdata_len = *secdata_len;
+
+ *secdata = old_secdata;
+ *secdata_len = old_secdata_len;
+ return 0;
+}
+
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags)
{
struct sysfs_dirent *sd = dentry->d_fsdata;
- struct sysfs_inode_attrs *iattrs;
void *secdata;
int error;
u32 secdata_len = 0;

if (!sd)
return -EINVAL;
- if (!sd->s_iattr)
- sd->s_iattr = sysfs_init_inode_attrs(sd);
- if (!sd->s_iattr)
- return -ENOMEM;
-
- iattrs = sd->s_iattr;

if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
@@ -150,12 +166,13 @@ int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
&secdata, &secdata_len);
if (error)
goto out;
- if (iattrs->ia_secdata)
- security_release_secctx(iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- iattrs->ia_secdata = secdata;
- iattrs->ia_secdata_len = secdata_len;

+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setsecdata(sd, &secdata, &secdata_len);
+ mutex_unlock(&sysfs_mutex);
+
+ if (secdata)
+ security_release_secctx(secdata, secdata_len);
} else
return -EINVAL;
out:
--
1.6.5.2.143.g8cc62

2009-11-08 07:27:26

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 02/15] sysfs: Rename sysfs_d_iput to sysfs_dentry_iput

From: Eric W. Biederman <[email protected]>

Using dentry instead of d in the function name is what
several other filesystems are doing and it seems to be
a more readable convention.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index e020183..130dfc3 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,7 +298,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

-static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
+static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;

@@ -307,7 +307,7 @@ static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
}

static const struct dentry_operations sysfs_dentry_ops = {
- .d_iput = sysfs_d_iput,
+ .d_iput = sysfs_dentry_iput,
};

struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
--
1.6.5.2.143.g8cc62

2009-11-08 07:29:50

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 03/15] sysfs: Use dentry_ops instead of directly playing with the dcache

From: Eric W. Biederman <[email protected]>

Calling d_drop unconditionally when a sysfs_dirent is deleted has
the potential to leak mounts, so instead implement dentry delete
and revalidate operations that cause sysfs dentries to be removed
at the appropriate time.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 73 +++++++++++++++++++++++++++++++++++--------------------
1 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 130dfc3..b5e8499 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,6 +298,46 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

+static int sysfs_dentry_delete(struct dentry *dentry)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ return !!(sd->s_flags & SYSFS_FLAG_REMOVED);
+}
+
+static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ int is_dir;
+
+ mutex_lock(&sysfs_mutex);
+
+ /* The sysfs dirent has been deleted */
+ if (sd->s_flags & SYSFS_FLAG_REMOVED)
+ goto out_bad;
+
+ mutex_unlock(&sysfs_mutex);
+out_valid:
+ return 1;
+out_bad:
+ /* Remove the dentry from the dcache hashes.
+ * If this is a deleted dentry we use d_drop instead of d_delete
+ * so sysfs doesn't need to cope with negative dentries.
+ */
+ is_dir = (sysfs_type(sd) == SYSFS_DIR);
+ mutex_unlock(&sysfs_mutex);
+ if (is_dir) {
+ /* If we have submounts we must allow the vfs caches
+ * to lie about the state of the filesystem to prevent
+ * leaks and other nasty things.
+ */
+ if (have_submounts(dentry))
+ goto out_valid;
+ shrink_dcache_parent(dentry);
+ }
+ d_drop(dentry);
+ return 0;
+}
+
static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;
@@ -307,6 +347,8 @@ static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
}

static const struct dentry_operations sysfs_dentry_ops = {
+ .d_revalidate = sysfs_dentry_revalidate,
+ .d_delete = sysfs_dentry_delete,
.d_iput = sysfs_dentry_iput,
};

@@ -527,44 +569,21 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
}

/**
- * sysfs_drop_dentry - drop dentry for the specified sysfs_dirent
+ * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
* @sd: target sysfs_dirent
*
- * Drop dentry for @sd. @sd must have been unlinked from its
+ * Decrement nlink for @sd. @sd must have been unlinked from its
* parent on entry to this function such that it can't be looked
* up anymore.
*/
-static void sysfs_drop_dentry(struct sysfs_dirent *sd)
+static void sysfs_dec_nlink(struct sysfs_dirent *sd)
{
struct inode *inode;
- struct dentry *dentry;

inode = ilookup(sysfs_sb, sd->s_ino);
if (!inode)
return;

- /* Drop any existing dentries associated with sd.
- *
- * For the dentry to be properly freed we need to grab a
- * reference to the dentry under the dcache lock, unhash it,
- * and then put it. The playing with the dentry count allows
- * dput to immediately free the dentry if it is not in use.
- */
-repeat:
- spin_lock(&dcache_lock);
- list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
- if (d_unhashed(dentry))
- continue;
- dget_locked(dentry);
- spin_lock(&dentry->d_lock);
- __d_drop(dentry);
- spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
- dput(dentry);
- goto repeat;
- }
- spin_unlock(&dcache_lock);
-
/* adjust nlink and update timestamp */
mutex_lock(&inode->i_mutex);

@@ -611,7 +630,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_drop_dentry(sd);
+ sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
--
1.6.5.2.143.g8cc62

2009-11-08 07:27:30

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 04/15] sysfs: Simplify sysfs_chmod_file semantics

From: Eric W. Biederman <[email protected]>

Currently every caller of sysfs_chmod_file happens at either
file creation time to set a non-default mode or in response
to a specific user requested space change in policy. Making
timestamps of when the chmod happens and notification of
a file changing mode uninteresting.

Remove the unnecessary time stamp and filesystem change
notification, and removes the last of the explicit inotify
and donitfy support from sysfs.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/file.c | 10 +---------
1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index f5ea468..faa1a80 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -604,17 +604,9 @@ int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
mutex_lock(&inode->i_mutex);

newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
- newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- newattrs.ia_ctime = current_fs_time(inode->i_sb);
+ newattrs.ia_valid = ATTR_MODE;
rc = sysfs_setattr(victim, &newattrs);

- if (rc == 0) {
- fsnotify_change(victim, newattrs.ia_valid);
- mutex_lock(&sysfs_mutex);
- victim_sd->s_mode = newattrs.ia_mode;
- mutex_unlock(&sysfs_mutex);
- }
-
mutex_unlock(&inode->i_mutex);
out:
dput(victim);
--
1.6.5.2.143.g8cc62

2009-11-08 07:29:47

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 05/15] sysfs: Simplify iattr time assignments

From: Eric W. Biederman <[email protected]>

The granularity of sysfs time when we keep it is 1 ns. Which
when passed to timestamp_trunc results in a nop. So remove
the unnecessary function call making sysfs_setattr slightly
easier to read.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 8a08250..fed7a74 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -103,14 +103,11 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
if (ia_valid & ATTR_GID)
iattrs->ia_gid = iattr->ia_gid;
if (ia_valid & ATTR_ATIME)
- iattrs->ia_atime = timespec_trunc(iattr->ia_atime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_atime = iattr->ia_atime;
if (ia_valid & ATTR_MTIME)
- iattrs->ia_mtime = timespec_trunc(iattr->ia_mtime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_mtime = iattr->ia_mtime;
if (ia_valid & ATTR_CTIME)
- iattrs->ia_ctime = timespec_trunc(iattr->ia_ctime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;

--
1.6.5.2.143.g8cc62

2009-11-08 07:30:14

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 06/15] sysfs: Fix locking and factor out sysfs_sd_setattr

From: Eric W. Biederman <[email protected]>

Cleanly separate the work that is specific to setting the
attributes of a sysfs_dirent from what is needed to update
the attributes of a vfs inode.

Additionally grab the sysfs_mutex to keep any nasties from
surprising us when updating the sysfs_dirent.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
fs/sysfs/sysfs.h | 1 +
2 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fed7a74..fccfb55 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)

return attrs;
}
-int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+
+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
{
- struct inode * inode = dentry->d_inode;
- struct sysfs_dirent * sd = dentry->d_fsdata;
struct sysfs_inode_attrs *sd_attrs;
struct iattr *iattrs;
unsigned int ia_valid = iattr->ia_valid;
- int error;
-
- if (!sd)
- return -EINVAL;

sd_attrs = sd->s_iattr;

- error = inode_change_ok(inode, iattr);
- if (error)
- return error;
-
- iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
-
- error = inode_setattr(inode, iattr);
- if (error)
- return error;
-
if (!sd_attrs) {
/* setting attributes for the first time, allocate now */
sd_attrs = sysfs_init_inode_attrs(sd);
@@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;
-
- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
- mode &= ~S_ISGID;
iattrs->ia_mode = sd->s_mode = mode;
}
}
+ return 0;
+}
+
+int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+{
+ struct inode * inode = dentry->d_inode;
+ struct sysfs_dirent * sd = dentry->d_fsdata;
+ int error;
+
+ if (!sd)
+ return -EINVAL;
+
+ error = inode_change_ok(inode, iattr);
+ if (error)
+ return error;
+
+ iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
+ if (iattr->ia_valid & ATTR_MODE) {
+ if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+ iattr->ia_mode &= ~S_ISGID;
+ }
+
+ error = inode_setattr(inode, iattr);
+ if (error)
+ return error;
+
+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setattr(sd, iattr);
+ mutex_unlock(&sysfs_mutex);
+
return error;
}

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index af4c4e7..a96d967 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -155,6 +155,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
*/
struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);
--
1.6.5.2.143.g8cc62

2009-11-08 07:29:22

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 07/15] sysfs: Update s_iattr on link and unlink.

From: Eric W. Biederman <[email protected]>

Currently sysfs updates the timestamps on the vfs directory
inode when we create or remove a directory entry but doesn't
update the cached copy on the sysfs_dirent, fix that oversight.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b5e8499..fa37126 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -464,6 +464,8 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
*/
int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
{
+ struct sysfs_inode_attrs *ps_iattr;
+
if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
return -EEXIST;

@@ -476,6 +478,13 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sysfs_link_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+
return 0;
}

@@ -554,10 +563,19 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
*/
void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
{
+ struct sysfs_inode_attrs *ps_iattr;
+
BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);

sysfs_unlink_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+
sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
--
1.6.5.2.143.g8cc62

2009-11-08 07:27:35

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 08/15] sysfs: Nicely indent sysfs_symlink_inode_operations

From: Eric W. Biederman <[email protected]>

Lining up the functions in sysfs_symlink_inode_operations
follows the pattern in the rest of sysfs and makes things
slightly more readable.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/symlink.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index c5081ad..1137418 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -210,10 +210,10 @@ static void sysfs_put_link(struct dentry *dentry, struct nameidata *nd, void *co
}

const struct inode_operations sysfs_symlink_inode_operations = {
- .setxattr = sysfs_setxattr,
- .readlink = generic_readlink,
- .follow_link = sysfs_follow_link,
- .put_link = sysfs_put_link,
+ .setxattr = sysfs_setxattr,
+ .readlink = generic_readlink,
+ .follow_link = sysfs_follow_link,
+ .put_link = sysfs_put_link,
};


--
1.6.5.2.143.g8cc62

2009-11-08 07:27:38

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 09/15] sysfs: Implement sysfs_getattr & sysfs_permission

From: Eric W. Biederman <[email protected]>

With the implementation of sysfs_getattr and sysfs_permission
sysfs becomes able to lazily propogate inode attribute changes
from the sysfs_dirents to the vfs inodes. This paves the way
for deleting significant chunks of now unnecessary code.

While doing this we did not reference sysfs_setattr from
sysfs_symlink_inode_operations so I added along with
sysfs_getattr and sysfs_permission.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 2 +
fs/sysfs/inode.c | 64 ++++++++++++++++++++++++++++++++++++++-------------
fs/sysfs/symlink.c | 3 ++
fs/sysfs/sysfs.h | 2 +
4 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index fa37126..25d052a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -800,7 +800,9 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,

const struct inode_operations sysfs_dir_inode_operations = {
.lookup = sysfs_lookup,
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fccfb55..2dcafe8 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -37,7 +37,9 @@ static struct backing_dev_info sysfs_backing_dev_info = {
};

static const struct inode_operations sysfs_inode_operations ={
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

@@ -196,7 +198,6 @@ static inline void set_default_inode_attr(struct inode * inode, mode_t mode)

static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
{
- inode->i_mode = iattr->ia_mode;
inode->i_uid = iattr->ia_uid;
inode->i_gid = iattr->ia_gid;
inode->i_atime = iattr->ia_atime;
@@ -227,38 +228,56 @@ static int sysfs_count_nlink(struct sysfs_dirent *sd)
return nr + 2;
}

+static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
+{
+ struct sysfs_inode_attrs *iattrs = sd->s_iattr;
+
+ inode->i_mode = sd->s_mode;
+ if (iattrs) {
+ /* sysfs_dirent has non-default attributes
+ * get them from persistent copy in sysfs_dirent
+ */
+ set_inode_attr(inode, &iattrs->ia_iattr);
+ security_inode_notifysecctx(inode,
+ iattrs->ia_secdata,
+ iattrs->ia_secdata_len);
+ }
+
+ if (sysfs_type(sd) == SYSFS_DIR)
+ inode->i_nlink = sysfs_count_nlink(sd);
+}
+
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ struct inode *inode = dentry->d_inode;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ generic_fillattr(inode, stat);
+ return 0;
+}
+
static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
{
struct bin_attribute *bin_attr;
- struct sysfs_inode_attrs *iattrs;

inode->i_private = sysfs_get(sd);
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
- inode->i_ino = sd->s_ino;
lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

- iattrs = sd->s_iattr;
- if (iattrs) {
- /* sysfs_dirent has non-default attributes
- * get them for the new inode from persistent copy
- * in sysfs_dirent
- */
- set_inode_attr(inode, &iattrs->ia_iattr);
- if (iattrs->ia_secdata)
- security_inode_notifysecctx(inode,
- iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- } else
- set_default_inode_attr(inode, sd->s_mode);
+ set_default_inode_attr(inode, sd->s_mode);
+ sysfs_refresh_inode(sd, inode);

/* initialize inode according to type */
switch (sysfs_type(sd)) {
case SYSFS_DIR:
inode->i_op = &sysfs_dir_inode_operations;
inode->i_fop = &sysfs_dir_operations;
- inode->i_nlink = sysfs_count_nlink(sd);
break;
case SYSFS_KOBJ_ATTR:
inode->i_size = PAGE_SIZE;
@@ -341,3 +360,14 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
else
return -ENOENT;
}
+
+int sysfs_permission(struct inode *inode, int mask)
+{
+ struct sysfs_dirent *sd = inode->i_private;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ return generic_permission(inode, mask, NULL);
+}
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 1137418..c5eff49 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = sysfs_follow_link,
.put_link = sysfs_put_link,
+ .setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
+ .permission = sysfs_permission,
};


diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index a96d967..12ccc07 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -156,7 +156,9 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
+int sysfs_permission(struct inode *inode, int mask);
int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);
int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
--
1.6.5.2.143.g8cc62

2009-11-08 07:29:02

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 10/15] sysfs: In sysfs_chmod_file lazily propagate the mode change.

From: Eric W. Biederman <[email protected]>

Now that sysfs_getattr and sysfs_permission refresh the vfs
inode there is no need to immediatly push the mode change
into the vfs cache. Reducing the amount of work needed and
simplifying the locking.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/file.c | 31 ++++++++-----------------------
1 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index faa1a80..dc30d9e 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -579,38 +579,23 @@ EXPORT_SYMBOL_GPL(sysfs_add_file_to_group);
*/
int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
{
- struct sysfs_dirent *victim_sd = NULL;
- struct dentry *victim = NULL;
- struct inode * inode;
+ struct sysfs_dirent *sd;
struct iattr newattrs;
int rc;

- rc = -ENOENT;
- victim_sd = sysfs_get_dirent(kobj->sd, attr->name);
- if (!victim_sd)
- goto out;
+ mutex_lock(&sysfs_mutex);

- mutex_lock(&sysfs_rename_mutex);
- victim = sysfs_get_dentry(victim_sd);
- mutex_unlock(&sysfs_rename_mutex);
- if (IS_ERR(victim)) {
- rc = PTR_ERR(victim);
- victim = NULL;
+ rc = -ENOENT;
+ sd = sysfs_find_dirent(kobj->sd, attr->name);
+ if (!sd)
goto out;
- }
-
- inode = victim->d_inode;

- mutex_lock(&inode->i_mutex);
-
- newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
+ newattrs.ia_mode = (mode & S_IALLUGO) | (sd->s_mode & ~S_IALLUGO);
newattrs.ia_valid = ATTR_MODE;
- rc = sysfs_setattr(victim, &newattrs);
+ rc = sysfs_sd_setattr(sd, &newattrs);

- mutex_unlock(&inode->i_mutex);
out:
- dput(victim);
- sysfs_put(victim_sd);
+ mutex_unlock(&sysfs_mutex);
return rc;
}
EXPORT_SYMBOL_GPL(sysfs_chmod_file);
--
1.6.5.2.143.g8cc62

2009-11-08 07:28:05

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 11/15] sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish

From: Eric W. Biederman <[email protected]>

With lazy inode updates and dentry operations bringing everything
into sync on demand there is no longer any need to immediately
update the vfs or grab i_mutex to protect those updates as we
make changes to sysfs.

Acked-by: Serge Hallyn <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 91 ++---------------------------------------------------
fs/sysfs/sysfs.h | 2 -
2 files changed, 4 insertions(+), 89 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 25d052a..a05b027 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -386,12 +386,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
return NULL;
}

-static int sysfs_ilookup_test(struct inode *inode, void *arg)
-{
- struct sysfs_dirent *sd = arg;
- return inode->i_ino == sd->s_ino;
-}
-
/**
* sysfs_addrm_start - prepare for sysfs_dirent add/remove
* @acxt: pointer to sysfs_addrm_cxt to be used
@@ -399,47 +393,20 @@ static int sysfs_ilookup_test(struct inode *inode, void *arg)
*
* This function is called when the caller is about to add or
* remove sysfs_dirent under @parent_sd. This function acquires
- * sysfs_mutex, grabs inode for @parent_sd if available and lock
- * i_mutex of it. @acxt is used to keep and pass context to
+ * sysfs_mutex. @acxt is used to keep and pass context to
* other addrm functions.
*
* LOCKING:
* Kernel thread context (may sleep). sysfs_mutex is locked on
- * return. i_mutex of parent inode is locked on return if
- * available.
+ * return.
*/
void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
struct sysfs_dirent *parent_sd)
{
- struct inode *inode;
-
memset(acxt, 0, sizeof(*acxt));
acxt->parent_sd = parent_sd;

- /* Lookup parent inode. inode initialization is protected by
- * sysfs_mutex, so inode existence can be determined by
- * looking up inode while holding sysfs_mutex.
- */
mutex_lock(&sysfs_mutex);
-
- inode = ilookup5(sysfs_sb, parent_sd->s_ino, sysfs_ilookup_test,
- parent_sd);
- if (inode) {
- WARN_ON(inode->i_state & I_NEW);
-
- /* parent inode available */
- acxt->parent_inode = inode;
-
- /* sysfs_mutex is below i_mutex in lock hierarchy.
- * First, trylock i_mutex. If fails, unlock
- * sysfs_mutex and lock them in order.
- */
- if (!mutex_trylock(&inode->i_mutex)) {
- mutex_unlock(&sysfs_mutex);
- mutex_lock(&inode->i_mutex);
- mutex_lock(&sysfs_mutex);
- }
- }
}

/**
@@ -471,11 +438,6 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sd->s_parent = sysfs_get(acxt->parent_sd);

- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- inc_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-
sysfs_link_sibling(sd);

/* Update timestamps on the parent */
@@ -579,40 +541,6 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
-
- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- drop_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-}
-
-/**
- * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
- * @sd: target sysfs_dirent
- *
- * Decrement nlink for @sd. @sd must have been unlinked from its
- * parent on entry to this function such that it can't be looked
- * up anymore.
- */
-static void sysfs_dec_nlink(struct sysfs_dirent *sd)
-{
- struct inode *inode;
-
- inode = ilookup(sysfs_sb, sd->s_ino);
- if (!inode)
- return;
-
- /* adjust nlink and update timestamp */
- mutex_lock(&inode->i_mutex);
-
- inode->i_ctime = CURRENT_TIME;
- drop_nlink(inode);
- if (sysfs_type(sd) == SYSFS_DIR)
- drop_nlink(inode);
-
- mutex_unlock(&inode->i_mutex);
-
- iput(inode);
}

/**
@@ -621,25 +549,15 @@ static void sysfs_dec_nlink(struct sysfs_dirent *sd)
*
* Finish up sysfs_dirent add/remove. Resources acquired by
* sysfs_addrm_start() are released and removed sysfs_dirents are
- * cleaned up. Timestamps on the parent inode are updated.
+ * cleaned up.
*
* LOCKING:
- * All mutexes acquired by sysfs_addrm_start() are released.
+ * sysfs_mutex is released.
*/
void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
{
/* release resources acquired by sysfs_addrm_start() */
mutex_unlock(&sysfs_mutex);
- if (acxt->parent_inode) {
- struct inode *inode = acxt->parent_inode;
-
- /* if added/removed, update timestamps on the parent */
- if (acxt->cnt)
- inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-
- mutex_unlock(&inode->i_mutex);
- iput(inode);
- }

/* kill removed sysfs_dirents */
while (acxt->removed) {
@@ -648,7 +566,6 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 12ccc07..90b3501 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -89,9 +89,7 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
*/
struct sysfs_addrm_cxt {
struct sysfs_dirent *parent_sd;
- struct inode *parent_inode;
struct sysfs_dirent *removed;
- int cnt;
};

/*
--
1.6.5.2.143.g8cc62

2009-11-08 07:28:39

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 12/15] sysfs: Propagate renames to the vfs on demand

From: Eric W. Biederman <[email protected]>

By teaching sysfs_revalidate to hide a dentry for
a sysfs_dirent if the sysfs_dirent has been renamed,
and by teaching sysfs_lookup to return the original
dentry if the sysfs dirent has been renamed. I can
show the results of renames correctly without having to
update the dcache during the directory rename.

This massively simplifies the rename logic allowing a lot
of weird sysfs special cases to be removed along with
a lot of now unnecesary helper code.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/namei.c | 22 -------
fs/sysfs/dir.c | 158 ++++++++++---------------------------------------
fs/sysfs/inode.c | 12 ----
fs/sysfs/sysfs.h | 1 -
include/linux/namei.h | 1 -
5 files changed, 32 insertions(+), 162 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d11f404..d3c190c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1279,28 +1279,6 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
return __lookup_hash(&this, base, NULL);
}

-/**
- * lookup_one_noperm - bad hack for sysfs
- * @name: pathname component to lookup
- * @base: base directory to lookup from
- *
- * This is a variant of lookup_one_len that doesn't perform any permission
- * checks. It's a horrible hack to work around the braindead sysfs
- * architecture and should not be used anywhere else.
- *
- * DON'T USE THIS FUNCTION EVER, thanks.
- */
-struct dentry *lookup_one_noperm(const char *name, struct dentry *base)
-{
- int err;
- struct qstr this;
-
- err = __lookup_one_len(name, &this, base, strlen(name));
- if (err)
- return ERR_PTR(err);
- return __lookup_hash(&this, base, NULL);
-}
-
int user_path_at(int dfd, const char __user *name, unsigned flags,
struct path *path)
{
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index a05b027..0b60212 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -25,7 +25,6 @@
#include "sysfs.h"

DEFINE_MUTEX(sysfs_mutex);
-DEFINE_MUTEX(sysfs_rename_mutex);
DEFINE_SPINLOCK(sysfs_assoc_lock);

static DEFINE_SPINLOCK(sysfs_ino_lock);
@@ -85,46 +84,6 @@ static void sysfs_unlink_sibling(struct sysfs_dirent *sd)
}

/**
- * sysfs_get_dentry - get dentry for the given sysfs_dirent
- * @sd: sysfs_dirent of interest
- *
- * Get dentry for @sd. Dentry is looked up if currently not
- * present. This function descends from the root looking up
- * dentry for each step.
- *
- * LOCKING:
- * mutex_lock(sysfs_rename_mutex)
- *
- * RETURNS:
- * Pointer to found dentry on success, ERR_PTR() value on error.
- */
-struct dentry *sysfs_get_dentry(struct sysfs_dirent *sd)
-{
- struct dentry *dentry = dget(sysfs_sb->s_root);
-
- while (dentry->d_fsdata != sd) {
- struct sysfs_dirent *cur;
- struct dentry *parent;
-
- /* find the first ancestor which hasn't been looked up */
- cur = sd;
- while (cur->s_parent != dentry->d_fsdata)
- cur = cur->s_parent;
-
- /* look it up */
- parent = dentry;
- mutex_lock(&parent->d_inode->i_mutex);
- dentry = lookup_one_noperm(cur->s_name, parent);
- mutex_unlock(&parent->d_inode->i_mutex);
- dput(parent);
-
- if (IS_ERR(dentry))
- break;
- }
- return dentry;
-}
-
-/**
* sysfs_get_active - get an active reference to sysfs_dirent
* @sd: sysfs_dirent to get an active reference to
*
@@ -315,6 +274,14 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
if (sd->s_flags & SYSFS_FLAG_REMOVED)
goto out_bad;

+ /* The sysfs dirent has been moved? */
+ if (dentry->d_parent->d_fsdata != sd->s_parent)
+ goto out_bad;
+
+ /* The sysfs dirent has been renamed */
+ if (strcmp(dentry->d_name.name, sd->s_name) != 0)
+ goto out_bad;
+
mutex_unlock(&sysfs_mutex);
out_valid:
return 1;
@@ -322,6 +289,12 @@ out_bad:
/* Remove the dentry from the dcache hashes.
* If this is a deleted dentry we use d_drop instead of d_delete
* so sysfs doesn't need to cope with negative dentries.
+ *
+ * If this is a dentry that has simply been renamed we
+ * use d_drop to remove it from the dcache lookup on its
+ * old parent. If this dentry persists later when a lookup
+ * is performed at its new name the dentry will be readded
+ * to the dcache hashes.
*/
is_dir = (sysfs_type(sd) == SYSFS_DIR);
mutex_unlock(&sysfs_mutex);
@@ -705,10 +678,15 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
}

/* instantiate and hash dentry */
- dentry->d_op = &sysfs_dentry_ops;
- dentry->d_fsdata = sysfs_get(sd);
- d_instantiate(dentry, inode);
- d_rehash(dentry);
+ ret = d_find_alias(inode);
+ if (!ret) {
+ dentry->d_op = &sysfs_dentry_ops;
+ dentry->d_fsdata = sysfs_get(sd);
+ d_add(dentry, inode);
+ } else {
+ d_move(ret, dentry);
+ iput(inode);
+ }

out_unlock:
mutex_unlock(&sysfs_mutex);
@@ -785,62 +763,32 @@ void sysfs_remove_dir(struct kobject * kobj)
int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
{
struct sysfs_dirent *sd = kobj->sd;
- struct dentry *parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
const char *dup_name = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
+ mutex_lock(&sysfs_mutex);

error = 0;
if (strcmp(sd->s_name, new_name) == 0)
goto out; /* nothing to rename */

- /* get the original dentry */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
-
- parent = old_dentry->d_parent;
-
- /* lock parent and get dentry for new name */
- mutex_lock(&parent->d_inode->i_mutex);
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(sd->s_parent, new_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(parent, new_name);
- if (!new_dentry)
- goto out_unlock;
+ goto out;

/* rename sysfs_dirent */
error = -ENOMEM;
new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
if (!new_name)
- goto out_unlock;
+ goto out;

dup_name = sd->s_name;
sd->s_name = new_name;

- /* rename */
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
-
error = 0;
- out_unlock:
+ out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&parent->d_inode->i_mutex);
kfree(dup_name);
- dput(old_dentry);
- dput(new_dentry);
- out:
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

@@ -848,12 +796,11 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;
- struct dentry *old_parent, *new_parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
BUG_ON(!sd->s_parent);
+
+ mutex_lock(&sysfs_mutex);
new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
new_parent_kobj->sd : &sysfs_root;

@@ -861,61 +808,20 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
if (sd->s_parent == new_parent_sd)
goto out; /* nothing to move */

- /* get dentries */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
- old_parent = old_dentry->d_parent;
-
- new_parent = sysfs_get_dentry(new_parent_sd);
- if (IS_ERR(new_parent)) {
- error = PTR_ERR(new_parent);
- new_parent = NULL;
- goto out;
- }
-
-again:
- mutex_lock(&old_parent->d_inode->i_mutex);
- if (!mutex_trylock(&new_parent->d_inode->i_mutex)) {
- mutex_unlock(&old_parent->d_inode->i_mutex);
- goto again;
- }
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(new_parent, sd->s_name);
- if (!new_dentry)
- goto out_unlock;
-
- error = 0;
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
+ goto out;

/* Remove from old parent's list and insert into new parent's list. */
sysfs_unlink_sibling(sd);
sysfs_get(new_parent_sd);
- drop_nlink(old_parent->d_inode);
sysfs_put(sd->s_parent);
sd->s_parent = new_parent_sd;
- inc_nlink(new_parent->d_inode);
sysfs_link_sibling(sd);

- out_unlock:
+ error = 0;
+out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&new_parent->d_inode->i_mutex);
- mutex_unlock(&old_parent->d_inode->i_mutex);
- out:
- dput(new_parent);
- dput(old_dentry);
- dput(new_dentry);
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 2dcafe8..082a3eb 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -205,17 +205,6 @@ static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
inode->i_ctime = iattr->ia_ctime;
}

-
-/*
- * sysfs has a different i_mutex lock order behavior for i_mutex than other
- * filesystems; sysfs i_mutex is called in many places with subsystem locks
- * held. At the same time, many of the VFS locking rules do not apply to
- * sysfs at all (cross directory rename for example). To untangle this mess
- * (which gives false positives in lockdep), we're giving sysfs inodes their
- * own class for i_mutex.
- */
-static struct lock_class_key sysfs_inode_imutex_key;
-
static int sysfs_count_nlink(struct sysfs_dirent *sd)
{
struct sysfs_dirent *child;
@@ -268,7 +257,6 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
- lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

set_default_inode_attr(inode, sd->s_mode);
sysfs_refresh_inode(sd, inode);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 90b3501..98a15bf 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -103,7 +103,6 @@ extern struct kmem_cache *sysfs_dir_cachep;
* dir.c
*/
extern struct mutex sysfs_mutex;
-extern struct mutex sysfs_rename_mutex;
extern spinlock_t sysfs_assoc_lock;

extern const struct file_operations sysfs_dir_operations;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index ec0f607..0289467 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -76,7 +76,6 @@ extern struct file *nameidata_to_filp(struct nameidata *nd, int flags);
extern void release_open_intent(struct nameidata *);

extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
-extern struct dentry *lookup_one_noperm(const char *, struct dentry *);

extern int follow_down(struct path *);
extern int follow_up(struct path *);
--
1.6.5.2.143.g8cc62

2009-11-08 07:28:37

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 13/15] sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir

From: Eric W. Biederman <[email protected]>

These two functions do 90% of the same work and it doesn't significantly
obfuscate the function to allow both the parent dir and the name to change
at the same time. So merge them together to simplify maintenance, and
increase testing.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 62 +++++++++++++++++++++++++----------------------------
fs/sysfs/sysfs.h | 3 ++
2 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 0b60212..e1a86d1 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -760,30 +760,42 @@ void sysfs_remove_dir(struct kobject * kobj)
__sysfs_remove_dir(sd);
}

-int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name)
{
- struct sysfs_dirent *sd = kobj->sd;
const char *dup_name = NULL;
int error;

mutex_lock(&sysfs_mutex);

error = 0;
- if (strcmp(sd->s_name, new_name) == 0)
+ if ((sd->s_parent == new_parent_sd) &&
+ (strcmp(sd->s_name, new_name) == 0))
goto out; /* nothing to rename */

error = -EEXIST;
- if (sysfs_find_dirent(sd->s_parent, new_name))
+ if (sysfs_find_dirent(new_parent_sd, new_name))
goto out;

/* rename sysfs_dirent */
- error = -ENOMEM;
- new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
- if (!new_name)
- goto out;
+ if (strcmp(sd->s_name, new_name) != 0) {
+ error = -ENOMEM;
+ new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
+ if (!new_name)
+ goto out;
+
+ dup_name = sd->s_name;
+ sd->s_name = new_name;
+ }

- dup_name = sd->s_name;
- sd->s_name = new_name;
+ /* Remove from old parent's list and insert into new parent's list. */
+ if (sd->s_parent != new_parent_sd) {
+ sysfs_unlink_sibling(sd);
+ sysfs_get(new_parent_sd);
+ sysfs_put(sd->s_parent);
+ sd->s_parent = new_parent_sd;
+ sysfs_link_sibling(sd);
+ }

error = 0;
out:
@@ -792,37 +804,21 @@ int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
return error;
}

+int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+{
+ return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
+}
+
int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;
- int error;

BUG_ON(!sd->s_parent);
-
- mutex_lock(&sysfs_mutex);
- new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
+ new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
new_parent_kobj->sd : &sysfs_root;

- error = 0;
- if (sd->s_parent == new_parent_sd)
- goto out; /* nothing to move */
-
- error = -EEXIST;
- if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out;
-
- /* Remove from old parent's list and insert into new parent's list. */
- sysfs_unlink_sibling(sd);
- sysfs_get(new_parent_sd);
- sysfs_put(sd->s_parent);
- sd->s_parent = new_parent_sd;
- sysfs_link_sibling(sd);
-
- error = 0;
-out:
- mutex_unlock(&sysfs_mutex);
- return error;
+ return sysfs_rename(sd, new_parent_sd, sd->s_name);
}

/* Relationship between s_mode and the DT_xxx types */
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 98a15bf..ca52e7b 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -130,6 +130,9 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
struct sysfs_dirent **p_sd);
void sysfs_remove_subdir(struct sysfs_dirent *sd);

+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name);
+
static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
{
if (sd) {
--
1.6.5.2.143.g8cc62

2009-11-08 07:27:41

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 14/15] sysfs: sysfs_setattr remove unnecessary permission check.

From: Eric W. Biederman <[email protected]>

inode_change_ok already clears the SGID bit when necessary
so there is no reason for sysfs_setattr to carry code to do
the same, and it is good to kill the extra copy because when
I moved the code last in certain corner cases the code will
look at the wrong gid.

Acked-by: Serge Hallyn <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 082a3eb..8197e1a 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -117,10 +117,6 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;

iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
- if (iattr->ia_valid & ATTR_MODE) {
- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
- iattr->ia_mode &= ~S_ISGID;
- }

error = inode_setattr(inode, iattr);
if (error)
--
1.6.5.2.143.g8cc62

2009-11-08 07:27:42

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 15/15] sysfs: Protect sysfs_refresh_inode with inode mutex.

From: Eric W. Biederman <[email protected]>

In general everything that writes to vfs inodes holds the
inode mutex, so hold the inode mutex over sysfs_refresh_inode.
The sysfs data structures don't need this but it looks like the
vfs might.

Acked-by: Serge Hallyn <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 8197e1a..75516cd 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -237,9 +237,11 @@ int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *sta
struct sysfs_dirent *sd = dentry->d_fsdata;
struct inode *inode = dentry->d_inode;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

generic_fillattr(inode, stat);
return 0;
@@ -349,9 +351,11 @@ int sysfs_permission(struct inode *inode, int mask)
{
struct sysfs_dirent *sd = inode->i_private;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

return generic_permission(inode, mask, NULL);
}
--
1.6.5.2.143.g8cc62

2009-11-09 03:58:05

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 0/15] sysfs lazification final

Eric W. Biederman wrote:
> The sysfs code updates the vfs caches immediately when the sysfs data
> structures change causing a lot of unnecessary complications. The
> following patchset untangles that beast. Allowing for simpler
> more straight forward code, the removal of a hack from the vfs
> to support sysfs, and human comprehensible locking on sysfs.
>
> Most of these patches have already been reviewed and acked from the
> last time I had time to work on sysfs.
>
> This acks have been folded in and the two small bugs found in the
> previous review have been fixed in the trailing patches (they are
> minor enough nits that even a bisect that happens to land in the
> middle should not see sysfs problems).

Thanks a lot for bringing some sanity to sysfs. :-)

--
tejun

2009-11-09 14:15:01

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

Quoting Tejun Heo ([email protected]):
> Hello,
>
> Eric W. Biederman wrote:
> > It isn't what I want but it is what the VFS requires. If let the vfs
> > continue on it's delusional state we will leak the vfs mount and
> > everything mounted on top of it, with no way to remove the mounts.
>
> This is caused by not having any way to prevent deletion on
> directories with submounts, right? How does other distributed
> filesystems deal with directories with submounts going away underneath
> it?

Ooooh. I see, I was thinking only about the rename case, and forgot
this was the path for deleted files, too. For the rename case it
should be ok to let the dentry be put since the submounts will be
accessible at the new location, right? Should that be handled
separately?

-serge

2009-11-09 14:26:42

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 06/15] sysfs: Fix locking and factor out sysfs_sd_setattr

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Cleanly separate the work that is specific to setting the
> attributes of a sysfs_dirent from what is needed to update
> the attributes of a vfs inode.
>
> Additionally grab the sysfs_mutex to keep any nasties from
> surprising us when updating the sysfs_dirent.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Oh, sorry, guess i never did

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
> fs/sysfs/sysfs.h | 1 +
> 2 files changed, 33 insertions(+), 20 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index fed7a74..fccfb55 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)
>
> return attrs;
> }
> -int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> +
> +int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
> {
> - struct inode * inode = dentry->d_inode;
> - struct sysfs_dirent * sd = dentry->d_fsdata;
> struct sysfs_inode_attrs *sd_attrs;
> struct iattr *iattrs;
> unsigned int ia_valid = iattr->ia_valid;
> - int error;
> -
> - if (!sd)
> - return -EINVAL;
>
> sd_attrs = sd->s_iattr;
>
> - error = inode_change_ok(inode, iattr);
> - if (error)
> - return error;
> -
> - iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
> -
> - error = inode_setattr(inode, iattr);
> - if (error)
> - return error;
> -
> if (!sd_attrs) {
> /* setting attributes for the first time, allocate now */
> sd_attrs = sysfs_init_inode_attrs(sd);
> @@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> iattrs->ia_ctime = iattr->ia_ctime;
> if (ia_valid & ATTR_MODE) {
> umode_t mode = iattr->ia_mode;
> -
> - if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> - mode &= ~S_ISGID;
> iattrs->ia_mode = sd->s_mode = mode;
> }
> }
> + return 0;
> +}
> +
> +int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> +{
> + struct inode * inode = dentry->d_inode;
> + struct sysfs_dirent * sd = dentry->d_fsdata;
> + int error;
> +
> + if (!sd)
> + return -EINVAL;
> +
> + error = inode_change_ok(inode, iattr);
> + if (error)
> + return error;
> +
> + iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
> + if (iattr->ia_valid & ATTR_MODE) {
> + if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> + iattr->ia_mode &= ~S_ISGID;
> + }
> +
> + error = inode_setattr(inode, iattr);
> + if (error)
> + return error;
> +
> + mutex_lock(&sysfs_mutex);
> + error = sysfs_sd_setattr(sd, iattr);
> + mutex_unlock(&sysfs_mutex);
> +
> return error;
> }
>
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index af4c4e7..a96d967 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -155,6 +155,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
> */
> struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
> void sysfs_delete_inode(struct inode *inode);
> +int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
> int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
> int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> size_t size, int flags);
> --
> 1.6.5.2.143.g8cc62

2009-11-09 22:34:53

by Eric Biederman

[permalink] [raw]
Subject: Re: [PATCH 12/13] sysfs: Propagate renames to the vfs on demand

On Mon, Nov 9, 2009 at 6:14 AM, Serge E. Hallyn <[email protected]> wrote:
> Quoting Tejun Heo ([email protected]):
>> Hello,
>>
>> Eric W. Biederman wrote:
>> > It isn't what I want but it is what the VFS requires. ?If let the vfs
>> > continue on it's delusional state we will leak the vfs mount and
>> > everything mounted on top of it, with no way to remove the mounts.
>>
>> This is caused by not having any way to prevent deletion on
>> directories with submounts, right? ?How does other distributed
>> filesystems deal with directories with submounts going away underneath
>> it?
>
> Ooooh. ?I see, I was thinking only about the rename case, and forgot
> this was the path for deleted files, too. ?For the rename case it
> should be ok to let the dentry be put since the submounts will be
> accessible at the new location, right? ?Should that be handled
> separately?

No in the rename case it isn't ok to let the dentry be discarded put as mounts
are implemented using a hash of the struct dentry's address, and if you aren't
the mount point you are referenced as d_parent.

For rename I am slightly better than NFS. sysfs does not support hard
links so if I am looking up the new name I can look for a preexisting
dentry for my
inode and if I find one I call d_move on it to lazily perform the rename.

Eric

2009-11-17 09:11:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/13] sysfs lazification.

Greg KH <[email protected]> writes:

> On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>>
>> The sysfs code updates the vfs caches immediately when the sysfs data
>> structures change causing a lot of unnecessary complications. The
>> following patchset untangles that beast. Allowing for simpler
>> more straight forward code, the removal of a hack from the vfs
>> to support sysfs, and human comprehensible locking on sysfs.
>>
>> Most of these patches have already been reviewed and acked from the
>> last time I had time to work on sysfs.
>>
>> In net the patches look like:
>
> Can you resend these based on the review that you just got, with the new
> acks and changes so that I can apply them?

Are you going to apply them soon? If you don't have time I can start
a sysfs tree and just ask Linus to pull them.

Eric


2009-11-17 15:42:16

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/13] sysfs lazification.

On Tue, Nov 17, 2009 at 01:11:22AM -0800, Eric W. Biederman wrote:
> Greg KH <[email protected]> writes:
>
> > On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
> >>
> >> The sysfs code updates the vfs caches immediately when the sysfs data
> >> structures change causing a lot of unnecessary complications. The
> >> following patchset untangles that beast. Allowing for simpler
> >> more straight forward code, the removal of a hack from the vfs
> >> to support sysfs, and human comprehensible locking on sysfs.
> >>
> >> Most of these patches have already been reviewed and acked from the
> >> last time I had time to work on sysfs.
> >>
> >> In net the patches look like:
> >
> > Can you resend these based on the review that you just got, with the new
> > acks and changes so that I can apply them?
>
> Are you going to apply them soon? If you don't have time I can start
> a sysfs tree and just ask Linus to pull them.

Well, as the merge window is not open right now, that might be tough to
get Linus to take them at the moment :)

I was away last week, and will apply them to my tree this week, thanks
for your patience.

greg k-h

2009-11-17 15:56:13

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/13] sysfs lazification.

Greg KH <[email protected]> writes:

> On Tue, Nov 17, 2009 at 01:11:22AM -0800, Eric W. Biederman wrote:
>> Greg KH <[email protected]> writes:
>>
>> > On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>> >>
>> >> The sysfs code updates the vfs caches immediately when the sysfs data
>> >> structures change causing a lot of unnecessary complications. The
>> >> following patchset untangles that beast. Allowing for simpler
>> >> more straight forward code, the removal of a hack from the vfs
>> >> to support sysfs, and human comprehensible locking on sysfs.
>> >>
>> >> Most of these patches have already been reviewed and acked from the
>> >> last time I had time to work on sysfs.
>> >>
>> >> In net the patches look like:
>> >
>> > Can you resend these based on the review that you just got, with the new
>> > acks and changes so that I can apply them?
>>
>> Are you going to apply them soon? If you don't have time I can start
>> a sysfs tree and just ask Linus to pull them.
>
> Well, as the merge window is not open right now, that might be tough to
> get Linus to take them at the moment :)

True. ;)

> I was away last week, and will apply them to my tree this week, thanks
> for your patience.

I'm working on it.

Eric

2009-11-20 21:20:56

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 06/15] sysfs: Fix locking and factor out sysfs_sd_setattr

On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
> From: Eric W. Biederman <[email protected]>
>
> Cleanly separate the work that is specific to setting the
> attributes of a sysfs_dirent from what is needed to update
> the attributes of a vfs inode.
>
> Additionally grab the sysfs_mutex to keep any nasties from
> surprising us when updating the sysfs_dirent.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Due to the extended attr work for sysfs that went into Linus's tree
recently, this patch doesn't apply at all anymore.

I've applied the first 5 to my tree, care to respin the other 10 and
resend them?

thanks,

greg k-h

2009-11-20 21:39:19

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 06/15] sysfs: Fix locking and factor out sysfs_sd_setattr

Greg KH <[email protected]> writes:

> On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
>> From: Eric W. Biederman <[email protected]>
>>
>> Cleanly separate the work that is specific to setting the
>> attributes of a sysfs_dirent from what is needed to update
>> the attributes of a vfs inode.
>>
>> Additionally grab the sysfs_mutex to keep any nasties from
>> surprising us when updating the sysfs_dirent.
>>
>> Acked-by: Tejun Heo <[email protected]>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Due to the extended attr work for sysfs that went into Linus's tree
> recently, this patch doesn't apply at all anymore.

I will take a look. I'm scratching my about how that caused a problem
because that patch came out of my tree originally, before the rest of
these patches. So if anything I should have problems in the other direction.

Eric

2009-11-21 00:07:20

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 06/15] sysfs: Fix locking and factor out sysfs_sd_setattr

Greg KH <[email protected]> writes:

> On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
>> From: Eric W. Biederman <[email protected]>
>>
>> Cleanly separate the work that is specific to setting the
>> attributes of a sysfs_dirent from what is needed to update
>> the attributes of a vfs inode.
>>
>> Additionally grab the sysfs_mutex to keep any nasties from
>> surprising us when updating the sysfs_dirent.
>>
>> Acked-by: Tejun Heo <[email protected]>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Due to the extended attr work for sysfs that went into Linus's tree
> recently, this patch doesn't apply at all anymore.

It looks like the issue is the a trivial context conflict with
sysfs-mark-a-locally-only-used-function-static.patch

Which marks one of those new functions as static.

Here come the patches 6-15 rebased on top of that.

Eric

2009-11-21 05:14:52

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 06/15] sysfs: Fix locking and factor out sysfs_sd_setattr

On Fri, Nov 20, 2009 at 04:07:11PM -0800, Eric W. Biederman wrote:
> Greg KH <[email protected]> writes:
>
> > On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
> >> From: Eric W. Biederman <[email protected]>
> >>
> >> Cleanly separate the work that is specific to setting the
> >> attributes of a sysfs_dirent from what is needed to update
> >> the attributes of a vfs inode.
> >>
> >> Additionally grab the sysfs_mutex to keep any nasties from
> >> surprising us when updating the sysfs_dirent.
> >>
> >> Acked-by: Tejun Heo <[email protected]>
> >> Signed-off-by: Eric W. Biederman <[email protected]>
> >
> > Due to the extended attr work for sysfs that went into Linus's tree
> > recently, this patch doesn't apply at all anymore.
>
> It looks like the issue is the a trivial context conflict with
> sysfs-mark-a-locally-only-used-function-static.patch

Ah, sorry about that, I should have noticed that, I was looking for a
much harder problem :(

I'll go queue these up in a bit.

thanks,

greg k-h

2009-11-30 21:58:31

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/15] sysfs lazification final

On Sat, Nov 07, 2009 at 11:25:03PM -0800, Eric W. Biederman wrote:
>
> The sysfs code updates the vfs caches immediately when the sysfs data
> structures change causing a lot of unnecessary complications. The
> following patchset untangles that beast. Allowing for simpler
> more straight forward code, the removal of a hack from the vfs
> to support sysfs, and human comprehensible locking on sysfs.
>
> Most of these patches have already been reviewed and acked from the
> last time I had time to work on sysfs.
>
> This acks have been folded in and the two small bugs found in the
> previous review have been fixed in the trailing patches (they are
> minor enough nits that even a bisect that happens to land in the
> middle should not see sysfs problems).

I've applied all of these to my tree now, and sorry, but something is
broken pretty badly.

When doing a simple 'ls /sys/class/input/' the process locks up. This
means that X can't find any input devices, which makes for a bit of a
problem when wanting to use your mouse or keyboard :(

Attached is the state of my processes when this happens, if that helps
out any.

So I'm going to drop all of these from my tree again, as they are not
ready for merging at this point :(

sorry,

greg k-h


Attachments:
(No filename) (1.23 kB)
dmesg (243.82 kB)
Download all attachments

2009-11-30 21:58:07

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/15] sysfs lazification final

On Mon, Nov 30, 2009 at 01:33:37PM -0800, Greg KH wrote:
> On Sat, Nov 07, 2009 at 11:25:03PM -0800, Eric W. Biederman wrote:
> >
> > The sysfs code updates the vfs caches immediately when the sysfs data
> > structures change causing a lot of unnecessary complications. The
> > following patchset untangles that beast. Allowing for simpler
> > more straight forward code, the removal of a hack from the vfs
> > to support sysfs, and human comprehensible locking on sysfs.
> >
> > Most of these patches have already been reviewed and acked from the
> > last time I had time to work on sysfs.
> >
> > This acks have been folded in and the two small bugs found in the
> > previous review have been fixed in the trailing patches (they are
> > minor enough nits that even a bisect that happens to land in the
> > middle should not see sysfs problems).
>
> I've applied all of these to my tree now, and sorry, but something is
> broken pretty badly.
>
> When doing a simple 'ls /sys/class/input/' the process locks up. This
> means that X can't find any input devices, which makes for a bit of a
> problem when wanting to use your mouse or keyboard :(
>
> Attached is the state of my processes when this happens, if that helps
> out any.
>
> So I'm going to drop all of these from my tree again, as they are not
> ready for merging at this point :(

In looking at the stuck processes, it seems your last patch was the
problem. Removing that caused things to work again, so I've only
dropped that one.

Next time, please test your patches before submitting them :(

thanks,

greg k-h

2009-12-01 00:12:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/15] sysfs lazification final

Greg KH <[email protected]> writes:

> On Mon, Nov 30, 2009 at 01:33:37PM -0800, Greg KH wrote:
>> On Sat, Nov 07, 2009 at 11:25:03PM -0800, Eric W. Biederman wrote:
>> >
>> > The sysfs code updates the vfs caches immediately when the sysfs data
>> > structures change causing a lot of unnecessary complications. The
>> > following patchset untangles that beast. Allowing for simpler
>> > more straight forward code, the removal of a hack from the vfs
>> > to support sysfs, and human comprehensible locking on sysfs.
>> >
>> > Most of these patches have already been reviewed and acked from the
>> > last time I had time to work on sysfs.
>> >
>> > This acks have been folded in and the two small bugs found in the
>> > previous review have been fixed in the trailing patches (they are
>> > minor enough nits that even a bisect that happens to land in the
>> > middle should not see sysfs problems).
>>
>> I've applied all of these to my tree now, and sorry, but something is
>> broken pretty badly.
>>
>> When doing a simple 'ls /sys/class/input/' the process locks up. This
>> means that X can't find any input devices, which makes for a bit of a
>> problem when wanting to use your mouse or keyboard :(
>>
>> Attached is the state of my processes when this happens, if that helps
>> out any.
>>
>> So I'm going to drop all of these from my tree again, as they are not
>> ready for merging at this point :(
>
> In looking at the stuck processes, it seems your last patch was the
> problem. Removing that caused things to work again, so I've only
> dropped that one.
>
> Next time, please test your patches before submitting them :(

Weird I thought I had tested this.

That last patch to add locking that is only needed for vfs coherency
has certainly seen less testing than the others.

I also remember verify that nfs does the same thing, when in fact
nfs takes inode->i_lock not inode->i_mutex in the same situation.

generic_permission takes no locks so this is really about serializing
writes to the inode. The vfs only takes inode->i_mutex, when calling
notify_change.

So it appears I have stepped into a murky corner of the vfs.

I will take a look and do a bit more testing. At the moment it looks
like a solution to serializing writes to the stat attributes on the inode is
going to be simply holding sysfs_mutex over inode_setattr in
sysfs_setattr. Assuming a solution is needed at all.

Eric