2010-01-11 20:16:30

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 0/7] General sysfs enhancements


This patchset is built on top of 2.6.33-rc3 +
"sysfs: Cache the last sysfs_dirent to improve readdir scalability v2"

Greg I assume you have that patch somewhere in queue.

This patchset contains little cleanups all around sysfs.
- Some basic cleanups.
- Some sysfs scaling work
- Implementation of sysfs_rename

drivers/base/core.c | 18 ++++++------------
fs/sysfs/dir.c | 22 +++++++++++++++++++++-
fs/sysfs/inode.c | 28 ++++++++--------------------
fs/sysfs/mount.c | 5 ++---
fs/sysfs/symlink.c | 38 ++++++++++++++++++++++++++++++++++++++
fs/sysfs/sysfs.h | 6 +++---
include/linux/sysfs.h | 9 +++++++++
7 files changed, 87 insertions(+), 39 deletions(-)

This should be the final patchset before we can seriously look at
network namespace support in sysfs, again.

Eric


2010-01-11 20:22:59

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 6/7] sysfs: Pass super_block to sysfs_get_inode

From: Eric W. Biederman <[email protected]>

Currently sysfs_get_inode magically returns an inode on
sysfs_sb. Make the super_block parameter explicit and
the code becomes clearer.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 2 +-
fs/sysfs/inode.c | 5 +++--
fs/sysfs/mount.c | 2 +-
fs/sysfs/sysfs.h | 2 +-
4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 1c364be..a491c72 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -701,7 +701,7 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
}

/* attach dentry and inode */
- inode = sysfs_get_inode(sd);
+ inode = sysfs_get_inode(dir->i_sb, sd);
if (!inode) {
ret = ERR_PTR(-ENOMEM);
goto out_unlock;
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index f0172b3..487e33c 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -271,6 +271,7 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)

/**
* sysfs_get_inode - get inode for sysfs_dirent
+ * @sb: super block
* @sd: sysfs_dirent to allocate inode for
*
* Get inode for @sd. If such inode doesn't exist, a new inode
@@ -283,11 +284,11 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
* RETURNS:
* Pointer to allocated inode on success, NULL on failure.
*/
-struct inode * sysfs_get_inode(struct sysfs_dirent *sd)
+struct inode * sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd)
{
struct inode *inode;

- inode = iget_locked(sysfs_sb, sd->s_ino);
+ inode = iget_locked(sb, sd->s_ino);
if (inode && (inode->i_state & I_NEW))
sysfs_init_inode(sd, inode);

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 372a32c..77bce54 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -55,7 +55,7 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)

/* get root inode, initialize and unlock it */
mutex_lock(&sysfs_mutex);
- inode = sysfs_get_inode(&sysfs_root);
+ inode = sysfs_get_inode(sb, &sysfs_root);
mutex_unlock(&sysfs_mutex);
if (!inode) {
pr_debug("sysfs: could not get root inode\n");
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 85d5c6c..1ae6009 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -169,7 +169,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
/*
* inode.c
*/
-struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
+struct inode *sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
int sysfs_permission(struct inode *inode, int mask);
--
1.6.5.2.143.g8cc62

2010-01-11 20:22:18

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 2/7] sysfs: Pack sysfs_dirent more tightly.

From: Eric W. Biederman <[email protected]>

Placing the 16bit s_mode between a pointer and a long doesn't pack well
especailly on 64bit where we wast 48 bits. So move s_mode and
declare it as a unsigned short. This is the sysfs backing store
after all we don't need fields extra large just in case someday
we want userspace to be able to use a larger value.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/sysfs.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index cdd9377..0417805 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -66,8 +66,8 @@ struct sysfs_dirent {
};

unsigned int s_flags;
+ unsigned short s_mode;
ino_t s_ino;
- umode_t s_mode;
struct sysfs_inode_attrs *s_iattr;
};

--
1.6.5.2.143.g8cc62

2010-01-11 20:23:18

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

From: Eric W. Biederman <[email protected]>

On large directories sysfs_count_nlinks can be a significant
bottleneck, so keep a count in sysfs_dirent. If we exceed
the maximum number of directory entries we can store return
nlink of 1. An nlink of 1 matches what reiserfs does in
this case, and it let's find and similar utlities know that
we have a the directory nlink can not be used for optimization
purposes.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 20 ++++++++++++++++++++
fs/sysfs/inode.c | 15 +--------------
fs/sysfs/mount.c | 1 +
fs/sysfs/sysfs.h | 1 +
4 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5c4703d..1c364be 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -359,6 +359,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
sd->s_name = name;
sd->s_mode = mode;
sd->s_flags = type;
+ sd->s_nlink = type == SYSFS_DIR? 2:1;

return sd;

@@ -392,6 +393,23 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
mutex_lock(&sysfs_mutex);
}

+static void sysfs_dir_inc_nlink(struct sysfs_dirent *sd)
+{
+ /* If we overflow force nlink to be 1 */
+ if (sd->s_nlink > 1) {
+ sd->s_nlink++;
+ if (sd->s_nlink == 0)
+ sd->s_nlink = 1;
+ }
+}
+
+static void sysfs_dir_dec_nlink(struct sysfs_dirent *sd)
+{
+ /* Don't decrement if we overflowed an increment */
+ if (sd->s_nlink != 1)
+ sd->s_nlink--;
+}
+
/**
* __sysfs_add_one - add sysfs_dirent to parent without warning
* @acxt: addrm context to use
@@ -420,6 +438,7 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
return -EEXIST;

sd->s_parent = sysfs_get(acxt->parent_sd);
+ sysfs_dir_inc_nlink(sd->s_parent);

sysfs_link_sibling(sd);

@@ -512,6 +531,7 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);

+ sysfs_dir_dec_nlink(sd->s_parent);
sysfs_unlink_sibling(sd);

/* Update timestamps on the parent */
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 104cbc1..f0172b3 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -201,18 +201,6 @@ static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
inode->i_ctime = iattr->ia_ctime;
}

-static int sysfs_count_nlink(struct sysfs_dirent *sd)
-{
- struct sysfs_dirent *child;
- int nr = 0;
-
- for (child = sd->s_dir.children; child; child = child->s_sibling)
- if (sysfs_type(child) == SYSFS_DIR)
- nr++;
-
- return nr + 2;
-}
-
static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
{
struct sysfs_inode_attrs *iattrs = sd->s_iattr;
@@ -228,8 +216,7 @@ static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
iattrs->ia_secdata_len);
}

- if (sysfs_type(sd) == SYSFS_DIR)
- inode->i_nlink = sysfs_count_nlink(sd);
+ inode->i_nlink = sd->s_nlink;
}

int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 4974995..372a32c 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -37,6 +37,7 @@ struct sysfs_dirent sysfs_root = {
.s_count = ATOMIC_INIT(1),
.s_flags = SYSFS_DIR,
.s_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+ .s_nlink = 2,
.s_ino = 1,
};

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 0417805..85d5c6c 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -67,6 +67,7 @@ struct sysfs_dirent {

unsigned int s_flags;
unsigned short s_mode;
+ unsigned short s_nlink;
ino_t s_ino;
struct sysfs_inode_attrs *s_iattr;
};
--
1.6.5.2.143.g8cc62

2010-01-11 20:22:15

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 1/7] sysfs: Serialize updates to the vfs inode

From: Eric W. Biederman <[email protected]>

The way the vfs is structured only calls to the filesystem
methods actually update the vfs inode. We add to the
normal number of places where the vfs inode is updated by
also updating the vfs inode in sysfs_refresh_inode.

Grabbing the inode mutex in sysfs_permission and sysfs_getattr
causes deadlocks, because somtimes those operations are called
with the inode mutex held, but not always. Therefore we can
not depend upon the inode mutex to serialize all updates
to the vfs inode.

We take the sysfs_mutex in all of those places so we can
also use it to protect the vfs inode. To accomplish that
we simply requires extending the vfs inode in sysfs_setattr
over inode_change_ok (so we have an unchanging inode
when we perform the check), and inode_setattr.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 220b758..104cbc1 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -112,20 +112,20 @@ int sysfs_setattr(struct dentry *dentry, struct iattr *iattr)
if (!sd)
return -EINVAL;

+ mutex_lock(&sysfs_mutex);
error = inode_change_ok(inode, iattr);
if (error)
- return error;
+ goto out;

iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */

error = inode_setattr(inode, iattr);
if (error)
- return error;
+ goto out;

- mutex_lock(&sysfs_mutex);
error = sysfs_sd_setattr(sd, iattr);
+out:
mutex_unlock(&sysfs_mutex);
-
return error;
}

--
1.6.5.2.143.g8cc62

2010-01-11 20:22:25

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 7/7] sysfs: Kill unused sysfs_sb variable.

From: Eric W. Biederman <[email protected]>

Now that there are no more users we can remove
the sysfs_sb variable.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/mount.c | 2 --
fs/sysfs/sysfs.h | 1 -
2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 77bce54..50cf41c 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -23,7 +23,6 @@


static struct vfsmount *sysfs_mount;
-struct super_block * sysfs_sb = NULL;
struct kmem_cache *sysfs_dir_cachep;

static const struct super_operations sysfs_ops = {
@@ -51,7 +50,6 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
sb->s_magic = SYSFS_MAGIC;
sb->s_op = &sysfs_ops;
sb->s_time_gran = 1;
- sysfs_sb = sb;

/* get root inode, initialize and unlock it */
mutex_lock(&sysfs_mutex);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 1ae6009..0ea96af 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -112,7 +112,6 @@ struct sysfs_addrm_cxt {
* mount.c
*/
extern struct sysfs_dirent sysfs_root;
-extern struct super_block *sysfs_sb;
extern struct kmem_cache *sysfs_dir_cachep;

/*
--
1.6.5.2.143.g8cc62

2010-01-11 20:22:40

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 4/7] sysfs: Implement sysfs_rename_link

From: Eric W. Biederman <[email protected]>

Because of rename ordering problems we occassionally give false
warnings about invalid sysfs operations. So using sysfs_rename
create a sysfs_rename_link function that doesn't need strange
workarounds.

Cc: Benjamin Thery <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/symlink.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/sysfs.h | 9 +++++++++
2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index c5eff49..1b9a3a1 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -123,6 +123,44 @@ void sysfs_remove_link(struct kobject * kobj, const char * name)
sysfs_hash_and_remove(parent_sd, name);
}

+/**
+ * sysfs_rename_link - rename symlink in object's directory.
+ * @kobj: object we're acting for.
+ * @targ: object we're pointing to.
+ * @old: previous name of the symlink.
+ * @new: new name of the symlink.
+ *
+ * A helper function for the common rename symlink idiom.
+ */
+int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
+ const char *old, const char *new)
+{
+ struct sysfs_dirent *parent_sd, *sd = NULL;
+ int result;
+
+ if (!kobj)
+ parent_sd = &sysfs_root;
+ else
+ parent_sd = kobj->sd;
+
+ result = -ENOENT;
+ sd = sysfs_get_dirent(parent_sd, old);
+ if (!sd)
+ goto out;
+
+ result = -EINVAL;
+ if (sysfs_type(sd) != SYSFS_KOBJ_LINK)
+ goto out;
+ if (sd->s_symlink.target_sd->s_dir.kobj != targ)
+ goto out;
+
+ result = sysfs_rename(sd, parent_sd, new);
+
+out:
+ sysfs_put(sd);
+ return result;
+}
+
static int sysfs_get_target_path(struct sysfs_dirent *parent_sd,
struct sysfs_dirent *target_sd, char *path)
{
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index cfa8308..e322573 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -110,6 +110,9 @@ int __must_check sysfs_create_link_nowarn(struct kobject *kobj,
const char *name);
void sysfs_remove_link(struct kobject *kobj, const char *name);

+int sysfs_rename_link(struct kobject *kobj, struct kobject *target,
+ const char *old_name, const char *new_name);
+
int __must_check sysfs_create_group(struct kobject *kobj,
const struct attribute_group *grp);
int sysfs_update_group(struct kobject *kobj,
@@ -203,6 +206,12 @@ static inline void sysfs_remove_link(struct kobject *kobj, const char *name)
{
}

+static inline int sysfs_rename_link(struct kobject *k, struct kobject *t,
+ const char *old_name, const char *new_name)
+{
+ return 0;
+}
+
static inline int sysfs_create_group(struct kobject *kobj,
const struct attribute_group *grp)
{
--
1.6.5.2.143.g8cc62

2010-01-11 20:23:35

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 5/7] driver core: Use sysfs_rename_link in device_rename

From: Eric W. Biederman <[email protected]>

Don't open code the renaming of symlinks in sysfs
instead use the new helper function sysfs_rename_link

Signed-off-by: Eric W. Biederman <[email protected]>
---
drivers/base/core.c | 18 ++++++------------
1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2820257..ba66394 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1574,22 +1574,16 @@ int device_rename(struct device *dev, char *new_name)
if (old_class_name) {
new_class_name = make_class_name(dev->class->name, &dev->kobj);
if (new_class_name) {
- error = sysfs_create_link_nowarn(&dev->parent->kobj,
- &dev->kobj,
- new_class_name);
- if (error)
- goto out;
- sysfs_remove_link(&dev->parent->kobj, old_class_name);
+ error = sysfs_rename_link(&dev->parent->kobj,
+ &dev->kobj,
+ old_class_name,
+ new_class_name);
}
}
#else
if (dev->class) {
- error = sysfs_create_link_nowarn(&dev->class->p->class_subsys.kobj,
- &dev->kobj, dev_name(dev));
- if (error)
- goto out;
- sysfs_remove_link(&dev->class->p->class_subsys.kobj,
- old_device_name);
+ error = sysfs_rename_link(&dev->class->p->class_subsys.kobj,
+ &dev->kobj, old_device_name, new_name);
}
#endif

--
1.6.5.2.143.g8cc62

2010-01-12 00:39:51

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/7] sysfs: Pack sysfs_dirent more tightly.

On 01/12/2010 05:21 AM, Eric W. Biederman wrote:
> From: Eric W. Biederman <[email protected]>
>
> Placing the 16bit s_mode between a pointer and a long doesn't pack well
> especailly on 64bit where we wast 48 bits. So move s_mode and
> declare it as a unsigned short. This is the sysfs backing store
> after all we don't need fields extra large just in case someday
> we want userspace to be able to use a larger value.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Tejun Heo <[email protected]>

--
tejun

2010-01-12 00:45:10

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

Hello,

On 01/12/2010 05:21 AM, Eric W. Biederman wrote:
> On large directories sysfs_count_nlinks can be a significant
> bottleneck, so keep a count in sysfs_dirent.

I was about to suggest changing s_flags to ushort too. Hmmm... adding
a new field to sysfs_dirent somewhat worries me but this doesn't add
to the size of the structure. How significant bottlenect are we
talking about?

> If we exceed the maximum number of directory entries we can store
> return nlink of 1. An nlink of 1 matches what reiserfs does in this
> case, and it let's find and similar utlities know that we have a the
> directory nlink can not be used for optimization purposes.

Hmmm... what's the limit on reiserfs? Is it 64k too?

Thanks.

--
tejun

2010-01-12 01:03:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

Tejun Heo <[email protected]> writes:

> Hello,
>
> On 01/12/2010 05:21 AM, Eric W. Biederman wrote:
>> On large directories sysfs_count_nlinks can be a significant
>> bottleneck, so keep a count in sysfs_dirent.
>
> I was about to suggest changing s_flags to ushort too. Hmmm... adding
> a new field to sysfs_dirent somewhat worries me but this doesn't add
> to the size of the structure. How significant bottlenect are we
> talking about?

It was seen in measurements of sysfs before my last round of changes,
which cause us to refresh the inode, and call sysfs_count_nlink more
often.

I am surprised no one has complained about 2.6.33-rcN yet and reported
a performance regression.

Ultimately not having a cached nlink count transforms what should
be constant time operations to operations that run in time O(N).

>> If we exceed the maximum number of directory entries we can store
>> return nlink of 1. An nlink of 1 matches what reiserfs does in this
>> case, and it let's find and similar utlities know that we have a the
>> directory nlink can not be used for optimization purposes.
>
> Hmmm... what's the limit on reiserfs? Is it 64k too?

The resierfs limit is a bit short of a 32bit number. Ext[234]
all have a 16bit nlink field, and they fail the operation
when you attempt to increment nlink past their limit.

In this case the comparison with reiserfs is to show that at some
point throwing up our hands and not counting and just returning nlink
1 is something userspace can occassionally expect to see. It is
common enough that find has handled this idiom for years.

Since we can handle this without increasing the size of the sysfs_dirent
I figure we should have a good quality of implementation for the common
case and return something userspace can deal with for the extreme cases.

Eric

2010-01-12 01:06:59

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

Benjamin LaHaise <[email protected]> writes:

> On Tue, Jan 12, 2010 at 09:46:59AM +0900, Tejun Heo wrote:
>> Hello,
>>
>> On 01/12/2010 05:21 AM, Eric W. Biederman wrote:
>> > On large directories sysfs_count_nlinks can be a significant
>> > bottleneck, so keep a count in sysfs_dirent.
>>
>> I was about to suggest changing s_flags to ushort too. Hmmm... adding
>> a new field to sysfs_dirent somewhat worries me but this doesn't add
>> to the size of the structure. How significant bottlenect are we
>> talking about?
>
> 100,000 entries in a sysfs directory is a requirement for network devices.

I haven't limited that.

>> > If we exceed the maximum number of directory entries we can store
>> > return nlink of 1. An nlink of 1 matches what reiserfs does in this
>> > case, and it let's find and similar utlities know that we have a the
>> > directory nlink can not be used for optimization purposes.
>>
>> Hmmm... what's the limit on reiserfs? Is it 64k too?
>
> 64k is too small. 10 gig interfaces can currently service 50-100k users,
> each of which requires their own network device.

64k is the point at which I stop returning a userful nlink number to
userspace, sysfs directories can still scale to any size.

I could legitimately always return nlink == 1, and all userspace
utilities that I know of would be fine, but I am trying to be a bit
more polite than that.

I don't see the link count as interesting enough to store more than
16bits for it. Even with 32bits of storage for nlink sysfs would have
to have to handle the rollover case as I am doing now. So I don't
see any advantage to storing more bits.

Eric

2010-01-12 01:12:48

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

On Mon, Jan 11, 2010 at 05:06:53PM -0800, Eric W. Biederman wrote:
> I don't see the link count as interesting enough to store more than
> 16bits for it. Even with 32bits of storage for nlink sysfs would have
> to have to handle the rollover case as I am doing now. So I don't
> see any advantage to storing more bits.

I'm not terribly concerned with what value gets returned, but rather about
how long is spent calculating it. Ideally, new sysfs directory entries
can be inserted in an O(1) or other reasonable order operation. I'll try
to find some time to re-run my interface scaling tests with your latest
changes.

=ben

2010-01-12 01:15:08

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

On Tue, Jan 12, 2010 at 09:46:59AM +0900, Tejun Heo wrote:
> Hello,
>
> On 01/12/2010 05:21 AM, Eric W. Biederman wrote:
> > On large directories sysfs_count_nlinks can be a significant
> > bottleneck, so keep a count in sysfs_dirent.
>
> I was about to suggest changing s_flags to ushort too. Hmmm... adding
> a new field to sysfs_dirent somewhat worries me but this doesn't add
> to the size of the structure. How significant bottlenect are we
> talking about?

100,000 entries in a sysfs directory is a requirement for network devices.

> > If we exceed the maximum number of directory entries we can store
> > return nlink of 1. An nlink of 1 matches what reiserfs does in this
> > case, and it let's find and similar utlities know that we have a the
> > directory nlink can not be used for optimization purposes.
>
> Hmmm... what's the limit on reiserfs? Is it 64k too?

64k is too small. 10 gig interfaces can currently service 50-100k users,
each of which requires their own network device.

-ben

2010-01-12 01:23:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

Benjamin LaHaise <[email protected]> writes:

> On Mon, Jan 11, 2010 at 05:06:53PM -0800, Eric W. Biederman wrote:
>> I don't see the link count as interesting enough to store more than
>> 16bits for it. Even with 32bits of storage for nlink sysfs would have
>> to have to handle the rollover case as I am doing now. So I don't
>> see any advantage to storing more bits.
>
> I'm not terribly concerned with what value gets returned, but rather about
> how long is spent calculating it. Ideally, new sysfs directory entries
> can be inserted in an O(1) or other reasonable order operation. I'll try
> to find some time to re-run my interface scaling tests with your latest
> changes.

Sounds good. My changes so far have been the easy low hanging fruit.

readdir should be at O(N) and stat etc should all be constant time.
Lookups by name are still O(N), and I'm not ready to look at going
better until after I have network namespace support into sysfs.

Eric

2010-01-12 05:41:47

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 1/7] sysfs: Serialize updates to the vfs inode

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> The way the vfs is structured only calls to the filesystem
> methods actually update the vfs inode. We add to the
> normal number of places where the vfs inode is updated by
> also updating the vfs inode in sysfs_refresh_inode.
>
> Grabbing the inode mutex in sysfs_permission and sysfs_getattr
> causes deadlocks, because somtimes those operations are called
> with the inode mutex held, but not always. Therefore we can
> not depend upon the inode mutex to serialize all updates
> to the vfs inode.
>
> We take the sysfs_mutex in all of those places so we can
> also use it to protect the vfs inode. To accomplish that
> we simply requires extending the vfs inode in sysfs_setattr
> over inode_change_ok (so we have an unchanging inode
> when we perform the check), and inode_setattr.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

I'm a little confused about the patch intro: it makes it sound
like this is preparatory for a followup, but in fact it is a bugfix,
right? Hard to exploit i think, but should it go to -stable?

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/inode.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index 220b758..104cbc1 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -112,20 +112,20 @@ int sysfs_setattr(struct dentry *dentry, struct iattr *iattr)
> if (!sd)
> return -EINVAL;
>
> + mutex_lock(&sysfs_mutex);
> error = inode_change_ok(inode, iattr);
> if (error)
> - return error;
> + goto out;
>
> iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
>
> error = inode_setattr(inode, iattr);
> if (error)
> - return error;
> + goto out;
>
> - mutex_lock(&sysfs_mutex);
> error = sysfs_sd_setattr(sd, iattr);
> +out:
> mutex_unlock(&sysfs_mutex);
> -
> return error;
> }
>
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-01-12 05:56:46

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> On large directories sysfs_count_nlinks can be a significant
> bottleneck, so keep a count in sysfs_dirent. If we exceed
> the maximum number of directory entries we can store return
> nlink of 1. An nlink of 1 matches what reiserfs does in
> this case, and it let's find and similar utlities know that
> we have a the directory nlink can not be used for optimization
> purposes.
>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/sysfs/dir.c | 20 ++++++++++++++++++++
> fs/sysfs/inode.c | 15 +--------------
> fs/sysfs/mount.c | 1 +
> fs/sysfs/sysfs.h | 1 +
> 4 files changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 5c4703d..1c364be 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -359,6 +359,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
> sd->s_name = name;
> sd->s_mode = mode;
> sd->s_flags = type;
> + sd->s_nlink = type == SYSFS_DIR? 2:1;
>
> return sd;
>
> @@ -392,6 +393,23 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
> mutex_lock(&sysfs_mutex);
> }
>
> +static void sysfs_dir_inc_nlink(struct sysfs_dirent *sd)
> +{
> + /* If we overflow force nlink to be 1 */
> + if (sd->s_nlink > 1) {
> + sd->s_nlink++;
> + if (sd->s_nlink == 0)
> + sd->s_nlink = 1;
> + }
> +}

Now, the original code only ups nlink on parent if child is a dir.
IIUC your code will now inc nlink for files as well. Is any userspace
going to be confused by that?

> +
> +static void sysfs_dir_dec_nlink(struct sysfs_dirent *sd)
> +{
> + /* Don't decrement if we overflowed an increment */
> + if (sd->s_nlink != 1)
> + sd->s_nlink--;
> +}
> +
> /**
> * __sysfs_add_one - add sysfs_dirent to parent without warning
> * @acxt: addrm context to use
> @@ -420,6 +438,7 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
> return -EEXIST;
>
> sd->s_parent = sysfs_get(acxt->parent_sd);
> + sysfs_dir_inc_nlink(sd->s_parent);
>
> sysfs_link_sibling(sd);
>
> @@ -512,6 +531,7 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
>
> BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);
>
> + sysfs_dir_dec_nlink(sd->s_parent);
> sysfs_unlink_sibling(sd);
>
> /* Update timestamps on the parent */
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index 104cbc1..f0172b3 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -201,18 +201,6 @@ static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
> inode->i_ctime = iattr->ia_ctime;
> }
>
> -static int sysfs_count_nlink(struct sysfs_dirent *sd)
> -{
> - struct sysfs_dirent *child;
> - int nr = 0;
> -
> - for (child = sd->s_dir.children; child; child = child->s_sibling)
> - if (sysfs_type(child) == SYSFS_DIR)
> - nr++;
> -
> - return nr + 2;
> -}
> -
> static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
> {
> struct sysfs_inode_attrs *iattrs = sd->s_iattr;
> @@ -228,8 +216,7 @@ static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
> iattrs->ia_secdata_len);
> }
>
> - if (sysfs_type(sd) == SYSFS_DIR)
> - inode->i_nlink = sysfs_count_nlink(sd);
> + inode->i_nlink = sd->s_nlink;
> }
>
> int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 4974995..372a32c 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -37,6 +37,7 @@ struct sysfs_dirent sysfs_root = {
> .s_count = ATOMIC_INIT(1),
> .s_flags = SYSFS_DIR,
> .s_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> + .s_nlink = 2,
> .s_ino = 1,
> };
>
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 0417805..85d5c6c 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -67,6 +67,7 @@ struct sysfs_dirent {
>
> unsigned int s_flags;
> unsigned short s_mode;
> + unsigned short s_nlink;
> ino_t s_ino;
> struct sysfs_inode_attrs *s_iattr;
> };
> --
> 1.6.5.2.143.g8cc62

2010-01-12 06:16:46

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

Hello,

On 01/12/2010 10:06 AM, Eric W. Biederman wrote:
>> 64k is too small. 10 gig interfaces can currently service 50-100k users,
>> each of which requires their own network device.
>
> 64k is the point at which I stop returning a userful nlink number to
> userspace, sysfs directories can still scale to any size.
>
> I could legitimately always return nlink == 1, and all userspace
> utilities that I know of would be fine, but I am trying to be a bit
> more polite than that.

I don't think it's more polite. It's more error-prone. 64k is a
limit where breach would be rare enough to be unexpected but would
still happen on corner cases. I don't know what would be the end
result of incorrect nlink but it's far better to have something which
behaves consistently than to have something which flips its meaning
after reaching certain rare but still quite possible limit. If nlink
is something we can ignore, just set it to 1 for all sysfs files. If
nlink must be kept correct, use 32bit limit and fail to create more.

> I don't see the link count as interesting enough to store more than
> 16bits for it. Even with 32bits of storage for nlink sysfs would have
> to have to handle the rollover case as I am doing now. So I don't
> see any advantage to storing more bits.

I really don't think reiserfs's example is a good one to follow
especially with significantly lowered limit.

Thanks.

--
tejun

2010-01-12 06:19:42

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/7] sysfs: Implement sysfs_rename_link

On 01/12/2010 05:21 AM, Eric W. Biederman wrote:
> From: Eric W. Biederman <[email protected]>
>
> Because of rename ordering problems we occassionally give false
> warnings about invalid sysfs operations. So using sysfs_rename
> create a sysfs_rename_link function that doesn't need strange
> workarounds.
>
> Cc: Benjamin Thery <[email protected]>
> Cc: Daniel Lezcano <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2010-01-12 06:19:49

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 5/7] driver core: Use sysfs_rename_link in device_rename

On 01/12/2010 05:21 AM, Eric W. Biederman wrote:
> From: Eric W. Biederman <[email protected]>
>
> Don't open code the renaming of symlinks in sysfs
> instead use the new helper function sysfs_rename_link
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2010-01-12 08:30:45

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> From: Eric W. Biederman <[email protected]>
>>
>> On large directories sysfs_count_nlinks can be a significant
>> bottleneck, so keep a count in sysfs_dirent. If we exceed
>> the maximum number of directory entries we can store return
>> nlink of 1. An nlink of 1 matches what reiserfs does in
>> this case, and it let's find and similar utlities know that
>> we have a the directory nlink can not be used for optimization
>> purposes.
>>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>> ---
>> fs/sysfs/dir.c | 20 ++++++++++++++++++++
>> fs/sysfs/inode.c | 15 +--------------
>> fs/sysfs/mount.c | 1 +
>> fs/sysfs/sysfs.h | 1 +
>> 4 files changed, 23 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
>> index 5c4703d..1c364be 100644
>> --- a/fs/sysfs/dir.c
>> +++ b/fs/sysfs/dir.c
>> @@ -359,6 +359,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
>> sd->s_name = name;
>> sd->s_mode = mode;
>> sd->s_flags = type;
>> + sd->s_nlink = type == SYSFS_DIR? 2:1;
>>
>> return sd;
>>
>> @@ -392,6 +393,23 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
>> mutex_lock(&sysfs_mutex);
>> }
>>
>> +static void sysfs_dir_inc_nlink(struct sysfs_dirent *sd)
>> +{
>> + /* If we overflow force nlink to be 1 */
>> + if (sd->s_nlink > 1) {
>> + sd->s_nlink++;
>> + if (sd->s_nlink == 0)
>> + sd->s_nlink = 1;
>> + }
>> +}
>
> Now, the original code only ups nlink on parent if child is a dir.
> IIUC your code will now inc nlink for files as well. Is any userspace
> going to be confused by that?

Good point. nlink should only be upped for directories.

Eric

2010-01-12 12:40:54

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

On Mon, 11 Jan 2010 12:21:51 -0800,
"Eric W. Biederman" <[email protected]> wrote:

> From: Eric W. Biederman <[email protected]>
>
> On large directories sysfs_count_nlinks can be a significant
> bottleneck, so keep a count in sysfs_dirent. If we exceed
> the maximum number of directory entries we can store return
> nlink of 1. An nlink of 1 matches what reiserfs does in
> this case, and it let's find and similar utlities know that
> we have a the directory nlink can not be used for optimization
> purposes.
>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/sysfs/dir.c | 20 ++++++++++++++++++++
> fs/sysfs/inode.c | 15 +--------------
> fs/sysfs/mount.c | 1 +
> fs/sysfs/sysfs.h | 1 +
> 4 files changed, 23 insertions(+), 14 deletions(-)
>

> @@ -420,6 +438,7 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
> return -EEXIST;
>
> sd->s_parent = sysfs_get(acxt->parent_sd);
> + sysfs_dir_inc_nlink(sd->s_parent);
>
> sysfs_link_sibling(sd);
>
> @@ -512,6 +531,7 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
>
> BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);
>
> + sysfs_dir_dec_nlink(sd->s_parent);
> sysfs_unlink_sibling(sd);
>
> /* Update timestamps on the parent */

Shouldn't s_nlink be adjusted in sysfs_rename for the move case as well?

2010-01-12 15:35:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

Cornelia Huck <[email protected]> writes:

> On Mon, 11 Jan 2010 12:21:51 -0800,
> "Eric W. Biederman" <[email protected]> wrote:
>
>> From: Eric W. Biederman <[email protected]>
>>
>> On large directories sysfs_count_nlinks can be a significant
>> bottleneck, so keep a count in sysfs_dirent. If we exceed
>> the maximum number of directory entries we can store return
>> nlink of 1. An nlink of 1 matches what reiserfs does in
>> this case, and it let's find and similar utlities know that
>> we have a the directory nlink can not be used for optimization
>> purposes.
>>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>> ---
>> fs/sysfs/dir.c | 20 ++++++++++++++++++++
>> fs/sysfs/inode.c | 15 +--------------
>> fs/sysfs/mount.c | 1 +
>> fs/sysfs/sysfs.h | 1 +
>> 4 files changed, 23 insertions(+), 14 deletions(-)
>>
>
>> @@ -420,6 +438,7 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
>> return -EEXIST;
>>
>> sd->s_parent = sysfs_get(acxt->parent_sd);
>> + sysfs_dir_inc_nlink(sd->s_parent);
>>
>> sysfs_link_sibling(sd);
>>
>> @@ -512,6 +531,7 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
>>
>> BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);
>>
>> + sysfs_dir_dec_nlink(sd->s_parent);
>> sysfs_unlink_sibling(sd);
>>
>> /* Update timestamps on the parent */
>
> Shouldn't s_nlink be adjusted in sysfs_rename for the move case as well?

Yes. I hereby nominate this as my most buggy patch ever.
Ugh.

I thought I actually cared about nlinks when I bothered to keep a
nlink count, but I am starting to think I really don't.

Eric

2010-01-12 15:53:16

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH 3/7] sysfs: Keep an nlink count on sysfs directories.

On Mon, 11 Jan 2010 19:53:08 EST, Benjamin LaHaise said:
> 64k is too small. 10 gig interfaces can currently service 50-100k users,

I'll certainly buy that.

> each of which requires their own network device.

You throwing each user into their own separate network namespace?



Attachments:
(No filename) (227.00 B)

2010-01-12 17:31:04

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 4/7] sysfs: Implement sysfs_rename_link

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Because of rename ordering problems we occassionally give false
> warnings about invalid sysfs operations. So using sysfs_rename
> create a sysfs_rename_link function that doesn't need strange
> workarounds.

It looks nice and clean esp compared to the code you're replacing.
OTOH I do wonder whether your sysfs_type(sd) != SYSFS_KOBJ_LINK check
will catch unexpected ways that device_rename() was being called.
Shouldn't, I realize, since sysfs_create_link() always sets SYSFS_KOBJ_LINK,
but the check wasn't there before, so stranger things have happened.

> Cc: Benjamin Thery <[email protected]>
> Cc: Daniel Lezcano <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/symlink.c | 38 ++++++++++++++++++++++++++++++++++++++
> include/linux/sysfs.h | 9 +++++++++
> 2 files changed, 47 insertions(+), 0 deletions(-)
>
> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
> index c5eff49..1b9a3a1 100644
> --- a/fs/sysfs/symlink.c
> +++ b/fs/sysfs/symlink.c
> @@ -123,6 +123,44 @@ void sysfs_remove_link(struct kobject * kobj, const char * name)
> sysfs_hash_and_remove(parent_sd, name);
> }
>
> +/**
> + * sysfs_rename_link - rename symlink in object's directory.
> + * @kobj: object we're acting for.
> + * @targ: object we're pointing to.
> + * @old: previous name of the symlink.
> + * @new: new name of the symlink.
> + *
> + * A helper function for the common rename symlink idiom.
> + */
> +int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> + const char *old, const char *new)
> +{
> + struct sysfs_dirent *parent_sd, *sd = NULL;
> + int result;
> +
> + if (!kobj)
> + parent_sd = &sysfs_root;
> + else
> + parent_sd = kobj->sd;
> +
> + result = -ENOENT;
> + sd = sysfs_get_dirent(parent_sd, old);
> + if (!sd)
> + goto out;
> +
> + result = -EINVAL;
> + if (sysfs_type(sd) != SYSFS_KOBJ_LINK)
> + goto out;
> + if (sd->s_symlink.target_sd->s_dir.kobj != targ)
> + goto out;
> +
> + result = sysfs_rename(sd, parent_sd, new);
> +
> +out:
> + sysfs_put(sd);
> + return result;
> +}
> +
> static int sysfs_get_target_path(struct sysfs_dirent *parent_sd,
> struct sysfs_dirent *target_sd, char *path)
> {
> diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
> index cfa8308..e322573 100644
> --- a/include/linux/sysfs.h
> +++ b/include/linux/sysfs.h
> @@ -110,6 +110,9 @@ int __must_check sysfs_create_link_nowarn(struct kobject *kobj,
> const char *name);
> void sysfs_remove_link(struct kobject *kobj, const char *name);
>
> +int sysfs_rename_link(struct kobject *kobj, struct kobject *target,
> + const char *old_name, const char *new_name);
> +
> int __must_check sysfs_create_group(struct kobject *kobj,
> const struct attribute_group *grp);
> int sysfs_update_group(struct kobject *kobj,
> @@ -203,6 +206,12 @@ static inline void sysfs_remove_link(struct kobject *kobj, const char *name)
> {
> }
>
> +static inline int sysfs_rename_link(struct kobject *k, struct kobject *t,
> + const char *old_name, const char *new_name)
> +{
> + return 0;
> +}
> +
> static inline int sysfs_create_group(struct kobject *kobj,
> const struct attribute_group *grp)
> {
> --
> 1.6.5.2.143.g8cc62

2010-01-12 17:34:54

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 5/7] driver core: Use sysfs_rename_link in device_rename

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Don't open code the renaming of symlinks in sysfs
> instead use the new helper function sysfs_rename_link
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> drivers/base/core.c | 18 ++++++------------
> 1 files changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2820257..ba66394 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1574,22 +1574,16 @@ int device_rename(struct device *dev, char *new_name)
> if (old_class_name) {
> new_class_name = make_class_name(dev->class->name, &dev->kobj);
> if (new_class_name) {
> - error = sysfs_create_link_nowarn(&dev->parent->kobj,
> - &dev->kobj,
> - new_class_name);
> - if (error)
> - goto out;
> - sysfs_remove_link(&dev->parent->kobj, old_class_name);
> + error = sysfs_rename_link(&dev->parent->kobj,
> + &dev->kobj,
> + old_class_name,
> + new_class_name);
> }
> }
> #else
> if (dev->class) {
> - error = sysfs_create_link_nowarn(&dev->class->p->class_subsys.kobj,
> - &dev->kobj, dev_name(dev));
> - if (error)
> - goto out;
> - sysfs_remove_link(&dev->class->p->class_subsys.kobj,
> - old_device_name);
> + error = sysfs_rename_link(&dev->class->p->class_subsys.kobj,
> + &dev->kobj, old_device_name, new_name);
> }
> #endif
>
> --
> 1.6.5.2.143.g8cc62

2010-01-12 17:43:57

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 6/7] sysfs: Pass super_block to sysfs_get_inode

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Currently sysfs_get_inode magically returns an inode on
> sysfs_sb. Make the super_block parameter explicit and
> the code becomes clearer.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/dir.c | 2 +-
> fs/sysfs/inode.c | 5 +++--
> fs/sysfs/mount.c | 2 +-
> fs/sysfs/sysfs.h | 2 +-
> 4 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 1c364be..a491c72 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -701,7 +701,7 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
> }
>
> /* attach dentry and inode */
> - inode = sysfs_get_inode(sd);
> + inode = sysfs_get_inode(dir->i_sb, sd);
> if (!inode) {
> ret = ERR_PTR(-ENOMEM);
> goto out_unlock;
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index f0172b3..487e33c 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -271,6 +271,7 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
>
> /**
> * sysfs_get_inode - get inode for sysfs_dirent
> + * @sb: super block
> * @sd: sysfs_dirent to allocate inode for
> *
> * Get inode for @sd. If such inode doesn't exist, a new inode
> @@ -283,11 +284,11 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
> * RETURNS:
> * Pointer to allocated inode on success, NULL on failure.
> */
> -struct inode * sysfs_get_inode(struct sysfs_dirent *sd)
> +struct inode * sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd)
> {
> struct inode *inode;
>
> - inode = iget_locked(sysfs_sb, sd->s_ino);
> + inode = iget_locked(sb, sd->s_ino);
> if (inode && (inode->i_state & I_NEW))
> sysfs_init_inode(sd, inode);
>
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 372a32c..77bce54 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -55,7 +55,7 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
>
> /* get root inode, initialize and unlock it */
> mutex_lock(&sysfs_mutex);
> - inode = sysfs_get_inode(&sysfs_root);
> + inode = sysfs_get_inode(sb, &sysfs_root);
> mutex_unlock(&sysfs_mutex);
> if (!inode) {
> pr_debug("sysfs: could not get root inode\n");
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 85d5c6c..1ae6009 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -169,7 +169,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
> /*
> * inode.c
> */
> -struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
> +struct inode *sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd);
> void sysfs_delete_inode(struct inode *inode);
> int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
> int sysfs_permission(struct inode *inode, int mask);
> --
> 1.6.5.2.143.g8cc62

2010-01-12 17:44:27

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 7/7] sysfs: Kill unused sysfs_sb variable.

Quoting Eric W. Biederman ([email protected]):
> From: Eric W. Biederman <[email protected]>
>
> Now that there are no more users we can remove
> the sysfs_sb variable.
>
> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Serge Hallyn <[email protected]>

> ---
> fs/sysfs/mount.c | 2 --
> fs/sysfs/sysfs.h | 1 -
> 2 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 77bce54..50cf41c 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -23,7 +23,6 @@
>
>
> static struct vfsmount *sysfs_mount;
> -struct super_block * sysfs_sb = NULL;
> struct kmem_cache *sysfs_dir_cachep;
>
> static const struct super_operations sysfs_ops = {
> @@ -51,7 +50,6 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
> sb->s_magic = SYSFS_MAGIC;
> sb->s_op = &sysfs_ops;
> sb->s_time_gran = 1;
> - sysfs_sb = sb;
>
> /* get root inode, initialize and unlock it */
> mutex_lock(&sysfs_mutex);
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 1ae6009..0ea96af 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -112,7 +112,6 @@ struct sysfs_addrm_cxt {
> * mount.c
> */
> extern struct sysfs_dirent sysfs_root;
> -extern struct super_block *sysfs_sb;
> extern struct kmem_cache *sysfs_dir_cachep;
>
> /*
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-01-15 21:47:42

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/7] General sysfs enhancements

On Mon, Jan 11, 2010 at 12:16:09PM -0800, Eric W. Biederman wrote:
>
> This patchset is built on top of 2.6.33-rc3 +
> "sysfs: Cache the last sysfs_dirent to improve readdir scalability v2"
>
> Greg I assume you have that patch somewhere in queue.

It is now applied.

> This patchset contains little cleanups all around sysfs.
> - Some basic cleanups.
> - Some sysfs scaling work
> - Implementation of sysfs_rename

Care to resend these, as it seems that some needed to be redone.

thanks,

greg k-h

2010-02-13 03:20:58

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 0/6] General sysfs enhancements

Greg KH <[email protected]> writes:

> On Mon, Jan 11, 2010 at 12:16:09PM -0800, Eric W. Biederman wrote:
>>
>> This patchset is built on top of 2.6.33-rc3 +
>> "sysfs: Cache the last sysfs_dirent to improve readdir scalability v2"
>>
>> Greg I assume you have that patch somewhere in queue.
>
> It is now applied.
>
>> This patchset contains little cleanups all around sysfs.
>> - Some basic cleanups.
>> - Some sysfs scaling work
>> - Implementation of sysfs_rename
>
> Care to resend these, as it seems that some needed to be redone.

Here goes. I am finally recovered enough from my cold to find the
time and energy to do this properly.

For now I am dropping the controversial and buggy nlink changing patch
from my patchset, I can come back to that another time.

Eric

2010-02-13 03:22:45

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 1/6] sysfs: Serialize updates to the vfs inode

From: Eric W. Biederman <[email protected]>

The vfs depends upon filesystem methods to update the
vfs inode. Sysfs adds to the normal number of places
where the vfs inode is updated by also updatng the
vfs inode in sysfs_refresh_inode.

Typically the inode mutex is used to serialize updates
to the vfs inode, but grabbing the inode mutex in
sysfs_permission and sysfs_getattr causes deadlocks,
because sometimes the vfs calls those operations with
the inode mutex held. Therefore sysfs can not use the
inode mutex to serial updates to the vfs inode.

The sysfs_mutex is acquired in all of the routines
where sysfs updates the vfs inode, and with a small
change we can consistently protext sysfs vfs inode
updates with the sysfs_mutex. To protect the sysfs
vfs inode updates with the sysfs_mutex simply requires
extending the scope of sysfs_mutex in sysfs_setattr
over inode_setattr, and over inode_change_ok (so we
have an unchanging inode when we perform the check).

Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/inode.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 6a06a1d..0d09f6c 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -111,20 +111,20 @@ int sysfs_setattr(struct dentry *dentry, struct iattr *iattr)
if (!sd)
return -EINVAL;

+ mutex_lock(&sysfs_mutex);
error = inode_change_ok(inode, iattr);
if (error)
- return error;
+ goto out;

iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */

error = inode_setattr(inode, iattr);
if (error)
- return error;
+ goto out;

- mutex_lock(&sysfs_mutex);
error = sysfs_sd_setattr(sd, iattr);
+out:
mutex_unlock(&sysfs_mutex);
-
return error;
}

--
1.6.5.2.143.g8cc62

2010-02-13 03:22:55

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 2/6] sysfs: Pack sysfs_dirent more tightly.

From: Eric W. Biederman <[email protected]>

Placing the 16bit s_mode between a pointer and a long doesn't pack well
especailly on 64bit where we wast 48 bits. So move s_mode and
declare it as a unsigned short. This is the sysfs backing store
after all we don't need fields extra large just in case someday
we want userspace to be able to use a larger value.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/sysfs.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 37e0e08..5a3192a 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -66,8 +66,8 @@ struct sysfs_dirent {
};

unsigned int s_flags;
+ unsigned short s_mode;
ino_t s_ino;
- umode_t s_mode;
struct sysfs_inode_attrs *s_iattr;
};

--
1.6.5.2.143.g8cc62

2010-02-13 03:22:53

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 3/6] sysfs: Implement sysfs_rename_link

From: Eric W. Biederman <[email protected]>

Because of rename ordering problems we occassionally give false
warnings about invalid sysfs operations. So using sysfs_rename
create a sysfs_rename_link function that doesn't need strange
workarounds.

Cc: Benjamin Thery <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/symlink.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/sysfs.h | 9 +++++++++
2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index c5eff49..1b9a3a1 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -123,6 +123,44 @@ void sysfs_remove_link(struct kobject * kobj, const char * name)
sysfs_hash_and_remove(parent_sd, name);
}

+/**
+ * sysfs_rename_link - rename symlink in object's directory.
+ * @kobj: object we're acting for.
+ * @targ: object we're pointing to.
+ * @old: previous name of the symlink.
+ * @new: new name of the symlink.
+ *
+ * A helper function for the common rename symlink idiom.
+ */
+int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
+ const char *old, const char *new)
+{
+ struct sysfs_dirent *parent_sd, *sd = NULL;
+ int result;
+
+ if (!kobj)
+ parent_sd = &sysfs_root;
+ else
+ parent_sd = kobj->sd;
+
+ result = -ENOENT;
+ sd = sysfs_get_dirent(parent_sd, old);
+ if (!sd)
+ goto out;
+
+ result = -EINVAL;
+ if (sysfs_type(sd) != SYSFS_KOBJ_LINK)
+ goto out;
+ if (sd->s_symlink.target_sd->s_dir.kobj != targ)
+ goto out;
+
+ result = sysfs_rename(sd, parent_sd, new);
+
+out:
+ sysfs_put(sd);
+ return result;
+}
+
static int sysfs_get_target_path(struct sysfs_dirent *parent_sd,
struct sysfs_dirent *target_sd, char *path)
{
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 2b24ae5..7ee8c31 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -148,6 +148,9 @@ int __must_check sysfs_create_link_nowarn(struct kobject *kobj,
const char *name);
void sysfs_remove_link(struct kobject *kobj, const char *name);

+int sysfs_rename_link(struct kobject *kobj, struct kobject *target,
+ const char *old_name, const char *new_name);
+
int __must_check sysfs_create_group(struct kobject *kobj,
const struct attribute_group *grp);
int sysfs_update_group(struct kobject *kobj,
@@ -241,6 +244,12 @@ static inline void sysfs_remove_link(struct kobject *kobj, const char *name)
{
}

+static inline int sysfs_rename_link(struct kobject *k, struct kobject *t,
+ const char *old_name, const char *new_name)
+{
+ return 0;
+}
+
static inline int sysfs_create_group(struct kobject *kobj,
const struct attribute_group *grp)
{
--
1.6.5.2.143.g8cc62

2010-02-13 03:23:18

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 4/6] driver core: Use sysfs_rename_link in device_rename

From: Eric W. Biederman <[email protected]>

Don't open code the renaming of symlinks in sysfs
instead use the new helper function sysfs_rename_link

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
drivers/base/core.c | 18 ++++++------------
1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2820257..ba66394 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1574,22 +1574,16 @@ int device_rename(struct device *dev, char *new_name)
if (old_class_name) {
new_class_name = make_class_name(dev->class->name, &dev->kobj);
if (new_class_name) {
- error = sysfs_create_link_nowarn(&dev->parent->kobj,
- &dev->kobj,
- new_class_name);
- if (error)
- goto out;
- sysfs_remove_link(&dev->parent->kobj, old_class_name);
+ error = sysfs_rename_link(&dev->parent->kobj,
+ &dev->kobj,
+ old_class_name,
+ new_class_name);
}
}
#else
if (dev->class) {
- error = sysfs_create_link_nowarn(&dev->class->p->class_subsys.kobj,
- &dev->kobj, dev_name(dev));
- if (error)
- goto out;
- sysfs_remove_link(&dev->class->p->class_subsys.kobj,
- old_device_name);
+ error = sysfs_rename_link(&dev->class->p->class_subsys.kobj,
+ &dev->kobj, old_device_name, new_name);
}
#endif

--
1.6.5.2.143.g8cc62

2010-02-13 03:23:14

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 5/6] sysfs: Pass super_block to sysfs_get_inode

From: Eric W. Biederman <[email protected]>

Currently sysfs_get_inode magically returns an inode on
sysfs_sb. Make the super_block parameter explicit and
the code becomes clearer.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 2 +-
fs/sysfs/inode.c | 5 +++--
fs/sysfs/mount.c | 2 +-
fs/sysfs/sysfs.h | 2 +-
4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 481fdec..5907178 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -645,7 +645,7 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
}

/* attach dentry and inode */
- inode = sysfs_get_inode(sd);
+ inode = sysfs_get_inode(dir->i_sb, sd);
if (!inode) {
ret = ERR_PTR(-ENOMEM);
goto out_unlock;
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 0d09f6c..082daae 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -283,6 +283,7 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)

/**
* sysfs_get_inode - get inode for sysfs_dirent
+ * @sb: super block
* @sd: sysfs_dirent to allocate inode for
*
* Get inode for @sd. If such inode doesn't exist, a new inode
@@ -295,11 +296,11 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
* RETURNS:
* Pointer to allocated inode on success, NULL on failure.
*/
-struct inode * sysfs_get_inode(struct sysfs_dirent *sd)
+struct inode * sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd)
{
struct inode *inode;

- inode = iget_locked(sysfs_sb, sd->s_ino);
+ inode = iget_locked(sb, sd->s_ino);
if (inode && (inode->i_state & I_NEW))
sysfs_init_inode(sd, inode);

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 4974995..89db07e 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -54,7 +54,7 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)

/* get root inode, initialize and unlock it */
mutex_lock(&sysfs_mutex);
- inode = sysfs_get_inode(&sysfs_root);
+ inode = sysfs_get_inode(sb, &sysfs_root);
mutex_unlock(&sysfs_mutex);
if (!inode) {
pr_debug("sysfs: could not get root inode\n");
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 5a3192a..7593d71 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -172,7 +172,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
/*
* inode.c
*/
-struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
+struct inode *sysfs_get_inode(struct super_block *sb, struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
int sysfs_permission(struct inode *inode, int mask);
--
1.6.5.2.143.g8cc62

2010-02-13 03:23:49

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 6/6] sysfs: Kill unused sysfs_sb variable.

From: Eric W. Biederman <[email protected]>

Now that there are no more users we can remove
the sysfs_sb variable.

Acked-by: Tejun Heo <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/mount.c | 2 --
fs/sysfs/sysfs.h | 1 -
2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 89db07e..0cb1088 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -23,7 +23,6 @@


static struct vfsmount *sysfs_mount;
-struct super_block * sysfs_sb = NULL;
struct kmem_cache *sysfs_dir_cachep;

static const struct super_operations sysfs_ops = {
@@ -50,7 +49,6 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
sb->s_magic = SYSFS_MAGIC;
sb->s_op = &sysfs_ops;
sb->s_time_gran = 1;
- sysfs_sb = sb;

/* get root inode, initialize and unlock it */
mutex_lock(&sysfs_mutex);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 7593d71..30f5a44 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -115,7 +115,6 @@ struct sysfs_addrm_cxt {
* mount.c
*/
extern struct sysfs_dirent sysfs_root;
-extern struct super_block *sysfs_sb;
extern struct kmem_cache *sysfs_dir_cachep;

/*
--
1.6.5.2.143.g8cc62