2007-05-28 16:24:23

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET 2.6.22-rc2-mm1] sysfs: reduce memory footprint of sysfs_dirent

Hello, all.

This patchset reduces the size of sysfs_dirent to 88 byte from 136 on
64bit and to 52 from 76 on 32bit. Combined with forthcoming
reclaimable sysfs directories, this will make sysfs much more scalable
on very large machines and very small machines.

This patchset contains the following three patches.

#01: move-s_active-functions-to-fs-sysfs-dir-c, prep for #02
#02: slim-down-sysfs_dirent-s_active
#03: use-signly-linked-list-for-sysfs_dirent-tree

I'm pretty sure #02 is a good idea but #03 is debatable. The code
doesn't look much uglier after the conversion tho. Inputs welcome.

This patchset is on top of

2.6.22-rc2-mm1
+ [1] sysfs-assorted-fixes patchset

fs/sysfs/dir.c | 272 ++++++++++++++++++++++++++++++++++++++++++++-----------
fs/sysfs/inode.c | 12 +-
fs/sysfs/mount.c | 3
fs/sysfs/sysfs.h | 111 ++--------------------
4 files changed, 237 insertions(+), 161 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/535372



2007-05-28 16:24:37

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/3] sysfs: move s_active functions to fs/sysfs/dir.c

These functions are about to receive more complexity and doesn't
really need to be inlined in the first place. Move them from
fs/sysfs/sysfs.h to fs/sysfs/dir.c.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/sysfs/dir.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/sysfs/sysfs.h | 94 +++--------------------------------------------------
2 files changed, 94 insertions(+), 88 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 06dff2c..f5f0b93 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -20,6 +20,94 @@ spinlock_t kobj_sysfs_assoc_lock = SPIN_
static spinlock_t sysfs_ino_lock = SPIN_LOCK_UNLOCKED;
static DEFINE_IDA(sysfs_ino_ida);

+/**
+ * sysfs_get_active - get an active reference to sysfs_dirent
+ * @sd: sysfs_dirent to get an active reference to
+ *
+ * Get an active reference of @sd. This function is noop if @sd
+ * is NULL.
+ *
+ * RETURNS:
+ * Pointer to @sd on success, NULL on failure.
+ */
+struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
+{
+ if (sd) {
+ if (unlikely(!down_read_trylock(&sd->s_active)))
+ sd = NULL;
+ }
+ return sd;
+}
+
+/**
+ * sysfs_put_active - put an active reference to sysfs_dirent
+ * @sd: sysfs_dirent to put an active reference to
+ *
+ * Put an active reference to @sd. This function is noop if @sd
+ * is NULL.
+ */
+void sysfs_put_active(struct sysfs_dirent *sd)
+{
+ if (sd)
+ up_read(&sd->s_active);
+}
+
+/**
+ * sysfs_get_active_two - get active references to sysfs_dirent and parent
+ * @sd: sysfs_dirent of interest
+ *
+ * Get active reference to @sd and its parent. Parent's active
+ * reference is grabbed first. This function is noop if @sd is
+ * NULL.
+ *
+ * RETURNS:
+ * Pointer to @sd on success, NULL on failure.
+ */
+struct sysfs_dirent *sysfs_get_active_two(struct sysfs_dirent *sd)
+{
+ if (sd) {
+ if (sd->s_parent && unlikely(!sysfs_get_active(sd->s_parent)))
+ return NULL;
+ if (unlikely(!sysfs_get_active(sd))) {
+ sysfs_put_active(sd->s_parent);
+ return NULL;
+ }
+ }
+ return sd;
+}
+
+/**
+ * sysfs_put_active_two - put active references to sysfs_dirent and parent
+ * @sd: sysfs_dirent of interest
+ *
+ * Put active references to @sd and its parent. This function is
+ * noop if @sd is NULL.
+ */
+void sysfs_put_active_two(struct sysfs_dirent *sd)
+{
+ if (sd) {
+ sysfs_put_active(sd);
+ sysfs_put_active(sd->s_parent);
+ }
+}
+
+/**
+ * sysfs_deactivate - deactivate sysfs_dirent
+ * @sd: sysfs_dirent to deactivate
+ *
+ * Deny new active references and drain existing ones. s_active
+ * will be unlocked when the sysfs_dirent is released.
+ */
+void sysfs_deactivate(struct sysfs_dirent *sd)
+{
+ down_write_nested(&sd->s_active, SYSFS_S_ACTIVE_DEACTIVATE);
+
+ /* s_active will be unlocked by the thread doing the final put
+ * on @sd. Lie to lockdep.
+ */
+ rwsem_release(&sd->s_active.dep_map, 1, _RET_IP_);
+}
+
static int sysfs_alloc_ino(ino_t *pino)
{
int ino, rc;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 627bf39..f8779ea 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -56,6 +56,12 @@ enum sysfs_s_active_class
extern struct vfsmount * sysfs_mount;
extern struct kmem_cache *sysfs_dir_cachep;

+extern struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd);
+extern void sysfs_put_active(struct sysfs_dirent *sd);
+extern struct sysfs_dirent *sysfs_get_active_two(struct sysfs_dirent *sd);
+extern void sysfs_put_active_two(struct sysfs_dirent *sd);
+extern void sysfs_deactivate(struct sysfs_dirent *sd);
+
extern void sysfs_delete_inode(struct inode *inode);
extern void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode);
extern struct inode * sysfs_get_inode(struct sysfs_dirent *sd);
@@ -104,94 +110,6 @@ static inline void sysfs_put(struct sysf
release_sysfs_dirent(sd);
}

-/**
- * sysfs_get_active - get an active reference to sysfs_dirent
- * @sd: sysfs_dirent to get an active reference to
- *
- * Get an active reference of @sd. This function is noop if @sd
- * is NULL.
- *
- * RETURNS:
- * Pointer to @sd on success, NULL on failure.
- */
-static inline struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
-{
- if (sd) {
- if (unlikely(!down_read_trylock(&sd->s_active)))
- sd = NULL;
- }
- return sd;
-}
-
-/**
- * sysfs_put_active - put an active reference to sysfs_dirent
- * @sd: sysfs_dirent to put an active reference to
- *
- * Put an active reference to @sd. This function is noop if @sd
- * is NULL.
- */
-static inline void sysfs_put_active(struct sysfs_dirent *sd)
-{
- if (sd)
- up_read(&sd->s_active);
-}
-
-/**
- * sysfs_get_active_two - get active references to sysfs_dirent and parent
- * @sd: sysfs_dirent of interest
- *
- * Get active reference to @sd and its parent. Parent's active
- * reference is grabbed first. This function is noop if @sd is
- * NULL.
- *
- * RETURNS:
- * Pointer to @sd on success, NULL on failure.
- */
-static inline struct sysfs_dirent *sysfs_get_active_two(struct sysfs_dirent *sd)
-{
- if (sd) {
- if (sd->s_parent && unlikely(!sysfs_get_active(sd->s_parent)))
- return NULL;
- if (unlikely(!sysfs_get_active(sd))) {
- sysfs_put_active(sd->s_parent);
- return NULL;
- }
- }
- return sd;
-}
-
-/**
- * sysfs_put_active_two - put active references to sysfs_dirent and parent
- * @sd: sysfs_dirent of interest
- *
- * Put active references to @sd and its parent. This function is
- * noop if @sd is NULL.
- */
-static inline void sysfs_put_active_two(struct sysfs_dirent *sd)
-{
- if (sd) {
- sysfs_put_active(sd);
- sysfs_put_active(sd->s_parent);
- }
-}
-
-/**
- * sysfs_deactivate - deactivate sysfs_dirent
- * @sd: sysfs_dirent to deactivate
- *
- * Deny new active references and drain existing ones. s_active
- * will be unlocked when the sysfs_dirent is released.
- */
-static inline void sysfs_deactivate(struct sysfs_dirent *sd)
-{
- down_write_nested(&sd->s_active, SYSFS_S_ACTIVE_DEACTIVATE);
-
- /* s_active will be unlocked by the thread doing the final put
- * on @sd. Lie to lockdep.
- */
- rwsem_release(&sd->s_active.dep_map, 1, _RET_IP_);
-}
-
static inline int sysfs_is_shadowed_inode(struct inode *inode)
{
return S_ISDIR(inode->i_mode) && inode->i_op->follow_link;
--
1.4.3.4


2007-05-28 16:24:51

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/3] sysfs: slim down sysfs_dirent->s_active

Make sysfs_dirent->s_active an atomic_t instead of rwsem. This
reduces the size of sysfs_dirent from 136 to 104 on 64bit and from 76
to 60 on 32bit with lock debugging turned off. With lock debugging
turned on the reduction is much larger.

s_active starts at zero and each active reference increments s_active.
Putting a reference decrements s_active. Deactivation subtracts
SD_DEACTIVATED_BIAS which is currently INT_MIN and assumed to be small
enough to make s_active negative. If s_active is negative,
sysfs_get() no longer grants new references. Deactivation succeeds
immediately if there is no active user; otherwise, it waits using a
completion for the last put.

Due to the removal of lockdep tricks, this change makes things less
trickier in release_sysfs_dirent(). As all the complexity is
contained in three s_active functions, I think it's more readable this
way.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/sysfs/dir.c | 74 +++++++++++++++++++++++++++++++++++-------------------
fs/sysfs/sysfs.h | 13 +--------
2 files changed, 50 insertions(+), 37 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f5f0b93..40596a0 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -10,6 +10,7 @@
#include <linux/kobject.h>
#include <linux/namei.h>
#include <linux/idr.h>
+#include <linux/completion.h>
#include <asm/semaphore.h>
#include "sysfs.h"

@@ -32,11 +33,24 @@ static DEFINE_IDA(sysfs_ino_ida);
*/
struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
{
- if (sd) {
- if (unlikely(!down_read_trylock(&sd->s_active)))
- sd = NULL;
+ if (unlikely(!sd))
+ return NULL;
+
+ while (1) {
+ int v, t;
+
+ v = atomic_read(&sd->s_active);
+ if (unlikely(v < 0))
+ return NULL;
+
+ t = atomic_cmpxchg(&sd->s_active, v, v + 1);
+ if (likely(t == v))
+ return sd;
+ if (t < 0)
+ return NULL;
+
+ cpu_relax();
}
- return sd;
}

/**
@@ -48,8 +62,21 @@ struct sysfs_dirent *sysfs_get_active(st
*/
void sysfs_put_active(struct sysfs_dirent *sd)
{
- if (sd)
- up_read(&sd->s_active);
+ struct completion *cmpl;
+ int v;
+
+ if (unlikely(!sd))
+ return;
+
+ v = atomic_dec_return(&sd->s_active);
+ if (likely(v != SD_DEACTIVATED_BIAS))
+ return;
+
+ /* atomic_dec_return() is a mb(), we'll always see the updated
+ * sd->s_sibling.next.
+ */
+ cmpl = (void *)sd->s_sibling.next;
+ complete(cmpl);
}

/**
@@ -95,17 +122,25 @@ void sysfs_put_active_two(struct sysfs_d
* sysfs_deactivate - deactivate sysfs_dirent
* @sd: sysfs_dirent to deactivate
*
- * Deny new active references and drain existing ones. s_active
- * will be unlocked when the sysfs_dirent is released.
+ * Deny new active references and drain existing ones.
*/
void sysfs_deactivate(struct sysfs_dirent *sd)
{
- down_write_nested(&sd->s_active, SYSFS_S_ACTIVE_DEACTIVATE);
+ DECLARE_COMPLETION_ONSTACK(wait);
+ int v;
+
+ BUG_ON(!list_empty(&sd->s_sibling));
+ sd->s_sibling.next = (void *)&wait;

- /* s_active will be unlocked by the thread doing the final put
- * on @sd. Lie to lockdep.
+ /* atomic_add_return() is a mb(), put_active() will always see
+ * the updated sd->s_sibling.next.
*/
- rwsem_release(&sd->s_active.dep_map, 1, _RET_IP_);
+ v = atomic_add_return(SD_DEACTIVATED_BIAS, &sd->s_active);
+
+ if (v != SD_DEACTIVATED_BIAS)
+ wait_for_completion(&wait);
+
+ INIT_LIST_HEAD(&sd->s_sibling);
}

static int sysfs_alloc_ino(ino_t *pino)
@@ -141,19 +176,6 @@ void release_sysfs_dirent(struct sysfs_d
repeat:
parent_sd = sd->s_parent;

- /* If @sd is being released after deletion, s_active is write
- * locked. If @sd is cursor for directory walk or being
- * released prematurely, s_active has no reader or writer.
- *
- * sysfs_deactivate() lies to lockdep that s_active is
- * unlocked immediately. Lie one more time to cover the
- * previous lie.
- */
- if (!down_write_trylock(&sd->s_active))
- rwsem_acquire(&sd->s_active.dep_map,
- SYSFS_S_ACTIVE_DEACTIVATE, 0, _RET_IP_);
- up_write(&sd->s_active);
-
if (sd->s_type & SYSFS_KOBJ_LINK)
sysfs_put(sd->s_elem.symlink.target_sd);
if (sd->s_type & SYSFS_COPY_NAME)
@@ -213,8 +235,8 @@ struct sysfs_dirent *sysfs_new_dirent(co
goto err_out;

atomic_set(&sd->s_count, 1);
+ atomic_set(&sd->s_active, 0);
atomic_set(&sd->s_event, 1);
- init_rwsem(&sd->s_active);
INIT_LIST_HEAD(&sd->s_children);
INIT_LIST_HEAD(&sd->s_sibling);

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index f8779ea..ae006b0 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -21,7 +21,7 @@ struct sysfs_elem_bin_attr {
*/
struct sysfs_dirent {
atomic_t s_count;
- struct rw_semaphore s_active;
+ atomic_t s_active;
struct sysfs_dirent * s_parent;
struct list_head s_sibling;
struct list_head s_children;
@@ -42,16 +42,7 @@ struct sysfs_dirent {
atomic_t s_event;
};

-/*
- * A sysfs file which deletes another file when written to need to
- * write lock the s_active of the victim while its s_active is read
- * locked for the write operation. Tell lockdep that this is okay.
- */
-enum sysfs_s_active_class
-{
- SYSFS_S_ACTIVE_NORMAL, /* file r/w access, etc - default */
- SYSFS_S_ACTIVE_DEACTIVATE, /* file deactivation */
-};
+#define SD_DEACTIVATED_BIAS INT_MIN

extern struct vfsmount * sysfs_mount;
extern struct kmem_cache *sysfs_dir_cachep;
--
1.4.3.4


2007-05-28 16:25:25

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/3] sysfs: use singly-linked list for sysfs_dirent tree

Make sysfs_dirent use singly linked list for its tree structure.
sysfs_link_sibling() and sysfs_unlink_sibling() functions are added to
handle simpler cases. It adds some complexity and cpu cycle overhead
but reduced memory footprint is worthwhile on big machines.

This change reduces the sizeof sysfs_dirent from 104 to 88 on 64bit
and from 60 to 52 on 32bit.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/sysfs/dir.c | 146 +++++++++++++++++++++++++++++++++++++----------------
fs/sysfs/inode.c | 12 +++--
fs/sysfs/mount.c | 3 -
fs/sysfs/sysfs.h | 4 +-
4 files changed, 111 insertions(+), 54 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 40596a0..b4074ad 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -22,6 +22,48 @@ static spinlock_t sysfs_ino_lock = SPIN_
static DEFINE_IDA(sysfs_ino_ida);

/**
+ * sysfs_link_sibling - link sysfs_dirent into sibling list
+ * @sd: sysfs_dirent of interest
+ *
+ * Link @sd into its sibling list which starts from
+ * sd->s_parent->s_children.
+ *
+ * Locking:
+ * mutex_lock(sd->s_parent->dentry->d_inode->i_mutex)
+ */
+static void sysfs_link_sibling(struct sysfs_dirent *sd)
+{
+ struct sysfs_dirent *parent_sd = sd->s_parent;
+
+ BUG_ON(sd->s_sibling);
+ sd->s_sibling = parent_sd->s_children;
+ parent_sd->s_children = sd;
+}
+
+/**
+ * sysfs_unlink_sibling - unlink sysfs_dirent from sibling list
+ * @sd: sysfs_dirent of interest
+ *
+ * Unlink @sd from its sibling list which starts from
+ * sd->s_parent->s_children.
+ *
+ * Locking:
+ * mutex_lock(sd->s_parent->dentry->d_inode->i_mutex)
+ */
+static void sysfs_unlink_sibling(struct sysfs_dirent *sd)
+{
+ struct sysfs_dirent **pos;
+
+ for (pos = &sd->s_parent->s_children; *pos; pos = &(*pos)->s_sibling) {
+ if (*pos == sd) {
+ *pos = sd->s_sibling;
+ sd->s_sibling = NULL;
+ break;
+ }
+ }
+}
+
+/**
* sysfs_get_active - get an active reference to sysfs_dirent
* @sd: sysfs_dirent to get an active reference to
*
@@ -73,9 +115,9 @@ void sysfs_put_active(struct sysfs_diren
return;

/* atomic_dec_return() is a mb(), we'll always see the updated
- * sd->s_sibling.next.
+ * sd->s_sibling.
*/
- cmpl = (void *)sd->s_sibling.next;
+ cmpl = (void *)sd->s_sibling;
complete(cmpl);
}

@@ -129,18 +171,18 @@ void sysfs_deactivate(struct sysfs_diren
DECLARE_COMPLETION_ONSTACK(wait);
int v;

- BUG_ON(!list_empty(&sd->s_sibling));
- sd->s_sibling.next = (void *)&wait;
+ BUG_ON(sd->s_sibling);
+ sd->s_sibling = (void *)&wait;

/* atomic_add_return() is a mb(), put_active() will always see
- * the updated sd->s_sibling.next.
+ * the updated sd->s_sibling.
*/
v = atomic_add_return(SD_DEACTIVATED_BIAS, &sd->s_active);

if (v != SD_DEACTIVATED_BIAS)
wait_for_completion(&wait);

- INIT_LIST_HEAD(&sd->s_sibling);
+ sd->s_sibling = NULL;
}

static int sysfs_alloc_ino(ino_t *pino)
@@ -237,8 +279,6 @@ struct sysfs_dirent *sysfs_new_dirent(co
atomic_set(&sd->s_count, 1);
atomic_set(&sd->s_active, 0);
atomic_set(&sd->s_event, 1);
- INIT_LIST_HEAD(&sd->s_children);
- INIT_LIST_HEAD(&sd->s_sibling);

sd->s_name = name;
sd->s_mode = mode;
@@ -273,7 +313,7 @@ void sysfs_attach_dirent(struct sysfs_di

if (parent_sd) {
sd->s_parent = sysfs_get(parent_sd);
- list_add(&sd->s_sibling, &parent_sd->s_children);
+ sysfs_link_sibling(sd);
}
}

@@ -289,7 +329,7 @@ int sysfs_dirent_exist(struct sysfs_dire
{
struct sysfs_dirent * sd;

- list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
+ for (sd = parent_sd->s_children; sd; sd = sd->s_sibling) {
if (sd->s_type) {
if (strcmp(sd->s_name, new))
continue;
@@ -409,7 +449,7 @@ static struct dentry * sysfs_lookup(stru
struct inode *inode;
int found = 0;

- list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
+ for (sd = parent_sd->s_children; sd; sd = sd->s_sibling) {
if ((sd->s_type & SYSFS_NOT_PINNED) &&
!strcmp(sd->s_name, dentry->d_name.name)) {
found = 1;
@@ -458,7 +498,7 @@ static void remove_dir(struct dentry * d

mutex_lock(&parent->d_inode->i_mutex);

- list_del_init(&sd->s_sibling);
+ sysfs_unlink_sibling(sd);

pr_debug(" o %s removing done (%d)\n",d->d_name.name,
atomic_read(&d->d_count));
@@ -478,9 +518,9 @@ void sysfs_remove_subdir(struct dentry *

static void __sysfs_remove_dir(struct dentry *dentry)
{
- LIST_HEAD(removed);
- struct sysfs_dirent * parent_sd;
- struct sysfs_dirent * sd, * tmp;
+ struct sysfs_dirent *removed = NULL;
+ struct sysfs_dirent *parent_sd;
+ struct sysfs_dirent **pos;

if (!dentry)
return;
@@ -488,15 +528,25 @@ static void __sysfs_remove_dir(struct de
pr_debug("sysfs %s: removing dir\n",dentry->d_name.name);
mutex_lock(&dentry->d_inode->i_mutex);
parent_sd = dentry->d_fsdata;
- list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
- if (!sd->s_type || !(sd->s_type & SYSFS_NOT_PINNED))
- continue;
- list_move(&sd->s_sibling, &removed);
+ pos = &parent_sd->s_children;
+ while (*pos) {
+ struct sysfs_dirent *sd = *pos;
+
+ if (sd->s_type && (sd->s_type & SYSFS_NOT_PINNED)) {
+ *pos = sd->s_sibling;
+ sd->s_sibling = removed;
+ removed = sd;
+ } else
+ pos = &(*pos)->s_sibling;
}
mutex_unlock(&dentry->d_inode->i_mutex);

- list_for_each_entry_safe(sd, tmp, &removed, s_sibling) {
- list_del_init(&sd->s_sibling);
+ while (removed) {
+ struct sysfs_dirent *sd = removed;
+
+ removed = sd->s_sibling;
+ sd->s_sibling = NULL;
+
sysfs_drop_dentry(sd);
sysfs_deactivate(sd);
sysfs_put(sd);
@@ -577,11 +627,11 @@ int sysfs_rename_dir(struct kobject * ko
d_add(new_dentry, NULL);
d_move(kobj->dentry, new_dentry);

- list_del_init(&sd->s_sibling);
+ sysfs_unlink_sibling(sd);
sysfs_get(parent_sd);
sysfs_put(sd->s_parent);
sd->s_parent = parent_sd;
- list_add(&sd->s_sibling, &parent_sd->s_children);
+ sysfs_link_sibling(sd);

error = 0;
goto out_unlock;
@@ -633,11 +683,11 @@ again:
dput(new_dentry);

/* Remove from old parent's list and insert into new parent's list. */
- list_del_init(&sd->s_sibling);
+ sysfs_unlink_sibling(sd);
sysfs_get(new_parent_sd);
sysfs_put(sd->s_parent);
sd->s_parent = new_parent_sd;
- list_add(&sd->s_sibling, &new_parent_sd->s_children);
+ sysfs_link_sibling(sd);

out:
mutex_unlock(&new_parent_dentry->d_inode->i_mutex);
@@ -668,7 +718,7 @@ static int sysfs_dir_close(struct inode
struct sysfs_dirent * cursor = file->private_data;

mutex_lock(&dentry->d_inode->i_mutex);
- list_del_init(&cursor->s_sibling);
+ sysfs_unlink_sibling(cursor);
mutex_unlock(&dentry->d_inode->i_mutex);

release_sysfs_dirent(cursor);
@@ -687,7 +737,7 @@ static int sysfs_readdir(struct file * f
struct dentry *dentry = filp->f_path.dentry;
struct sysfs_dirent * parent_sd = dentry->d_fsdata;
struct sysfs_dirent *cursor = filp->private_data;
- struct list_head *p, *q = &cursor->s_sibling;
+ struct sysfs_dirent **pos;
ino_t ino;
int i = filp->f_pos;

@@ -710,16 +760,21 @@ static int sysfs_readdir(struct file * f
i++;
/* fallthrough */
default:
+ pos = &parent_sd->s_children;
+ while (*pos != cursor)
+ pos = &(*pos)->s_sibling;
+
+ /* unlink cursor */
+ *pos = cursor->s_sibling;
+
if (filp->f_pos == 2)
- list_move(q, &parent_sd->s_children);
+ pos = &parent_sd->s_children;

- for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
- struct sysfs_dirent *next;
+ for ( ; *pos; pos = &(*pos)->s_sibling) {
+ struct sysfs_dirent *next = *pos;
const char * name;
int len;

- next = list_entry(p, struct sysfs_dirent,
- s_sibling);
if (!next->s_type)
continue;

@@ -729,12 +784,14 @@ static int sysfs_readdir(struct file * f

if (filldir(dirent, name, len, filp->f_pos, ino,
dt_type(next)) < 0)
- return 0;
+ break;

- list_move(q, p);
- p = q;
filp->f_pos++;
}
+
+ /* put cursor back in */
+ cursor->s_sibling = *pos;
+ *pos = cursor;
}
return 0;
}
@@ -759,20 +816,21 @@ static loff_t sysfs_dir_lseek(struct fil
if (file->f_pos >= 2) {
struct sysfs_dirent *sd = dentry->d_fsdata;
struct sysfs_dirent *cursor = file->private_data;
- struct list_head *p;
+ struct sysfs_dirent **pos;
loff_t n = file->f_pos - 2;

- list_del(&cursor->s_sibling);
- p = sd->s_children.next;
- while (n && p != &sd->s_children) {
- struct sysfs_dirent *next;
- next = list_entry(p, struct sysfs_dirent,
- s_sibling);
+ sysfs_unlink_sibling(cursor);
+
+ pos = &sd->s_children;
+ while (n && *pos) {
+ struct sysfs_dirent *next = *pos;
if (next->s_type)
n--;
- p = p->next;
+ pos = &(*pos)->s_sibling;
}
- list_add_tail(&cursor->s_sibling, p);
+
+ cursor->s_sibling = *pos;
+ *pos = cursor;
}
}
mutex_unlock(&dentry->d_inode->i_mutex);
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 167c7fb..4d79a61 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -283,8 +283,8 @@ void sysfs_drop_dentry(struct sysfs_dire

int sysfs_hash_and_remove(struct dentry * dir, const char * name)
{
- struct sysfs_dirent * sd;
- struct sysfs_dirent * parent_sd;
+ struct sysfs_dirent **pos, *sd;
+ struct sysfs_dirent *parent_sd = dir->d_fsdata;
int found = 0;

if (!dir)
@@ -294,13 +294,15 @@ int sysfs_hash_and_remove(struct dentry
/* no inode means this hasn't been made visible yet */
return -ENOENT;

- parent_sd = dir->d_fsdata;
mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
- list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
+ for (pos = &parent_sd->s_children; *pos; pos = &(*pos)->s_sibling) {
+ sd = *pos;
+
if (!sd->s_type)
continue;
if (!strcmp(sd->s_name, name)) {
- list_del_init(&sd->s_sibling);
+ *pos = sd->s_sibling;
+ sd->s_sibling = NULL;
found = 1;
break;
}
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 2d6294b..2e66701 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -24,12 +24,9 @@ static const struct super_operations sys

static struct sysfs_dirent sysfs_root = {
.s_count = ATOMIC_INIT(1),
- .s_sibling = LIST_HEAD_INIT(sysfs_root.s_sibling),
- .s_children = LIST_HEAD_INIT(sysfs_root.s_children),
.s_type = SYSFS_ROOT,
.s_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
.s_ino = 1,
- .s_iattr = NULL,
};

static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index ae006b0..6f8aaf3 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -23,8 +23,8 @@ struct sysfs_dirent {
atomic_t s_count;
atomic_t s_active;
struct sysfs_dirent * s_parent;
- struct list_head s_sibling;
- struct list_head s_children;
+ struct sysfs_dirent * s_sibling;
+ struct sysfs_dirent * s_children;
const char * s_name;

union {
--
1.4.3.4


2007-05-29 16:04:39

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCHSET 2.6.22-rc2-mm1] sysfs: reduce memory footprint of sysfs_dirent

Hi Tejun,

On 5/28/07, Tejun Heo <[email protected]> wrote:
> Hello, all.
>
> This patchset reduces the size of sysfs_dirent to 88 byte from 136 on
> 64bit and to 52 from 76 on 32bit. Combined with forthcoming
> reclaimable sysfs directories, this will make sysfs much more scalable
> on very large machines and very small machines.
>
> This patchset contains the following three patches.
>
> #01: move-s_active-functions-to-fs-sysfs-dir-c, prep for #02
> #02: slim-down-sysfs_dirent-s_active
> #03: use-signly-linked-list-for-sysfs_dirent-tree
>
> I'm pretty sure #02 is a good idea but #03 is debatable. The code
> doesn't look much uglier after the conversion tho. Inputs welcome.
>

I think that #3 is a good idea as well - we have a lot of sysfs
objects on any given system and slimming down sysfs benefits everyone.

--
Dmitry

2007-05-29 16:13:28

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH 2/3] sysfs: slim down sysfs_dirent->s_active

On Tue, 29 May 2007 01:24:06 +0900,
Tejun Heo <[email protected]> wrote:

> Make sysfs_dirent->s_active an atomic_t instead of rwsem. This
> reduces the size of sysfs_dirent from 136 to 104 on 64bit and from 76
> to 60 on 32bit with lock debugging turned off. With lock debugging
> turned on the reduction is much larger.

Cool.

> s_active starts at zero and each active reference increments s_active.
> Putting a reference decrements s_active. Deactivation subtracts
> SD_DEACTIVATED_BIAS which is currently INT_MIN and assumed to be small
> enough to make s_active negative. If s_active is negative,
> sysfs_get() no longer grants new references. Deactivation succeeds
> immediately if there is no active user; otherwise, it waits using a
> completion for the last put.
>
> Due to the removal of lockdep tricks, this change makes things less
> trickier in release_sysfs_dirent(). As all the complexity is
> contained in three s_active functions, I think it's more readable this
> way.

Agreed.


> @@ -32,11 +33,24 @@ static DEFINE_IDA(sysfs_ino_ida);
> */
> struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
> {
> - if (sd) {
> - if (unlikely(!down_read_trylock(&sd->s_active)))
> - sd = NULL;
> + if (unlikely(!sd))
> + return NULL;
> +
> + while (1) {
> + int v, t;
> +
> + v = atomic_read(&sd->s_active);
> + if (unlikely(v < 0))
> + return NULL;
> +
> + t = atomic_cmpxchg(&sd->s_active, v, v + 1);
> + if (likely(t == v))
> + return sd;
> + if (t < 0)
> + return NULL;
> +
> + cpu_relax();
> }
> - return sd;
> }

I don't quite like v and t, but don't have a better naming suggestion
either.

2007-05-29 16:18:15

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCHSET 2.6.22-rc2-mm1] sysfs: reduce memory footprint of sysfs_dirent

On Tue, 29 May 2007 12:04:30 -0400,
"Dmitry Torokhov" <[email protected]> wrote:

> I think that #3 is a good idea as well - we have a lot of sysfs
> objects on any given system and slimming down sysfs benefits everyone.

Yes, and the added complexity is not that bad.

The patchset seems to work fine so far on the (small z/VM guest) system
I tested it a bit on; I'll try a huge (LPAR) system next.

2007-05-29 17:23:42

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCHSET 2.6.22-rc2-mm1] sysfs: reduce memory footprint of sysfs_dirent

On Tue, 29 May 2007 18:17:59 +0200,
Cornelia Huck <[email protected]> wrote:

> The patchset seems to work fine so far on the (small z/VM guest) system
> I tested it a bit on; I'll try a huge (LPAR) system next.

Seems fine as well. I got 12MB more of free memory, with ~3000
subchannels (a ccw device behind each) and a few system devices.