2006-10-10 18:27:22

by Chandra Seetharaman

[permalink] [raw]
Subject: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

Currently, maximum amount of data that can be read from a configfs
attribute file is limited to PAGESIZE bytes. This is a limitation for
some of the usages of configfs.

This patchset removes that limitaion by using seq_file for read operations
instead of using locally allocated buffer.

Tested the interface change with configfs_example.c in the Documentation
directory.

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-10 18:21:57

by Chandra Seetharaman

[permalink] [raw]
Subject: [PATCH 4/5] Change Documentation to reflect the new interface

Change the interface definition of show_attribute() to use seq_file.

Signed-Off-By: Chandra Seetharaman <[email protected]>
--

Documentation/filesystems/configfs/configfs.txt | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.18/Documentation/filesystems/configfs/configfs.txt
===================================================================
--- linux-2.6.18.orig/Documentation/filesystems/configfs/configfs.txt
+++ linux-2.6.18/Documentation/filesystems/configfs/configfs.txt
@@ -162,7 +162,7 @@ among other things. For that, it needs
void (*release)(struct config_item *);
ssize_t (*show_attribute)(struct config_item *,
struct configfs_attribute *,
- char *);
+ struct seq_file *);
ssize_t (*store_attribute)(struct config_item *,
struct configfs_attribute *,
const char *, size_t);

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------

2006-10-10 18:21:05

by Chandra Seetharaman

[permalink] [raw]
Subject: [PATCH 2/5] Use seq_file for read side of operations

configfs currently has a limitation that the kernel can send only
PAGESIZE of data through a attribute.

This patch removes that limitation by using seq_file in the read side of
operations.

Signed-Off-By: Chandra Seetharaman <[email protected]>
--

fs/configfs/file.c | 130 ++++++++++-------------------------------------
include/linux/configfs.h | 3 -
2 files changed, 31 insertions(+), 102 deletions(-)

Index: linux-2.6.18/fs/configfs/file.c
===================================================================
--- linux-2.6.18.orig/fs/configfs/file.c
+++ linux-2.6.18/fs/configfs/file.c
@@ -40,112 +40,33 @@ struct configfs_buffer {
char * page;
struct configfs_item_operations * ops;
struct semaphore sem;
- int needs_read_fill;
+ struct dentry * dentry;
};


/**
- * fill_read_buffer - allocate and fill buffer from item.
- * @dentry: dentry pointer.
- * @buffer: data buffer for file.
- *
- * Allocate @buffer->page, if it hasn't been already, then call the
- * config_item's show() method to fill the buffer with this attribute's
- * data.
- * This is called only once, on the file's first read.
- */
-static int fill_read_buffer(struct dentry * dentry, struct configfs_buffer * buffer)
-{
- struct configfs_attribute * attr = to_attr(dentry);
- struct config_item * item = to_item(dentry->d_parent);
- struct configfs_item_operations * ops = buffer->ops;
- int ret = 0;
- ssize_t count;
-
- if (!buffer->page)
- buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
- if (!buffer->page)
- return -ENOMEM;
-
- count = ops->show_attribute(item,attr,buffer->page);
- buffer->needs_read_fill = 0;
- BUG_ON(count > (ssize_t)PAGE_SIZE);
- if (count >= 0)
- buffer->count = count;
- else
- ret = count;
- return ret;
-}
-
-
-/**
- * flush_read_buffer - push buffer to userspace.
- * @buffer: data buffer for file.
- * @userbuf: user-passed buffer.
- * @count: number of bytes requested.
- * @ppos: file position.
- *
- * Copy the buffer we filled in fill_read_buffer() to userspace.
- * This is done at the reader's leisure, copying and advancing
- * the amount they specify each time.
- * This may be called continuously until the buffer is empty.
- */
-static int flush_read_buffer(struct configfs_buffer * buffer, char __user * buf,
- size_t count, loff_t * ppos)
-{
- int error;
-
- if (*ppos > buffer->count)
- return 0;
-
- if (count > (buffer->count - *ppos))
- count = buffer->count - *ppos;
-
- error = copy_to_user(buf,buffer->page + *ppos,count);
- if (!error)
- *ppos += count;
- return error ? -EFAULT : count;
-}
-
-/**
* configfs_read_file - read an attribute.
- * @file: file pointer.
- * @buf: buffer to fill.
- * @count: number of bytes to read.
- * @ppos: starting offset in file.
+ * @s: seq_file pointer.
+ * @v: unused void pointer
*
* Userspace wants to read an attribute file. The attribute descriptor
- * is in the file's ->d_fsdata. The target item is in the directory's
- * ->d_fsdata.
+ * is available through dentry, which is stored in the seq_file.
+ * The target item is in the dentry's parent.
+ *
+ * We just call the show() method directly.
*
- * We call fill_read_buffer() to allocate and fill the buffer from the
- * item's show() method exactly once (if the read is happening from
- * the beginning of the file). That should fill the entire buffer with
- * all the data the item has to offer for that attribute.
- * We then call flush_read_buffer() to copy the buffer to userspace
- * in the increments specified.
*/
-
-static ssize_t
-configfs_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+static int
+configfs_read_file(struct seq_file *s, void *v)
{
- struct configfs_buffer * buffer = file->private_data;
- ssize_t retval = 0;
+ struct configfs_buffer * buffer = s->private;
+ struct configfs_item_operations * ops = buffer->ops;
+ struct configfs_attribute * attr = to_attr(buffer->dentry);
+ struct config_item * item = to_item(buffer->dentry->d_parent);

- down(&buffer->sem);
- if (buffer->needs_read_fill) {
- if ((retval = fill_read_buffer(file->f_dentry,buffer)))
- goto out;
- }
- pr_debug("%s: count = %d, ppos = %lld, buf = %s\n",
- __FUNCTION__,count,*ppos,buffer->page);
- retval = flush_read_buffer(buffer,buf,count,ppos);
-out:
- up(&buffer->sem);
- return retval;
+ return ops->show_attribute(item, attr, s);
}

-
/**
* fill_write_buffer - copy buffer from userspace.
* @buffer: data buffer for file.
@@ -169,7 +90,6 @@ fill_write_buffer(struct configfs_buffer
if (count > PAGE_SIZE)
count = PAGE_SIZE;
error = copy_from_user(buffer->page,buf,count);
- buffer->needs_read_fill = 1;
return error ? -EFAULT : count;
}

@@ -216,7 +136,8 @@ flush_write_buffer(struct dentry * dentr
static ssize_t
configfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
- struct configfs_buffer * buffer = file->private_data;
+ struct configfs_buffer * buffer =
+ ((struct seq_file *)file->private_data)->private;
ssize_t len;

down(&buffer->sem);
@@ -281,9 +202,14 @@ static int check_perm(struct inode * ino
}
memset(buffer,0,sizeof(struct configfs_buffer));
init_MUTEX(&buffer->sem);
- buffer->needs_read_fill = 1;
buffer->ops = ops;
- file->private_data = buffer;
+ buffer->dentry = file->f_dentry;
+
+ error = single_open(file, configfs_read_file, buffer);
+ if (error) {
+ kfree(buffer);
+ goto Enomem;
+ }
goto Done;

Einval:
@@ -309,7 +235,8 @@ static int configfs_release(struct inode
struct config_item * item = to_item(filp->f_dentry->d_parent);
struct configfs_attribute * attr = to_attr(filp->f_dentry);
struct module * owner = attr->ca_owner;
- struct configfs_buffer * buffer = filp->private_data;
+ struct configfs_buffer * buffer =
+ ((struct seq_file *)filp->private_data)->private;

if (item)
config_item_put(item);
@@ -321,13 +248,14 @@ static int configfs_release(struct inode
free_page((unsigned long)buffer->page);
kfree(buffer);
}
- return 0;
+
+ return single_release(inode, filp);
}

const struct file_operations configfs_file_operations = {
- .read = configfs_read_file,
+ .read = seq_read,
.write = configfs_write_file,
- .llseek = generic_file_llseek,
+ .llseek = seq_lseek,
.open = configfs_open_file,
.release = configfs_release,
};
Index: linux-2.6.18/include/linux/configfs.h
===================================================================
--- linux-2.6.18.orig/include/linux/configfs.h
+++ linux-2.6.18/include/linux/configfs.h
@@ -40,6 +40,7 @@
#include <linux/types.h>
#include <linux/list.h>
#include <linux/kref.h>
+#include <linux/seq_file.h>

#include <asm/atomic.h>
#include <asm/semaphore.h>
@@ -147,7 +148,7 @@ struct configfs_attribute {
*/
struct configfs_item_operations {
void (*release)(struct config_item *);
- ssize_t (*show_attribute)(struct config_item *, struct configfs_attribute *,char *);
+ int (*show_attribute)(struct config_item *, struct configfs_attribute *,struct seq_file *);
ssize_t (*store_attribute)(struct config_item *,struct configfs_attribute *,const char *, size_t);
int (*allow_link)(struct config_item *src, struct config_item *target);
int (*drop_link)(struct config_item *src, struct config_item *target);

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------

2006-10-10 18:21:33

by Chandra Seetharaman

[permalink] [raw]
Subject: [PATCH 3/5] Change configfs_example.c to use the new interface

Change the example to suit the modified interface to use seq_file.

Signed-Off-By: Chandra Seetharaman <[email protected]>
--

Documentation/filesystems/configfs/configfs_example.c | 30 +++++++++---------
1 files changed, 15 insertions(+), 15 deletions(-)

Index: linux-2.6.18/Documentation/filesystems/configfs/configfs_example.c
===================================================================
--- linux-2.6.18.orig/Documentation/filesystems/configfs/configfs_example.c
+++ linux-2.6.18/Documentation/filesystems/configfs/configfs_example.c
@@ -53,7 +53,7 @@ struct childless {

struct childless_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct childless *, char *);
+ ssize_t (*show)(struct childless *, struct seq_file *);
ssize_t (*store)(struct childless *, const char *, size_t);
};

@@ -63,20 +63,20 @@ static inline struct childless *to_child
}

static ssize_t childless_showme_read(struct childless *childless,
- char *page)
+ struct seq_file *s)
{
ssize_t pos;

- pos = sprintf(page, "%d\n", childless->showme);
+ pos = seq_printf(s, "%d\n", childless->showme);
childless->showme++;

return pos;
}

static ssize_t childless_storeme_read(struct childless *childless,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%d\n", childless->storeme);
+ return seq_printf(s, "%d\n", childless->storeme);
}

static ssize_t childless_storeme_write(struct childless *childless,
@@ -99,9 +99,9 @@ static ssize_t childless_storeme_write(s
}

static ssize_t childless_description_read(struct childless *childless,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page,
+ return seq_printf(s,
"[01-childless]\n"
"\n"
"The childless subsystem is the simplest possible subsystem in\n"
@@ -133,7 +133,7 @@ static struct configfs_attribute *childl

static ssize_t childless_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct childless *childless = to_childless(item);
struct childless_attribute *childless_attr =
@@ -141,7 +141,7 @@ static ssize_t childless_attr_show(struc
ssize_t ret = 0;

if (childless_attr->show)
- ret = childless_attr->show(childless, page);
+ ret = childless_attr->show(childless, s);
return ret;
}

@@ -216,12 +216,12 @@ static struct configfs_attribute *simple

static ssize_t simple_child_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
ssize_t count;
struct simple_child *simple_child = to_simple_child(item);

- count = sprintf(page, "%d\n", simple_child->storeme);
+ count = seq_printf(s, "%d\n", simple_child->storeme);

return count;
}
@@ -304,9 +304,9 @@ static struct configfs_attribute *simple

static ssize_t simple_children_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page,
+ return seq_printf(s,
"[02-simple-children]\n"
"\n"
"This subsystem allows the creation of child config_items. These\n"
@@ -390,9 +390,9 @@ static struct configfs_attribute *group_

static ssize_t group_children_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page,
+ return seq_printf(s,
"[03-group-children]\n"
"\n"
"This subsystem allows the creation of child config_groups. These\n"

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------

2006-10-10 18:21:33

by Chandra Seetharaman

[permalink] [raw]
Subject: [PATCH 5/5] Change the existing code to use the new interface

Changes to the current user of configfs, OCFS2, to use the new
interface of show_attribute().

Signed-Off-By: Chandra Seetharaman <[email protected]>
--

fs/ocfs2/cluster/heartbeat.c | 32 ++++++++++++++++----------------
fs/ocfs2/cluster/nodemanager.c | 25 ++++++++++++++-----------
2 files changed, 30 insertions(+), 27 deletions(-)

Index: linux-2.6.18/fs/ocfs2/cluster/heartbeat.c
===================================================================
--- linux-2.6.18.orig/fs/ocfs2/cluster/heartbeat.c
+++ linux-2.6.18/fs/ocfs2/cluster/heartbeat.c
@@ -1111,9 +1111,9 @@ static int o2hb_read_block_input(struct
}

static ssize_t o2hb_region_block_bytes_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%u\n", reg->hr_block_bytes);
+ return seq_printf(s, "%u\n", reg->hr_block_bytes);
}

static ssize_t o2hb_region_block_bytes_write(struct o2hb_region *reg,
@@ -1139,9 +1139,9 @@ static ssize_t o2hb_region_block_bytes_w
}

static ssize_t o2hb_region_start_block_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%llu\n", reg->hr_start_block);
+ return seq_printf(s, "%llu\n", reg->hr_start_block);
}

static ssize_t o2hb_region_start_block_write(struct o2hb_region *reg,
@@ -1164,9 +1164,9 @@ static ssize_t o2hb_region_start_block_w
}

static ssize_t o2hb_region_blocks_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%d\n", reg->hr_blocks);
+ return seq_printf(s, "%d\n", reg->hr_blocks);
}

static ssize_t o2hb_region_blocks_write(struct o2hb_region *reg,
@@ -1192,12 +1192,12 @@ static ssize_t o2hb_region_blocks_write(
}

static ssize_t o2hb_region_dev_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
unsigned int ret = 0;

if (reg->hr_bdev)
- ret = sprintf(page, "%s\n", reg->hr_dev_name);
+ ret = seq_printf(s, "%s\n", reg->hr_dev_name);

return ret;
}
@@ -1443,7 +1443,7 @@ out:

struct o2hb_region_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct o2hb_region *, char *);
+ ssize_t (*show)(struct o2hb_region *, struct seq_file *);
ssize_t (*store)(struct o2hb_region *, const char *, size_t);
};

@@ -1489,7 +1489,7 @@ static struct configfs_attribute *o2hb_r

static ssize_t o2hb_region_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct o2hb_region *reg = to_o2hb_region(item);
struct o2hb_region_attribute *o2hb_region_attr =
@@ -1497,7 +1497,7 @@ static ssize_t o2hb_region_show(struct c
ssize_t ret = 0;

if (o2hb_region_attr->show)
- ret = o2hb_region_attr->show(reg, page);
+ ret = o2hb_region_attr->show(reg, s);
return ret;
}

@@ -1581,13 +1581,13 @@ static void o2hb_heartbeat_group_drop_it

struct o2hb_heartbeat_group_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct o2hb_heartbeat_group *, char *);
+ ssize_t (*show)(struct o2hb_heartbeat_group *, struct seq_file *);
ssize_t (*store)(struct o2hb_heartbeat_group *, const char *, size_t);
};

static ssize_t o2hb_heartbeat_group_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct o2hb_heartbeat_group *reg = to_o2hb_heartbeat_group(to_config_group(item));
struct o2hb_heartbeat_group_attribute *o2hb_heartbeat_group_attr =
@@ -1595,7 +1595,7 @@ static ssize_t o2hb_heartbeat_group_show
ssize_t ret = 0;

if (o2hb_heartbeat_group_attr->show)
- ret = o2hb_heartbeat_group_attr->show(reg, page);
+ ret = o2hb_heartbeat_group_attr->show(reg, s);
return ret;
}

@@ -1614,9 +1614,9 @@ static ssize_t o2hb_heartbeat_group_stor
}

static ssize_t o2hb_heartbeat_group_threshold_show(struct o2hb_heartbeat_group *group,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%u\n", o2hb_dead_threshold);
+ return seq_printf(s, "%u\n", o2hb_dead_threshold);
}

static ssize_t o2hb_heartbeat_group_threshold_store(struct o2hb_heartbeat_group *group,
Index: linux-2.6.18/fs/ocfs2/cluster/nodemanager.c
===================================================================
--- linux-2.6.18.orig/fs/ocfs2/cluster/nodemanager.c
+++ linux-2.6.18/fs/ocfs2/cluster/nodemanager.c
@@ -238,9 +238,9 @@ static void o2nm_node_release(struct con
kfree(node);
}

-static ssize_t o2nm_node_num_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_num_read(struct o2nm_node *node, struct seq_file *s)
{
- return sprintf(page, "%d\n", node->nd_num);
+ return seq_printf(s, "%d\n", node->nd_num);
}

static struct o2nm_cluster *to_o2nm_cluster_from_node(struct o2nm_node *node)
@@ -293,9 +293,10 @@ static ssize_t o2nm_node_num_write(struc

return count;
}
-static ssize_t o2nm_node_ipv4_port_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_ipv4_port_read(struct o2nm_node *node,
+ struct seq_file *s)
{
- return sprintf(page, "%u\n", ntohs(node->nd_ipv4_port));
+ return seq_printf(s, "%u\n", ntohs(node->nd_ipv4_port));
}

static ssize_t o2nm_node_ipv4_port_write(struct o2nm_node *node,
@@ -318,9 +319,10 @@ static ssize_t o2nm_node_ipv4_port_write
return count;
}

-static ssize_t o2nm_node_ipv4_address_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_ipv4_address_read(struct o2nm_node *node,
+ struct seq_file *s)
{
- return sprintf(page, "%u.%u.%u.%u\n", NIPQUAD(node->nd_ipv4_address));
+ return seq_printf(s, "%u.%u.%u.%u\n", NIPQUAD(node->nd_ipv4_address));
}

static ssize_t o2nm_node_ipv4_address_write(struct o2nm_node *node,
@@ -361,9 +363,10 @@ static ssize_t o2nm_node_ipv4_address_wr
return count;
}

-static ssize_t o2nm_node_local_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_local_read(struct o2nm_node *node,
+ struct seq_file *s)
{
- return sprintf(page, "%d\n", node->nd_local);
+ return seq_printf(s, "%d\n", node->nd_local);
}

static ssize_t o2nm_node_local_write(struct o2nm_node *node, const char *page,
@@ -417,7 +420,7 @@ static ssize_t o2nm_node_local_write(str

struct o2nm_node_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct o2nm_node *, char *);
+ ssize_t (*show)(struct o2nm_node *, struct seq_file *);
ssize_t (*store)(struct o2nm_node *, const char *, size_t);
};

@@ -474,7 +477,7 @@ static int o2nm_attr_index(struct config

static ssize_t o2nm_node_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct o2nm_node *node = to_o2nm_node(item);
struct o2nm_node_attribute *o2nm_node_attr =
@@ -482,7 +485,7 @@ static ssize_t o2nm_node_show(struct con
ssize_t ret = 0;

if (o2nm_node_attr->show)
- ret = o2nm_node_attr->show(node, page);
+ ret = o2nm_node_attr->show(node, s);
return ret;
}


--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------

2006-10-10 18:21:06

by Chandra Seetharaman

[permalink] [raw]
Subject: [PATCH 1/5] Fix a module count leak.

check_perm() does not drop the reference to the module when kmalloc()
failure occurs. This patch fixes the problem.

Signed-Off-By: Chandra Seetharaman <[email protected]>
--

fs/configfs/file.c | 16 +++++++++-------
1 files changed, 9 insertions(+), 7 deletions(-)

Index: linux-2.6.18/fs/configfs/file.c
===================================================================
--- linux-2.6.18.orig/fs/configfs/file.c
+++ linux-2.6.18/fs/configfs/file.c
@@ -275,14 +275,15 @@ static int check_perm(struct inode * ino
* it in file->private_data for easy access.
*/
buffer = kmalloc(sizeof(struct configfs_buffer),GFP_KERNEL);
- if (buffer) {
- memset(buffer,0,sizeof(struct configfs_buffer));
- init_MUTEX(&buffer->sem);
- buffer->needs_read_fill = 1;
- buffer->ops = ops;
- file->private_data = buffer;
- } else
+ if (!buffer) {
error = -ENOMEM;
+ goto Enomem;
+ }
+ memset(buffer,0,sizeof(struct configfs_buffer));
+ init_MUTEX(&buffer->sem);
+ buffer->needs_read_fill = 1;
+ buffer->ops = ops;
+ file->private_data = buffer;
goto Done;

Einval:
@@ -290,6 +291,7 @@ static int check_perm(struct inode * ino
goto Done;
Eaccess:
error = -EACCES;
+ Enomem:
module_put(attr->ca_owner);
Done:
if (error && item)

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------

2006-10-10 20:35:32

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, Oct 10, 2006 at 11:20:43AM -0700, Chandra Seetharaman wrote:
> Currently, maximum amount of data that can be read from a configfs
> attribute file is limited to PAGESIZE bytes. This is a limitation for
> some of the usages of configfs.

NAK. This forces a complex and inappropriate interface on the
majority of users, and doesn't honor configfs' simplicity-first design.
I understand Chandra's concerns, but this patch isn't the right
way to do it.

Joel

--

Dort wo man B?cher verbrennt, verbrennt man am Ende auch Mensch.
(Wherever they burn books, they will also end up burning people.)
- Heinrich Heine

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-10 21:31:57

by Paul Menage

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On 10/10/06, Joel Becker <[email protected]> wrote:
> On Tue, Oct 10, 2006 at 11:20:43AM -0700, Chandra Seetharaman wrote:
> > Currently, maximum amount of data that can be read from a configfs
> > attribute file is limited to PAGESIZE bytes. This is a limitation for
> > some of the usages of configfs.
>
> NAK. This forces a complex and inappropriate interface on the
> majority of users, and doesn't honor configfs' simplicity-first design.

How is the seq_file interface complex and inappropriate? For the
configfs clients it's basically a drop-in replacement for sprintf(),
as Chandra's patches show.

Paul

2006-10-10 21:58:52

by Joel Becker

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, Oct 10, 2006 at 02:31:43PM -0700, Paul Menage wrote:
> > NAK. This forces a complex and inappropriate interface on the
> >majority of users, and doesn't honor configfs' simplicity-first design.
>
> How is the seq_file interface complex and inappropriate? For the
> configfs clients it's basically a drop-in replacement for sprintf(),
> as Chandra's patches show.

Well, they now have to learn seq_file. They now get to assume
that "spewing large amounts of junk" is the default rather than "single
attribute", which is correct. None of it is relevant for the majority
of correct users.
It exposes the "I'm a file" knowledge down to the client module.
The entire point of configfs is that the "filesystem bits" are
independant of the "client bits". To the client, it's an item
hierarchy. To the user, the interface happens to be a filesystem.
Technically, the seq_printf() as a drop-in replacement seems to
be functional. I'm worried about lifetiming, but I think it's OK (what
do I mean? If I open the file, I'd better not be able to remove the
client module until everything is torn down. If I close the file, it
had better get all torn down before module_put() so that when
->release() returns, te module can safely be removed. I *think* this
change satisfies these worries, but it's something that absolutely has
to be done right. Yes, I'm very paranoid about this).
My bigger worry is that we haven't solved the write side. How
does one *set* a large attribute? It had better not be multiple
attributes. I know that your module doesn't set it, but hey, we don't
codify that requirement. Perhaps a patch where we say "if you are a
large display attribute, we'll use seq_file and error on write because
it isn't allowed" but that leaves the old buffer-based approach for
normal-sized read-write attributes.

Joel

--

Life's Little Instruction Book #252

"Take good care of those you love."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-10 22:17:57

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 1/5] Fix a module count leak.

Obviously a bug. I've pulled it into my tree and it should appear in
OCFS2's ALL branch very shortly.

Joel

On Tue, Oct 10, 2006 at 11:20:49AM -0700, Chandra Seetharaman wrote:
> check_perm() does not drop the reference to the module when kmalloc()
> failure occurs. This patch fixes the problem.
>
> Signed-Off-By: Chandra Seetharaman <[email protected]>
> --
>
> fs/configfs/file.c | 16 +++++++++-------
> 1 files changed, 9 insertions(+), 7 deletions(-)
>
> Index: linux-2.6.18/fs/configfs/file.c
> ===================================================================
> --- linux-2.6.18.orig/fs/configfs/file.c
> +++ linux-2.6.18/fs/configfs/file.c
> @@ -275,14 +275,15 @@ static int check_perm(struct inode * ino
> * it in file->private_data for easy access.
> */
> buffer = kmalloc(sizeof(struct configfs_buffer),GFP_KERNEL);
> - if (buffer) {
> - memset(buffer,0,sizeof(struct configfs_buffer));
> - init_MUTEX(&buffer->sem);
> - buffer->needs_read_fill = 1;
> - buffer->ops = ops;
> - file->private_data = buffer;
> - } else
> + if (!buffer) {
> error = -ENOMEM;
> + goto Enomem;
> + }
> + memset(buffer,0,sizeof(struct configfs_buffer));
> + init_MUTEX(&buffer->sem);
> + buffer->needs_read_fill = 1;
> + buffer->ops = ops;
> + file->private_data = buffer;
> goto Done;
>
> Einval:
> @@ -290,6 +291,7 @@ static int check_perm(struct inode * ino
> goto Done;
> Eaccess:
> error = -EACCES;
> + Enomem:
> module_put(attr->ca_owner);
> Done:
> if (error && item)
>
> --
>
> ----------------------------------------------------------------------
> Chandra Seetharaman | Be careful what you choose....
> - [email protected] | .......you may get it.
> ----------------------------------------------------------------------

--

"I almost ran over an angel
He had a nice big fat cigar.
'In a sense,' he said, 'You're alone here
So if you jump, you'd best jump far.'"

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-10 23:13:56

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, 2006-10-10 at 14:58 -0700, Joel Becker wrote:
> On Tue, Oct 10, 2006 at 02:31:43PM -0700, Paul Menage wrote:
> > > NAK. This forces a complex and inappropriate interface on the
> > >majority of users, and doesn't honor configfs' simplicity-first design.
> >
> > How is the seq_file interface complex and inappropriate? For the
> > configfs clients it's basically a drop-in replacement for sprintf(),
> > as Chandra's patches show.
>
> Well, they now have to learn seq_file. They now get to assume

If they are simple users, they don't have to "learn" seq_file semantics,
they would just replace their sprintf's with seq_printfs (as my changes
in OCFS2 show).

IMO, seq_file interface is not that complex to learn either.

> that "spewing large amounts of junk" is the default rather than "single
> attribute", which is correct. None of it is relevant for the majority
> of correct users.

"char *" can also be used to spew out large amount of data (ok, maybe up
to PAGESIZE in configfs's case :). My point is that changing char * to
seq_file doesn't necessarily "introduce" the issue (of spewing large
amounts of data).

> It exposes the "I'm a file" knowledge down to the client module.
> The entire point of configfs is that the "filesystem bits" are
> independant of the "client bits". To the client, it's an item
> hierarchy. To the user, the interface happens to be a filesystem.

This issue is moot, unless you have intentions of changing the user
interface of configfs to be anything other than a file system, isn't
it ?

> Technically, the seq_printf() as a drop-in replacement seems to
> be functional. I'm worried about lifetiming, but I think it's OK (what
> do I mean? If I open the file, I'd better not be able to remove the
> client module until everything is torn down. If I close the file, it
> had better get all torn down before module_put() so that when
> ->release() returns, te module can safely be removed. I *think* this
> change satisfies these worries, but it's something that absolutely has
> to be done right. Yes, I'm very paranoid about this).
> My bigger worry is that we haven't solved the write side. How
> does one *set* a large attribute? It had better not be multiple
> attributes. I know that your module doesn't set it, but hey, we don't
> codify that requirement. Perhaps a patch where we say "if you are a
> large display attribute, we'll use seq_file and error on write because
> it isn't allowed" but that leaves the old buffer-based approach for
> normal-sized read-write attributes.

Now we are in need of *large* reads. We can add this feature and let it
evolve to the next level later when somebody needs to *set* a large
attribute.

Also, these changes do not result in any change in the user level
interface. So, we can afford this interface changes to change again
later.

> Joel
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-11 00:15:39

by Joel Becker

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, Oct 10, 2006 at 04:13:52PM -0700, Chandra Seetharaman wrote:
> On Tue, 2006-10-10 at 14:58 -0700, Joel Becker wrote:
> > Well, they now have to learn seq_file. They now get to assume
>
> If they are simple users, they don't have to "learn" seq_file semantics,
> they would just replace their sprintf's with seq_printfs (as my changes
> in OCFS2 show).

The sed(1) is trivial sure. seq_file isn't terribly complex.
It's less about mere code knowledge and more about intention. Really,
it's how people understand configfs will deal with their attributes.

> "char *" can also be used to spew out large amount of data (ok, maybe up
> to PAGESIZE in configfs's case :). My point is that changing char * to
> seq_file doesn't necessarily "introduce" the issue (of spewing large
> amounts of data).

If I see a seq_file, I assume there are multiple things to
iterate over. Don't you?

> This issue is moot, unless you have intentions of changing the user
> interface of configfs to be anything other than a file system, isn't
> it ?

It could be today, without much trouble. The entire point is to
prevent client modules from implementing a filesystem or any filesystem
semantics. They implement an item hierarchy with attributes. The
attributes are read-write with ->show() and ->set(). The filesystem
should be invisible to the client.

> Now we are in need of *large* reads. We can add this feature and let it
> evolve to the next level later when somebody needs to *set* a large
> attribute.

I don't want some set of ad-hoc rules based on legacy broken
ideas. "Well, you can do this, or this, or this, or even this, and all
sort of work, but it's a mess" is not a good thing.

Joel

--

Life's Little Instruction Book #20

"Be forgiving of yourself and others."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-11 00:50:05

by Matt Helsley

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, 2006-10-10 at 14:58 -0700, Joel Becker wrote:
> On Tue, Oct 10, 2006 at 02:31:43PM -0700, Paul Menage wrote:
> > > NAK. This forces a complex and inappropriate interface on the
> > >majority of users, and doesn't honor configfs' simplicity-first design.
> >
> > How is the seq_file interface complex and inappropriate? For the
> > configfs clients it's basically a drop-in replacement for sprintf(),
> > as Chandra's patches show.
>
> Well, they now have to learn seq_file. They now get to assume
> that "spewing large amounts of junk" is the default rather than "single
> attribute", which is correct. None of it is relevant for the majority
> of correct users.

We want to be able to export a sequence of small (<< 1 page),
homogenous, unstructured (scalar), attributes through configfs using the
same file. While this is rather specific, I'd guess it would be a common
occurrence.

Yes, keeping track of writing to these sequences (add, remove, replace)
is a problem. But that's what the file position is for. configfs could
support exporting sequences of common (scalar) types that need to be
exported from the kernel. Types like u32, u64, pid_t, etc. Then seek can
be used to index into the sequence to indicate append or replace. seek
and truncate would have to be translated into bytes so configfs can
manage the buffers behind the scenes while passing sequence position to
the code that uses configfs.

Does this seem reasonable?

Cheers,
-Matt Helsley

2006-10-11 01:29:09

by Joel Becker

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> We want to be able to export a sequence of small (<< 1 page),
> homogenous, unstructured (scalar), attributes through configfs using the
> same file. While this is rather specific, I'd guess it would be a common
> occurrence.

Pray tell, why? "One attribute per file" is the mantra here.
You really should think hard before you break it. Simple heuristic:
would you have to parse the buffer? Then it's wrong.

> Yes, keeping track of writing to these sequences (add, remove, replace)
> is a problem. But that's what the file position is for. configfs could
>
> Does this seem reasonable?

No. It adds complexity to an interface that is supposed to be
simple. Now, I'm not sure what you are trying to do here, so I don't
know how it fits in. Is it really "multiple attributes per file", or
"this attribute is a list of entries"?
An example. If you had to set an IP address and a port, here's
your scenarios:

[Right]
echo "10.0.0.1" > /sys/kernel/config/subsys/item/address
echo "8000" > /sys/kernel/config/subsys/item/port

[Wrong]
echo 10.0.0.1\n8000" > /sys/kernel/config/subsys/item/address-and-port

But perhaps you are not setting two distinct attributes. I
don't understand what you are doing, so some detail would be nice.

Joel

--

Life's Little Instruction Book #407

"Every once in a while, take the scenic route."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-11 09:12:38

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 2/5] Use seq_file for read side of operations

On Tue, Oct 10, 2006 at 11:20:55AM -0700, Chandra Seetharaman wrote:
> + error = single_open(file, configfs_read_file, buffer);
> + if (error) {
> + kfree(buffer);
> + goto Enomem;
> + }

Btw, with single_open(), how do you ever get more than one call
to ->show()? Shouldn't single_next() prevent that?

Joel

--

Life's Little Instruction Book #451

"Don't be afraid to say, 'I'm sorry.'"

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-11 20:20:04

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, 10 Oct 2006 13:35:11 -0700
Joel Becker <[email protected]> wrote:

> On Tue, Oct 10, 2006 at 11:20:43AM -0700, Chandra Seetharaman wrote:
> > Currently, maximum amount of data that can be read from a configfs
> > attribute file is limited to PAGESIZE bytes. This is a limitation for
> > some of the usages of configfs.
>
> NAK. This forces a complex and inappropriate interface on the
> majority of users, and doesn't honor configfs' simplicity-first design.

The patch deletes a pile of custom code from configfs and replaces it with
calls to standard kernel infrastructure and fixes a shortcoming/bug in the
process. Migration over to the new interface is trivial and almost
scriptable.

Nice patch. What's not to like about it??

> I understand Chandra's concerns, but this patch isn't the right
> way to do it.

To do what? Fix the artificial PAGE_SIZE contraint? The patch would be
justified on cleanup grounds even if nobody was hitting that limit.

2006-10-11 21:42:07

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> The patch deletes a pile of custom code from configfs and replaces it with
> calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> process. Migration over to the new interface is trivial and almost
> scriptable.
>
> Nice patch. What's not to like about it??

We're discussing it both online and offline because there are
concerns they have that do need to be addressed.
What's not to like? If you need more than 4K to display an
attribute, odds are you have more than one attribute in the file, which
is Wrong. These guys, it turns out, don't, which is why I'm trying to
help them. But I don't want to encourage others to do the wrong thing.
I'm also unhappy with the lack of symmetry this creates on the
store/show() prototypes. We're discussing whether the store() side
should support larger-than-a-page writes, and how to do it.

Joel

--

"Conservative, n. A statesman who is enamoured of existing evils,
as distinguished from the Liberal, who wishes to replace them
with others."
- Ambrose Bierce, The Devil's Dictionary

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-11 22:18:48

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> The patch deletes a pile of custom code from configfs and replaces it with
> calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> process. Migration over to the new interface is trivial and almost
> scriptable.

The configfs stuff is based on the sysfs code too. Should we
migrate sysfs/file.c to the same seq_file code? Serious question, if
the cleanup is considered better.

Joel

--

"I almost ran over an angel
He had a nice big fat cigar.
'In a sense,' he said, 'You're alone here
So if you jump, you'd best jump far.'"

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-11 22:39:45

by Greg KH

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > We want to be able to export a sequence of small (<< 1 page),
> > homogenous, unstructured (scalar), attributes through configfs using the
> > same file. While this is rather specific, I'd guess it would be a common
> > occurrence.
>
> Pray tell, why? "One attribute per file" is the mantra here.
> You really should think hard before you break it. Simple heuristic:
> would you have to parse the buffer? Then it's wrong.

I agree. You are trying to use configfs for something that it is not
entended to be used for. If you want to write/read large numbers of
attrbutes like this, use your own filesystem.

configfs has the same "one value per file" rule that sysfs has. And
because your userspace model doesn't fit that, don't try to change
configfs here.

What happened to your old ckrmfs? I thought you were handling all of
this in that.

thanks,

greg k-h

2006-10-11 22:49:01

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, 11 Oct 2006 15:18:22 -0700
Joel Becker <[email protected]> wrote:

> On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > The patch deletes a pile of custom code from configfs and replaces it with
> > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > process. Migration over to the new interface is trivial and almost
> > scriptable.
>
> The configfs stuff is based on the sysfs code too. Should we
> migrate sysfs/file.c to the same seq_file code? Serious question, if
> the cleanup is considered better.
>

I don't see why not. I don't know if anyone has though of/proposed it
before.

2006-10-11 23:26:07

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > We want to be able to export a sequence of small (<< 1 page),
> > > homogenous, unstructured (scalar), attributes through configfs using the
> > > same file. While this is rather specific, I'd guess it would be a common
> > > occurrence.
> >
> > Pray tell, why? "One attribute per file" is the mantra here.
> > You really should think hard before you break it. Simple heuristic:
> > would you have to parse the buffer? Then it's wrong.
>
> I agree. You are trying to use configfs for something that it is not
> entended to be used for. If you want to write/read large numbers of
> attrbutes like this, use your own filesystem.

I would say it is a "large attribute" not "large numbers of attributes".

>
> configfs has the same "one value per file" rule that sysfs has. And
> because your userspace model doesn't fit that, don't try to change
> configfs here.
>
> What happened to your old ckrmfs? I thought you were handling all of
> this in that.

We decided to use an existing infrastructure instead of having our own
file system.

configfs is a perfect fit for us, except the size limitation.

BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
processes aggregation from cpuset to have an independent infrastructure
(http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
has its own file system. I was advocating him to use configfs. But, he
also has this issue/limitation.

>
> thanks,
>
> greg k-h
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-11 23:27:17

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> On Wed, 11 Oct 2006 15:18:22 -0700
> Joel Becker <[email protected]> wrote:
>
> > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > The patch deletes a pile of custom code from configfs and replaces it with
> > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > process. Migration over to the new interface is trivial and almost
> > > scriptable.
> >
> > The configfs stuff is based on the sysfs code too. Should we
> > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > the cleanup is considered better.
> >
>
> I don't see why not. I don't know if anyone has though of/proposed it
> before.

I can generate a patch for that too.

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-12 02:17:50

by Matt Helsley

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > We want to be able to export a sequence of small (<< 1 page),
> > > homogenous, unstructured (scalar), attributes through configfs using the
> > > same file. While this is rather specific, I'd guess it would be a common
> > > occurrence.
> >
> > Pray tell, why? "One attribute per file" is the mantra here.
> > You really should think hard before you break it. Simple heuristic:
> > would you have to parse the buffer? Then it's wrong.
>
> I agree. You are trying to use configfs for something that it is not
> entended to be used for. If you want to write/read large numbers of

I disagree with your assertion that we're abusing configfs. "one value
per file" is not the purpose of configfs.

The purpose of configfs is to allow userspace to create and manipulate
kernel objects whose lifetime is under the control of userspace. That
perfectly matches the idea of being able to create, manipulate, and
destroy a resource group from userspace.

"one value per file" is a phrase describing what configfs and sysfs
files should normally look like. However it's not a rule since there is
precedent for sysfs files that require parsing:
/sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource
/sys/block/hda/stat
/sys/block/hda/dev

These are counterexamples to your assertion below that "one value per
file" is a rule.

> attrbutes like this, use your own filesystem.

This conflicts with the idea of reusing kernel code that was made to be
reused. Except for this 1 page limit configfs is nearly a perfect match.
I doubt we'd get a favorable reaction if we said:
"OK, let's copy configfs then remove the page size limit."

> configfs has the same "one value per file" rule that sysfs has. And
> because your userspace model doesn't fit that, don't try to change
> configfs here.

Please see my counterexample above.

> What happened to your old ckrmfs? I thought you were handling all of
> this in that.

We dropped RCFS more than 1 year ago after feedback suggested we should
try to share as much kernel code as possible. Other than the 1 page
limit to the list of pids, configfs is a perfect match for what we need.

Cheers,
-Matt Helsley

2006-10-12 04:17:34

by Paul Jackson

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

> I agree. You are trying to use configfs for something that it is not
> entended to be used for.

Yup - but perhaps the best answer is that the design should be
extended, to handle a simple vector, such as a list of task process
id's or a list of CPU numbers.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-10-13 00:04:54

by Greg KH

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, Oct 11, 2006 at 07:17:44PM -0700, Matt Helsley wrote:
> On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > We want to be able to export a sequence of small (<< 1 page),
> > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > same file. While this is rather specific, I'd guess it would be a common
> > > > occurrence.
> > >
> > > Pray tell, why? "One attribute per file" is the mantra here.
> > > You really should think hard before you break it. Simple heuristic:
> > > would you have to parse the buffer? Then it's wrong.
> >
> > I agree. You are trying to use configfs for something that it is not
> > entended to be used for. If you want to write/read large numbers of
>
> I disagree with your assertion that we're abusing configfs. "one value
> per file" is not the purpose of configfs.
>
> The purpose of configfs is to allow userspace to create and manipulate
> kernel objects whose lifetime is under the control of userspace. That
> perfectly matches the idea of being able to create, manipulate, and
> destroy a resource group from userspace.
>
> "one value per file" is a phrase describing what configfs and sysfs
> files should normally look like. However it's not a rule since there is
> precedent for sysfs files that require parsing:
> /sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource

Hold over from old procfs use of pci resources. And now a requirement
that X uses this, after it was pointed out to me.

> /sys/block/hda/stat

I never have liked that, it belongs somewhere else. Please move it.

> /sys/block/hda/dev

That is merely a dev_t, a single value.

Come on, there are way worse violators[1] of the "one value per file"
rule in sysfs that you could have found if you wanted to try to declare
how much it is not being followed in places. But if you look, the ratio
of good vs. bad usage is still very high, and keeping it high is my
goal.

> > What happened to your old ckrmfs? I thought you were handling all of
> > this in that.
>
> We dropped RCFS more than 1 year ago after feedback suggested we should
> try to share as much kernel code as possible. Other than the 1 page
> limit to the list of pids, configfs is a perfect match for what we need.

Feedback from whom? I know I sure liked the fact that you did your own
filesystem, despite the bugs that were found in it at times :)

thanks,

greg k-h

2006-10-13 00:04:47

by Greg KH

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, Oct 11, 2006 at 04:26:00PM -0700, Chandra Seetharaman wrote:
> On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > We want to be able to export a sequence of small (<< 1 page),
> > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > same file. While this is rather specific, I'd guess it would be a common
> > > > occurrence.
> > >
> > > Pray tell, why? "One attribute per file" is the mantra here.
> > > You really should think hard before you break it. Simple heuristic:
> > > would you have to parse the buffer? Then it's wrong.
> >
> > I agree. You are trying to use configfs for something that it is not
> > entended to be used for. If you want to write/read large numbers of
> > attrbutes like this, use your own filesystem.
>
> I would say it is a "large attribute" not "large numbers of attributes".

That attribute seems to violate the rule of "only one thing" and needs
to be parsed. That does not seem like a good fit for configfs or sysfs.

> > configfs has the same "one value per file" rule that sysfs has. And
> > because your userspace model doesn't fit that, don't try to change
> > configfs here.
> >
> > What happened to your old ckrmfs? I thought you were handling all of
> > this in that.
>
> We decided to use an existing infrastructure instead of having our own
> file system.
>
> configfs is a perfect fit for us, except the size limitation.

Then it's not a perfect fit, sorry, as you are trying to get it to do
things it is not supposed to, or designed to, do.

> BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> processes aggregation from cpuset to have an independent infrastructure
> (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> has its own file system. I was advocating him to use configfs. But, he
> also has this issue/limitation.

That's one reason it is so easy to just write your own filesystem then.
What is it these days, less than 200 lines of code? I bet you can even
condence more things to make it 100 lines if you really try. That seems
much more sane than trying to bend configfs into something different.

Why are people so opposed to creating their own filesystems?

thanks,

greg k-h

2006-10-13 00:17:09

by Paul Jackson

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

> Why are people so opposed to creating their own filesystems?

Well ... I'm not ;). Though I will be eternally grateful to
Simon Derr for doing the heavy lifting needed to create the
cpuset filesystem.

For those of us whose brains don't hold so many details at once,
creating a new file system can seem a bit daunting. And for those
of us not skilled in the art, it is more likely to end up being
300 lines of code, presenting several good provocations for a Hellwig
or a Viro to curse in the general direction of their monitors.

Instead of trying to hijack configfs to purposes ill suited for it,
I wonder if there isn't someway to lower the hurdles that us mere
mortals must leap to creating additional filesystems.

It shouldn't take that much lowering.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-10-13 03:23:00

by Matt Helsley

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Thu, 2006-10-12 at 16:54 -0700, Greg KH wrote:
> On Wed, Oct 11, 2006 at 07:17:44PM -0700, Matt Helsley wrote:
> > On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > > We want to be able to export a sequence of small (<< 1 page),
> > > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > > same file. While this is rather specific, I'd guess it would be a common
> > > > > occurrence.
> > > >
> > > > Pray tell, why? "One attribute per file" is the mantra here.
> > > > You really should think hard before you break it. Simple heuristic:
> > > > would you have to parse the buffer? Then it's wrong.
> > >
> > > I agree. You are trying to use configfs for something that it is not
> > > entended to be used for. If you want to write/read large numbers of
> >
> > I disagree with your assertion that we're abusing configfs. "one value
> > per file" is not the purpose of configfs.
> >
> > The purpose of configfs is to allow userspace to create and manipulate
> > kernel objects whose lifetime is under the control of userspace. That
> > perfectly matches the idea of being able to create, manipulate, and
> > destroy a resource group from userspace.
> >
> > "one value per file" is a phrase describing what configfs and sysfs
> > files should normally look like. However it's not a rule since there is
> > precedent for sysfs files that require parsing:
> > /sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource
>
> Hold over from old procfs use of pci resources. And now a requirement
> that X uses this, after it was pointed out to me.
>
> > /sys/block/hda/stat
>
> I never have liked that, it belongs somewhere else. Please move it.
>
> > /sys/block/hda/dev
>
> That is merely a dev_t, a single value.

It's a single value internally but it's presented as multiple values --
and it needs to be parsed. Following the guidelines it should have been
a directory with a single value per file:

/sys/block/hda/dev/major
/sys/block/hda/dev/minor

So the parsing test cited earlier fails. It's not a matter of whether
parsing is required but of how complex the parsing is.

> Come on, there are way worse violators[1] of the "one value per file"

You mean like /sys/class/input/input0/modalias ?

It was not my intention to catalog all of the violators -- only to point
out that there are counterexamples that prove it's not a rule as you've
asserted.

> rule in sysfs that you could have found if you wanted to try to declare
> how much it is not being followed in places. But if you look, the ratio
> of good vs. bad usage is still very high, and keeping it high is my
> goal.
>
> > > What happened to your old ckrmfs? I thought you were handling all of
> > > this in that.
> >
> > We dropped RCFS more than 1 year ago after feedback suggested we should
> > try to share as much kernel code as possible. Other than the 1 page
> > limit to the list of pids, configfs is a perfect match for what we need.
>
> Feedback from whom? I know I sure liked the fact that you did your own
> filesystem, despite the bugs that were found in it at times :)

This article on LWN suggested the idea of using configfs:
http://lwn.net/Articles/148180/
Though I think the idea may have come up earlier I haven't been able to
find it with Google. Shailabh may be able to pinpoint the exact source
of the idea.

We also got feedback along the lines of "it's too
big" ( http://lwn.net/Articles/145135/ ) which motivated us to find ways
to shrink it. Using appropriate kernel systems that already exist is an
excellent way to reduce the size of the patches and cut down on the bugs
-- and we can contribute by fixing some configfs bugs. Of course using
configfs saves quite a bit of code:
http://lwn.net/Articles/149275/

I've also gotten the impression that new filesystem code tends to
distract (or, worse, confuse) folks from the meat of the patches. It
takes up review time that's better spent on the resource groups core and
controllers.

Cheers,
-Matt Helsley

2006-10-13 23:38:10

by Matt Helsley

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Thu, 2006-10-12 at 17:16 -0700, Paul Jackson wrote:

<snip>

> For those of us whose brains don't hold so many details at once,
> creating a new file system can seem a bit daunting. And for those
> of us not skilled in the art, it is more likely to end up being
> 300 lines of code, presenting several good provocations for a Hellwig
> or a Viro to curse in the general direction of their monitors.

Definitely.

> Instead of trying to hijack configfs to purposes ill suited for it,
> I wonder if there isn't someway to lower the hurdles that us mere
> mortals must leap to creating additional filesystems.

Again, I don't think using configfs to define groups of tasks is
ill-suited to the purpose of configfs. I think we're confusing the
limitations of the implementation with the purpose for having configfs
in addition to sysfs.

Cheers,
-Matt Helsley

2006-10-13 23:40:12

by Matt Helsley

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:

<snip>

> > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > processes aggregation from cpuset to have an independent infrastructure
> > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > has its own file system. I was advocating him to use configfs. But, he
> > also has this issue/limitation.
>
> That's one reason it is so easy to just write your own filesystem then.
> What is it these days, less than 200 lines of code? I bet you can even

For my_school_project_fs perhaps 200 lines is sufficient.

Paul Menage's patch which Chandra was referring to:

http://lkml.org/lkml/2006/9/28/104

is 1700 insertions. RCFS was around 1500 lines -- similar to Paul's
patch -- before we moved to configfs and reduced that to about 300-400
lines. This suggests we'd need around 1500 lines of filesystem code --
7.5 times your estimate.

> condence more things to make it 100 lines if you really try. That seems
> much more sane than trying to bend configfs into something different.

I don't agree. I think it's insane not to use configfs just because we
need a list of pids for each group of tasks.

> Why are people so opposed to creating their own filesystems?

There are lots of reasons not to create your own filesystem. When
you're not already a kernel maintainer it's no small task to create and
get a non-trivial filesystem accepted into the kernel. Getting people to
review whole new filesystems has its own problems:

http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/1928.html

Cheers,
-Matt Helsley


2006-10-13 23:47:24

by Paul Menage

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On 10/13/06, Matt Helsley <[email protected]> wrote:
> On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
>
> <snip>
>
> > > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > > processes aggregation from cpuset to have an independent infrastructure
> > > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > > has its own file system. I was advocating him to use configfs. But, he
> > > also has this issue/limitation.
> >
> > That's one reason it is so easy to just write your own filesystem then.
> > What is it these days, less than 200 lines of code? I bet you can even
>
> For my_school_project_fs perhaps 200 lines is sufficient.
>
> Paul Menage's patch which Chandra was referring to:
>
> http://lkml.org/lkml/2006/9/28/104
>
> is 1700 insertions.

To be fair, only about 350 lines of that is filesystem boilerplate.
There's also maybe 100-200 lines of interfacing with the filesystem,
but they'd probably be there as configfs-interfacing code if it was
over configfs.

Paul

2006-10-14 06:18:15

by Greg KH

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Fri, Oct 13, 2006 at 04:40:08PM -0700, Matt Helsley wrote:
> On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
>
> <snip>
>
> > > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > > processes aggregation from cpuset to have an independent infrastructure
> > > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > > has its own file system. I was advocating him to use configfs. But, he
> > > also has this issue/limitation.
> >
> > That's one reason it is so easy to just write your own filesystem then.
> > What is it these days, less than 200 lines of code? I bet you can even
>
> For my_school_project_fs perhaps 200 lines is sufficient.
>
> Paul Menage's patch which Chandra was referring to:
>
> http://lkml.org/lkml/2006/9/28/104
>
> is 1700 insertions. RCFS was around 1500 lines -- similar to Paul's
> patch -- before we moved to configfs and reduced that to about 300-400
> lines. This suggests we'd need around 1500 lines of filesystem code --
> 7.5 times your estimate.

Then I suggest that you are doing something extremely wrong here. The
base filesystem code for both debugfs and securityfs, is around 200
lines of code, including comments. And they are both not
"my_school_project_fs".

If you want to get a bit fancier, and parse some mount options and
provide a persistant mount state, like usbfs, it grows to about 350
lines of code.

Again, not a "fake" filesystem by any means.

And, for another example, ndevfs was posted to lkml over a year ago as a
bad joke and can be found at:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/bad/ndevfs.patch
and weighs in at a whopping 249 lines for all of inode.c (comments and
whitespace included.) It was a full-blown filesystem that handled
almost exactly what you will need to handle.

So please, before you critise me for not knowing exactly how much work
it takes to implement a ram-based filesystem for something like this,
take a look at a few of the already published and shipping
implementations, and note who wrote them...

Otherwise you look quite foolish.

> > condence more things to make it 100 lines if you really try. That seems
> > much more sane than trying to bend configfs into something different.
>
> I don't agree. I think it's insane not to use configfs just because we
> need a list of pids for each group of tasks.

Perhaps because your usage is not what it is intended for? Yeah, I
know, if all you have is a hammer...

> > Why are people so opposed to creating their own filesystems?
>
> There are lots of reasons not to create your own filesystem. When
> you're not already a kernel maintainer it's no small task to create and
> get a non-trivial filesystem accepted into the kernel. Getting people to
> review whole new filesystems has its own problems:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/1928.html

There's a world of difference between something as complex as unionfs
and something that merely makes a few calls to the libfs code already in
the kernel.

And if you don't realize this, then yes, I would not recommend that you
should write your own fs. But I would then recommend that the task then
be passed on to someone else in your group who does have the capability
to do so...

greg k-h

2006-10-14 17:42:34

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Wed, Oct 11, 2006 at 04:27:13PM -0700, Chandra Seetharaman wrote:
> On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> > On Wed, 11 Oct 2006 15:18:22 -0700
> > Joel Becker <[email protected]> wrote:
> >
> > > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > > The patch deletes a pile of custom code from configfs and replaces it with
> > > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > > process. Migration over to the new interface is trivial and almost
> > > > scriptable.
> > >
> > > The configfs stuff is based on the sysfs code too. Should we
> > > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > > the cleanup is considered better.
> > >
> >
> > I don't see why not. I don't know if anyone has though of/proposed it
> > before.
>
> I can generate a patch for that too.

Argh!!!!

Are you going to honestly tell me you have a single attribute in sysfs
that is larger than PAGE_SIZE? If you are getting anywhere close to
this, then something is really wrong and we need to talk.

greg k-h

2006-10-14 19:44:17

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Sat, 14 Oct 2006 01:01:07 -0700
Greg KH <[email protected]> wrote:

> On Wed, Oct 11, 2006 at 04:27:13PM -0700, Chandra Seetharaman wrote:
> > On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> > > On Wed, 11 Oct 2006 15:18:22 -0700
> > > Joel Becker <[email protected]> wrote:
> > >
> > > > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > > > The patch deletes a pile of custom code from configfs and replaces it with
> > > > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > > > process. Migration over to the new interface is trivial and almost
> > > > > scriptable.
> > > >
> > > > The configfs stuff is based on the sysfs code too. Should we
> > > > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > > > the cleanup is considered better.
> > > >
> > >
> > > I don't see why not. I don't know if anyone has though of/proposed it
> > > before.
> >
> > I can generate a patch for that too.
>
> Argh!!!!
>
> Are you going to honestly tell me you have a single attribute in sysfs
> that is larger than PAGE_SIZE?

He does not. It's a matter of reusing existing facilities rather than
impementing similar things in multiple places. The equivalent patch in
configfs removed a decent amount of code:

fs/configfs/file.c | 130 ++++++++++-------------------------------------
include/linux/configfs.h | 3 -
2 files changed, 31 insertions(+), 102 deletions(-)

2006-10-14 20:11:16

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Sat, Oct 14, 2006 at 12:43:51PM -0700, Andrew Morton wrote:
> On Sat, 14 Oct 2006 01:01:07 -0700
> Greg KH <[email protected]> wrote:
> > Argh!!!!
> >
> > Are you going to honestly tell me you have a single attribute in sysfs
> > that is larger than PAGE_SIZE?
>
> He does not. It's a matter of reusing existing facilities rather than
> impementing similar things in multiple places. The equivalent patch in
> configfs removed a decent amount of code:

Issues of PAGE_SIZE and attribute size aside, the patch posted
was incorrect. While they used seq_file, they implemented it in a
completely inefficent fashion, filling the entire buffer in one ->show()
call rather than letting multiple read(2) calls iterate their list.

Joel

--

"Anything that is too stupid to be spoken is sung."
- Voltaire

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-14 23:14:55

by Matt Helsley

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Fri, 2006-10-13 at 23:17 -0700, Greg KH wrote:
> On Fri, Oct 13, 2006 at 04:40:08PM -0700, Matt Helsley wrote:
> > On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
> >
> > <snip>
> >
> > > > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > > > processes aggregation from cpuset to have an independent infrastructure
> > > > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > > > has its own file system. I was advocating him to use configfs. But, he
> > > > also has this issue/limitation.
> > >
> > > That's one reason it is so easy to just write your own filesystem then.
> > > What is it these days, less than 200 lines of code? I bet you can even
> >
> > For my_school_project_fs perhaps 200 lines is sufficient.
> >
> > Paul Menage's patch which Chandra was referring to:
> >
> > http://lkml.org/lkml/2006/9/28/104
> >
> > is 1700 insertions. RCFS was around 1500 lines -- similar to Paul's
> > patch -- before we moved to configfs and reduced that to about 300-400
> > lines. This suggests we'd need around 1500 lines of filesystem code --
> > 7.5 times your estimate.
>
> Then I suggest that you are doing something extremely wrong here. The
> base filesystem code for both debugfs and securityfs, is around 200
> lines of code, including comments. And they are both not
> "my_school_project_fs".
>
> If you want to get a bit fancier, and parse some mount options and
> provide a persistant mount state, like usbfs, it grows to about 350
> lines of code.
>
> Again, not a "fake" filesystem by any means.
>
> And, for another example, ndevfs was posted to lkml over a year ago as a
> bad joke and can be found at:
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/bad/ndevfs.patch
> and weighs in at a whopping 249 lines for all of inode.c (comments and
> whitespace included.) It was a full-blown filesystem that handled
> almost exactly what you will need to handle.
>
> So please, before you critise me for not knowing exactly how much work
> it takes to implement a ram-based filesystem for something like this,
> take a look at a few of the already published and shipping
> implementations, and note who wrote them...
>
> Otherwise you look quite foolish.
>
> > > condence more things to make it 100 lines if you really try. That seems
> > > much more sane than trying to bend configfs into something different.
> >
> > I don't agree. I think it's insane not to use configfs just because we
> > need a list of pids for each group of tasks.
>
> Perhaps because your usage is not what it is intended for? Yeah, I
> know, if all you have is a hammer...
>
> > > Why are people so opposed to creating their own filesystems?
> >
> > There are lots of reasons not to create your own filesystem. When
> > you're not already a kernel maintainer it's no small task to create and
> > get a non-trivial filesystem accepted into the kernel. Getting people to
> > review whole new filesystems has its own problems:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/1928.html
>
> There's a world of difference between something as complex as unionfs
> and something that merely makes a few calls to the libfs code already in
> the kernel.
>
> And if you don't realize this, then yes, I would not recommend that you
> should write your own fs. But I would then recommend that the task then
> be passed on to someone else in your group who does have the capability
> to do so...
>
> greg k-h

You're right. I'm sorry I suggested that 200 lines might be too small
an estimate. Clearly that was a moronic suggestion on my part and I'm
sorry you took it so personally.

-Matt Helsley

2006-10-16 19:10:32

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
> On Wed, Oct 11, 2006 at 04:26:00PM -0700, Chandra Seetharaman wrote:
> > On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > > We want to be able to export a sequence of small (<< 1 page),
> > > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > > same file. While this is rather specific, I'd guess it would be a common
> > > > > occurrence.
> > > >
> > > > Pray tell, why? "One attribute per file" is the mantra here.
> > > > You really should think hard before you break it. Simple heuristic:
> > > > would you have to parse the buffer? Then it's wrong.
> > >
> > > I agree. You are trying to use configfs for something that it is not
> > > entended to be used for. If you want to write/read large numbers of
> > > attrbutes like this, use your own filesystem.
> >
> > I would say it is a "large attribute" not "large numbers of attributes".
>
> That attribute seems to violate the rule of "only one thing" and needs
> to be parsed. That does not seem like a good fit for configfs or sysfs.

As pointed by Paul Jackson in a reply, what we are trying to get is a
list of pids, which can be termed as vectors and doesn't need serious
parsing, but simply separating the items in the list by a '\n'.

What I was trying to do in the patches is to remove some code (which
maintains its own buffer) by an existing mechanism, seq_file.

As Andrew pointed in one of the email, my patch set reduces the number
of lines of code in configfs also.

You do not think that is a good idea ?

>
>
> > > configfs has the same "one value per file" rule that sysfs has. And
> > > because your userspace model doesn't fit that, don't try to change
> > > configfs here.
> > >
> > > What happened to your old ckrmfs? I thought you were handling all of
> > > this in that.
> >
> > We decided to use an existing infrastructure instead of having our own
> > file system.
> >
> > configfs is a perfect fit for us, except the size limitation.
>
> Then it's not a perfect fit, sorry, as you are trying to get it to do
> things it is not supposed to, or designed to, do.
>
> > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > processes aggregation from cpuset to have an independent infrastructure
> > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > has its own file system. I was advocating him to use configfs. But, he
> > also has this issue/limitation.
>
> That's one reason it is so easy to just write your own filesystem then.
> What is it these days, less than 200 lines of code? I bet you can even
> condence more things to make it 100 lines if you really try. That seems
> much more sane than trying to bend configfs into something different.

I thought the mantra is "Use existing infrastructure, instead of writing
your own".

>
> Why are people so opposed to creating their own filesystems?

I do not think we were/are. As you pointed we did have our own file
system. And as I mentioned, we moved to configfs mainly to reduce code
size and to start using the existing infrastructure.

Same is true even for the (cpuset extracted) patch set submitted by Paul
Menage. It does have its own file system. I was advocating him to use
configfs for the same reason (to use existing interface).

> thanks,
>
> greg k-h
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> ckrm-tech mailing list
> https://lists.sourceforge.net/lists/listinfo/ckrm-tech
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-16 19:16:55

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Sat, 2006-10-14 at 01:01 -0700, Greg KH wrote:
> On Wed, Oct 11, 2006 at 04:27:13PM -0700, Chandra Seetharaman wrote:
> > On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> > > On Wed, 11 Oct 2006 15:18:22 -0700
> > > Joel Becker <[email protected]> wrote:
> > >
> > > > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > > > The patch deletes a pile of custom code from configfs and replaces it with
> > > > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > > > process. Migration over to the new interface is trivial and almost
> > > > > scriptable.
> > > >
> > > > The configfs stuff is based on the sysfs code too. Should we
> > > > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > > > the cleanup is considered better.
> > > >
> > >
> > > I don't see why not. I don't know if anyone has though of/proposed it
> > > before.
> >
> > I can generate a patch for that too.
>
> Argh!!!!
>
> Are you going to honestly tell me you have a single attribute in sysfs
> that is larger than PAGE_SIZE? If you are getting anywhere close to
> this, then something is really wrong and we need to talk.

I was replying to Andrew's reply in the thread. Thats all.

I do _not_ have any personal interests here (sysfs).

>
> greg k-h
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-16 19:24:41

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Sat, 2006-10-14 at 13:10 -0700, Joel Becker wrote:
> On Sat, Oct 14, 2006 at 12:43:51PM -0700, Andrew Morton wrote:
> > On Sat, 14 Oct 2006 01:01:07 -0700
> > Greg KH <[email protected]> wrote:
> > > Argh!!!!
> > >
> > > Are you going to honestly tell me you have a single attribute in sysfs
> > > that is larger than PAGE_SIZE?
> >
> > He does not. It's a matter of reusing existing facilities rather than
> > impementing similar things in multiple places. The equivalent patch in
> > configfs removed a decent amount of code:
>
> Issues of PAGE_SIZE and attribute size aside, the patch posted
> was incorrect. While they used seq_file, they implemented it in a

I do not think it is incorrect. It provides a simplest interface to
configfs's clients.

> completely inefficent fashion, filling the entire buffer in one ->show()
> call rather than letting multiple read(2) calls iterate their list.

So, are you suggesting that we should provide a generic seq_operations
interface to configfs's clients ?

>
> Joel
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-16 20:33:16

by Paul Jackson

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

Chandra wrote:
> As Andrew pointed in one of the email, my patch set reduces the number
> of lines of code in configfs also.

I think Andrew has mentioned this a couple of times ;)

Actually, I've never been particularly happy with the implicit
PAGESIZE limit on these writes. It's dangerous coding practice to
pass someone a buffer pointer in which they are to write, without
passing a corresponding length.

If it is appropriate for configfs to have some such fixed limit on
file lengths, then that should be a formal part of the API or imposed
in a safer manner than hoping the callback routine doesn't overwrite
a buffer.

Whether or not it should use homebrew code as it does now, or Chandra's
patch to make use of existing infrastructure should be a separate
question. I think Andrew's observations apply here.

And whether or not we add support for a vector of scalars of the same
type and meaning should be yet another separate question. No doubt
reasonable minds will differ on this question, though so far that
discussion has been clouded by these other issues.

Greg seems to be suggesting that if we use Chandra's patch, we cannot
impose any length limit, and that this opens the cover to Pandora's
box of all manner of changing and complex output per file.

Well, I could code some pretty ugly output in a single page, if I
set my mind to it. But it would be rejected, because we've learned
that hurts.

>From that I conclude that it is not the PAGESIZE limit that is saving
us from more unparseable file formats, but the discipline we have
gained from learning the hard way.

Chandra - I haven't looked at seq file lately - could a user of it
such as configfs impose a length limit of its choosing, building on
your patch, without pushing the number of lines of code back above
where it started?

Perhaps, say, we would let the callback routines could push stuff into
a seq file without small limits, but then the configfs code could
truncate that output to a limit of its choosing. This would impose a
length limit, safely.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-10-16 22:29:07

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Mon, 2006-10-16 at 13:32 -0700, Paul Jackson wrote:

<snip>

> Chandra - I haven't looked at seq file lately - could a user of it
> such as configfs impose a length limit of its choosing, building on

Quick look at the seq_file interfaces shows there is no such capability.
(disclaimer: I am no expert of seq_file :)

> your patch, without pushing the number of lines of code back above
> where it started?
>
> Perhaps, say, we would let the callback routines could push stuff into
> a seq file without small limits, but then the configfs code could
> truncate that output to a limit of its choosing. This would impose a
> length limit, safely.
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-16 23:09:58

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Mon, Oct 16, 2006 at 12:24:35PM -0700, Chandra Seetharaman wrote:
> On Sat, 2006-10-14 at 13:10 -0700, Joel Becker wrote:
> > Issues of PAGE_SIZE and attribute size aside, the patch posted
> > was incorrect. While they used seq_file, they implemented it in a
>
> I do not think it is incorrect. It provides a simplest interface to
> configfs's clients.

I understand that it provides a simple sed(1)-based change for
existing clients. However, it abuses seq_file in exactly the wrong way.

> > completely inefficent fashion, filling the entire buffer in one ->show()
> > call rather than letting multiple read(2) calls iterate their list.
>
> So, are you suggesting that we should provide a generic seq_operations
> interface to configfs's clients ?

Yes, if we provide a vector type in any way, it should be via
seq_operations or something like it. If we're going to use seq_file, we
should use it correctly.

Joel

--

"What no boss of a programmer can ever understand is that a programmer
is working when he's staring out of the window"
- With apologies to Burton Rascoe


Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2006-10-17 03:00:22

by Paul Jackson

[permalink] [raw]
Subject: Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs

> Quick look at the seq_file interfaces shows there is no such capability.

Perhaps a seq file size limit could be added. Not sure if that's
a good idea or not ...

Ah - cap the count + ppos the user passed in to configfs_read_file,
before passing these values to flush_read_buffer().

If the user asks for more than is allowed, give them only what they are
allowed, by passing a smaller count to flush_read_buffer(). If they
start at a position past what's allowed, force a huge ppos and let them
see the resulting EOF. Disclaimer - I too am no seq_file expert ;).

This should be just a few more lines in configfs_read_file() on
top of your current patches adapting it to seq_file.

Granted - it is not going in directly in one step to the objective you
seek, that being a configfs suitable for displaying a long vector of
process id's, so that Resource Groups can make use of it (and maybe
someday cpusets, too.)

But it gains the code reduction and reuse benefits of your patch
set, and gets this conversation unwedged, so we can go on to discuss
whether or not it would be a good idea go add a suitable vector
interface to seq_file, without threatening the excellent improvements
that sysfs/configfs have made over the old /proc style mess.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-10-18 00:55:31

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Mon, 2006-10-16 at 16:09 -0700, Joel Becker wrote:
> On Mon, Oct 16, 2006 at 12:24:35PM -0700, Chandra Seetharaman wrote:
> > On Sat, 2006-10-14 at 13:10 -0700, Joel Becker wrote:
> > > Issues of PAGE_SIZE and attribute size aside, the patch posted
> > > was incorrect. While they used seq_file, they implemented it in a
> >
> > I do not think it is incorrect. It provides a simplest interface to
> > configfs's clients.
>
> I understand that it provides a simple sed(1)-based change for
> existing clients. However, it abuses seq_file in exactly the wrong way.
>
> > > completely inefficent fashion, filling the entire buffer in one ->show()
> > > call rather than letting multiple read(2) calls iterate their list.
> >
> > So, are you suggesting that we should provide a generic seq_operations
> > interface to configfs's clients ?
>
> Yes, if we provide a vector type in any way, it should be via
> seq_operations or something like it. If we're going to use seq_file, we
> should use it correctly.
>

So, you want me to make a patch that uses seq_op instead of seq_file ?
I am willing to do it.

> Joel
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2006-10-19 18:42:56

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs

On Tue, Oct 17, 2006 at 05:55:28PM -0700, Chandra Seetharaman wrote:
> On Mon, 2006-10-16 at 16:09 -0700, Joel Becker wrote:
> > Yes, if we provide a vector type in any way, it should be via
> > seq_operations or something like it. If we're going to use seq_file, we
> > should use it correctly.
> >
>
> So, you want me to make a patch that uses seq_op instead of seq_file ?
> I am willing to do it.

I'd like to see a patch that enforces vectorization and nothing
else, if we were to do anything.

Joel


--

"When arrows don't penetrate, see...
Cupid grabs the pistol."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127