2009-10-20 05:47:39

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 0/3] Allow sysfs to be dynamically populated


The first patch in this series is just a bugfix.

The second and third patches offer progressively more flexibility to
dynamically construct the contents of sysfs directories. The second
patch delays the allocation of sysfs_dirents until the contents of
the directory is requested. This functionality is only used by the
attribute_groups in that patch, but could (in principle) be extended to
other directory providers.

The third patch extends this to permit attribute groups to be dynamically
constructed before the directory they're in is populated. I also add a
user for this; the much-requested ability to display the MSI-X vectors
in use by a device.

Both patches also deconstruct the directories / attribute groups when the
dentry of the directory goes out of cache.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."


2009-10-20 05:49:44

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 1/3] Fix updating of named attribute groups


From: Matthew Wilcox <[email protected]>
Date: Sat, 17 Oct 2009 15:47:43 -0400
Subject: [PATCH 1/3] Fix updating of named attribute groups

The current code tries to create a subdirectory, even if the caller is
trying to update an existing group. If the caller wants to update a named
group, we need to call sysfs_get_dirent() which bumps the reference count
on the dirent, so move the sysfs_get() into the other two arms of the 'if'.

Signed-off-by: Matthew Wilcox <[email protected]>
---
fs/sysfs/group.c | 19 +++++++++++++------
1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index fe61194..0c4d342 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -69,12 +69,19 @@ static int internal_create_group(struct kobject *kobj, int update,
return -EINVAL;

if (grp->name) {
- error = sysfs_create_subdir(kobj, grp->name, &sd);
- if (error)
- return error;
- } else
- sd = kobj->sd;
- sysfs_get(sd);
+ if (update) {
+ sd = sysfs_get_dirent(kobj->sd, grp->name);
+ if (!sd)
+ return -ENOENT;
+ } else {
+ error = sysfs_create_subdir(kobj, grp->name, &sd);
+ if (error)
+ return error;
+ sysfs_get(sd);
+ }
+ } else {
+ sd = sysfs_get(kobj->sd);
+ }
error = create_files(sd, kobj, grp, update);
if (error) {
if (grp->name)
--
1.6.3.3


--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-20 05:50:19

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically


From: Matthew Wilcox <[email protected]>
Date: Mon, 19 Oct 2009 01:02:38 -0400
Subject: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

Use function pointers to populate and depopulate sysfs directories with
dynamically files. Include one user, the attribute groups.
---
fs/sysfs/dir.c | 49 ++++++++++++++++-------
fs/sysfs/group.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/sysfs/sysfs.h | 5 ++
3 files changed, 150 insertions(+), 19 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5fad489..e32a11d 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -294,9 +294,13 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

-static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
+static void sysfs_d_iput(struct dentry *dentry, struct inode *inode)
{
- struct sysfs_dirent * sd = dentry->d_fsdata;
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+
+ if ((sysfs_type(sd) == SYSFS_DIR) && sd->s_dir.depopulate)
+ sd->s_dir.depopulate(dentry, sd);
+ sd->s_flags &= ~SYSFS_FLAG_POPULATED;

sysfs_put(sd);
iput(inode);
@@ -574,6 +578,22 @@ repeat:
iput(inode);
}

+void sysfs_kill_removed_dirents(struct sysfs_addrm_cxt *acxt)
+{
+ /* kill removed sysfs_dirents */
+ while (acxt->removed) {
+ struct sysfs_dirent *sd = acxt->removed;
+
+ acxt->removed = sd->s_sibling;
+ sd->s_sibling = NULL;
+
+ sysfs_drop_dentry(sd);
+ sysfs_deactivate(sd);
+ unmap_bin_file(sd);
+ sysfs_put(sd);
+ }
+}
+
/**
* sysfs_addrm_finish - finish up sysfs_dirent add/remove
* @acxt: addrm context to finish up
@@ -600,18 +620,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
iput(inode);
}

- /* kill removed sysfs_dirents */
- while (acxt->removed) {
- struct sysfs_dirent *sd = acxt->removed;
-
- acxt->removed = sd->s_sibling;
- sd->s_sibling = NULL;
-
- sysfs_drop_dentry(sd);
- sysfs_deactivate(sd);
- unmap_bin_file(sd);
- sysfs_put(sd);
- }
+ sysfs_kill_removed_dirents(acxt);
}

/**
@@ -731,6 +740,17 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,

mutex_lock(&sysfs_mutex);

+ if (!(parent_sd->s_flags & SYSFS_FLAG_POPULATED)) {
+ if (parent_sd->s_dir.populate) {
+ int err = parent_sd->s_dir.populate(dentry->d_parent,
+ parent_sd);
+ if (err) {
+ ret = ERR_PTR(err);
+ goto out_unlock;
+ }
+ }
+ parent_sd->s_flags |= SYSFS_FLAG_POPULATED;
+ }
sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);

/* no such entry */
@@ -1012,7 +1032,6 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
return 0;
}

-
const struct file_operations sysfs_dir_operations = {
.read = generic_read_dir,
.readdir = sysfs_readdir,
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 0c4d342..37ac584 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -16,6 +16,102 @@
#include "sysfs.h"


+/*
+ * i_mutex is not held, but this inode is on its way out of the system, so
+ * nobody gets to mess with it. We can't take the sysfs_mutex here as it
+ * leads to a deadlock scenario where it's held in a path that can run
+ * reclaim, and this function can be called from reclaim context. I guess
+ * prayer is the only solution here (other than splitting sysfs_mutex)
+ */
+void sysfs_depopulate_group(struct dentry *dentry,
+ const struct attribute_group *grp)
+
+{
+ struct sysfs_dirent *dir_sd = dentry->d_fsdata;
+ struct sysfs_addrm_cxt acxt;
+ struct attribute **attrp;
+
+ memset(&acxt, 0, sizeof(acxt));
+ acxt.parent_sd = dir_sd;
+ acxt.parent_inode = dentry->d_inode;
+
+ for (attrp = grp->attrs; *attrp; attrp++) {
+ struct attribute *attr = *attrp;
+ struct sysfs_dirent *sd;
+
+ sd = sysfs_find_dirent(dir_sd, attr->name);
+ if (sd)
+ sysfs_remove_one(&acxt, sd);
+ }
+
+ sysfs_kill_removed_dirents(&acxt);
+}
+EXPORT_SYMBOL(sysfs_depopulate_group);
+
+/*
+ * inode->i_mutex is held by the VFS, and sysfs_mutex is held by
+ * sysfs_lookup, so there's no need to call sysfs_addrm_start/finish here.
+ * Nor is there a need to mess around with reference counts.
+ */
+int sysfs_populate_group(struct dentry *dentry,
+ const struct attribute_group *grp)
+
+{
+ struct sysfs_dirent *dir_sd = dentry->d_fsdata;
+ struct kobject *kobj = dir_sd->s_dir.kobj;
+ struct sysfs_addrm_cxt acxt;
+ struct attribute **attrp;
+ int i = 0, error = 0;
+
+ memset(&acxt, 0, sizeof(acxt));
+ acxt.parent_sd = dir_sd;
+ acxt.parent_inode = dentry->d_inode;
+
+ for (attrp = grp->attrs; *attrp; attrp++) {
+ struct attribute *attr = *attrp;
+ mode_t mode = attr->mode;
+ struct sysfs_dirent *sd;
+
+ if (grp->is_visible) {
+ mode_t vis = grp->is_visible(kobj, attr, i++);
+ if (!vis)
+ continue;
+ mode |= vis;
+ }
+
+ sd = sysfs_new_dirent(attr->name, mode, SYSFS_KOBJ_ATTR);
+ if (!sd) {
+ error = -ENOMEM;
+ break;
+ }
+ sd->s_attr.attr = (void *)attr;
+
+ error = sysfs_add_one(&acxt, sd);
+ if (error) {
+ sysfs_put(sd);
+ break;
+ }
+ }
+ if (error)
+ sysfs_depopulate_group(dentry, grp);
+ return error;
+}
+EXPORT_SYMBOL(sysfs_populate_group);
+
+static
+int group_populate(struct dentry *dentry, struct sysfs_dirent *sd)
+{
+ struct attribute_group *grp = sd->s_dir.data;
+ return sysfs_populate_group(dentry, grp);
+}
+
+static
+void group_depopulate(struct dentry *dentry, struct sysfs_dirent *sd)
+{
+ struct attribute_group *grp = sd->s_dir.data;
+ sysfs_depopulate_group(dentry, grp);
+}
+
static void remove_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
const struct attribute_group *grp)
{
@@ -32,6 +128,10 @@ static int create_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
struct attribute *const* attr;
int error = 0, i;

+ /* Directory isn't currently instantiated; nothing to do */
+ if (!(dir_sd->s_flags & SYSFS_FLAG_POPULATED))
+ return 0;
+
for (i = 0, attr = grp->attrs; *attr && !error; i++, attr++) {
mode_t mode = 0;

@@ -55,7 +155,6 @@ static int create_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
return error;
}

-
static int internal_create_group(struct kobject *kobj, int update,
const struct attribute_group *grp)
{
@@ -82,10 +181,18 @@ static int internal_create_group(struct kobject *kobj, int update,
} else {
sd = sysfs_get(kobj->sd);
}
- error = create_files(sd, kobj, grp, update);
- if (error) {
- if (grp->name)
+
+ error = 0;
+ if (update) {
+ error = create_files(sd, kobj, grp, update);
+ if (error && grp->name)
sysfs_remove_subdir(sd);
+ } else if (!sd->s_dir.populate) {
+ sd->s_dir.populate = group_populate;
+ sd->s_dir.depopulate = group_depopulate;
+ sd->s_dir.data = (void *)grp;
+ } else if (sd->s_dir.populate != group_populate) {
+ error = -EEXIST;
}
sysfs_put(sd);
return error;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index af4c4e7..13843fa 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -17,6 +17,9 @@ struct sysfs_elem_dir {
struct kobject *kobj;
/* children list starts here and goes through sd->s_sibling */
struct sysfs_dirent *children;
+ int (*populate)(struct dentry *, struct sysfs_dirent *);
+ void (*depopulate)(struct dentry *, struct sysfs_dirent *);
+ void *data;
};

struct sysfs_elem_symlink {
@@ -77,6 +80,7 @@ struct sysfs_dirent {
#define SYSFS_COPY_NAME (SYSFS_DIR | SYSFS_KOBJ_LINK)

#define SYSFS_FLAG_MASK ~SYSFS_TYPE_MASK
+#define SYSFS_FLAG_POPULATED 0x0100
#define SYSFS_FLAG_REMOVED 0x0200

static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
@@ -120,6 +124,7 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt);
+void sysfs_kill_removed_dirents(struct sysfs_addrm_cxt *acxt);

struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
const unsigned char *name);
--
1.6.3.3

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-20 05:50:53

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 3/3] Expose MSI-X interrupts through a dynamically generated sysfs directory


From: Matthew Wilcox <[email protected]>
Date: Mon, 19 Oct 2009 01:35:41 -0400
Subject: [PATCH 3/3] Expose MSI-X interrupts through a dynamically generated sysfs directory

Introduce the ability to dynamically generate the attributes (which are
then added to sysfs). Add a user in the form of the PCI MSI code, which
was why I started on this in the first place.
---
drivers/pci/msi.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++--
fs/sysfs/group.c | 7 ++++-
include/linux/pci.h | 2 +
include/linux/sysfs.h | 9 ++++++
4 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index f9cf317..e0971e6 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -275,6 +275,9 @@ static void free_msi_irqs(struct pci_dev *dev)
{
struct msi_desc *entry, *tmp;

+ if (dev->msix_dir.name)
+ sysfs_remove_group(&dev->dev.kobj, &dev->msix_dir);
+
list_for_each_entry(entry, &dev->msi_list, list) {
int i, nvec;
if (!entry->irq)
@@ -447,13 +450,12 @@ static void __iomem *msix_map_region(struct pci_dev *dev, unsigned pos,
}

static int msix_setup_entries(struct pci_dev *dev, unsigned pos,
- void __iomem *base, struct msix_entry *entries,
- int nvec)
+ void __iomem *base, struct msix_entry *entries)
{
struct msi_desc *entry;
int i;

- for (i = 0; i < nvec; i++) {
+ for (i = 0; i < dev->nr_irqs; i++) {
entry = alloc_msi_entry(dev);
if (!entry) {
if (!i)
@@ -495,6 +497,64 @@ static void msix_program_entries(struct pci_dev *dev,
}
}

+struct pci_msix_attribute {
+ struct device_attribute da;
+ unsigned irq;
+ char name[8]; /* current max is 5: "2047\0" */
+};
+
+static ssize_t
+pci_msix_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct pci_msix_attribute *msix_attr;
+ msix_attr = container_of(attr, struct pci_msix_attribute, da);
+ return snprintf(buf, PAGE_SIZE, "%u\n", msix_attr->irq);
+}
+
+#define kobj_to_pci_dev(obj) to_pci_dev(container_of(obj, struct device, kobj))
+
+static int msix_populate(struct dentry *dentry, struct attribute_group *grp)
+{
+ struct pci_dev *pdev = container_of(grp, struct pci_dev, msix_dir);
+ unsigned i, nr_irqs = pdev->nr_irqs;
+ struct pci_msix_attribute *attr;
+ struct attribute **array;
+ struct msi_desc *desc;
+
+ array = kmalloc((nr_irqs + 1) * sizeof(void *), GFP_KERNEL);
+ if (!array)
+ return -ENOMEM;
+ attr = kmalloc(nr_irqs * sizeof(*attr), GFP_KERNEL);
+ if (!attr)
+ return -ENOMEM;
+
+ for (i = 0; i < nr_irqs; i++)
+ array[i] = &attr[i].da.attr;
+ array[i] = NULL;
+ grp->attrs = array;
+
+ list_for_each_entry(desc, &pdev->msi_list, list) {
+ attr->irq = desc->irq;
+ snprintf(attr->name, sizeof(attr->name), "%u",
+ desc->msi_attrib.entry_nr);
+ attr->da.attr.name = (const char *)&attr->name;
+ attr->da.attr.mode = 0444;
+ attr->da.show = pci_msix_show;
+ attr++;
+ }
+
+ return sysfs_populate_group(dentry, grp);
+}
+
+static void msix_depopulate(struct dentry *dentry, struct attribute_group *grp)
+{
+ sysfs_depopulate_group(dentry, grp);
+
+ kfree(grp->attrs[0]);
+ kfree(grp->attrs);
+ grp->attrs = NULL;
+}
+
/**
* msix_capability_init - configure device's MSI-X capability
* @dev: pointer to the pci_dev data structure of MSI-X device function
@@ -524,10 +584,19 @@ static int msix_capability_init(struct pci_dev *dev,
if (!base)
return -ENOMEM;

- ret = msix_setup_entries(dev, pos, base, entries, nvec);
+ dev->nr_irqs = nvec;
+
+ dev->msix_dir.name = "irqs";
+ dev->msix_dir.populate = msix_populate;
+ dev->msix_dir.depopulate = msix_depopulate;
+ ret = sysfs_create_group(&dev->dev.kobj, &dev->msix_dir);
if (ret)
return ret;

+ ret = msix_setup_entries(dev, pos, base, entries);
+ if (ret)
+ goto error;
+
ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
if (ret)
goto error;
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 37ac584..c97139f 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -102,6 +102,8 @@ static
int group_populate(struct dentry *dentry, struct sysfs_dirent *sd)
{
struct attribute_group *grp = sd->s_dir.data;
+ if (grp->populate)
+ return grp->populate(dentry, grp);
return sysfs_populate_group(dentry, grp);
}

@@ -109,7 +111,10 @@ static
void group_depopulate(struct dentry *dentry, struct sysfs_dirent *sd)
{
struct attribute_group *grp = sd->s_dir.data;
- sysfs_depopulate_group(dentry, grp);
+ if (grp->depopulate)
+ grp->depopulate(dentry, grp);
+ else
+ sysfs_depopulate_group(dentry, grp);
}

static void remove_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f5c7cd3..47c7fc6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -291,6 +291,8 @@ struct pci_dev {
struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
#ifdef CONFIG_PCI_MSI
struct list_head msi_list;
+ struct attribute_group msix_dir;
+ unsigned short nr_irqs;
#endif
struct pci_vpd *vpd;
#ifdef CONFIG_PCI_IOV
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 9d68fed..cf9d200 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -19,6 +19,7 @@

struct kobject;
struct module;
+struct dentry;

/* FIXME
* The *owner field is no longer used.
@@ -36,6 +37,10 @@ struct attribute_group {
mode_t (*is_visible)(struct kobject *,
struct attribute *, int);
struct attribute **attrs;
+ int (*populate)(struct dentry *,
+ struct attribute_group *);
+ void (*depopulate)(struct dentry *,
+ struct attribute_group *);
};


@@ -115,6 +120,10 @@ int sysfs_update_group(struct kobject *kobj,
const struct attribute_group *grp);
void sysfs_remove_group(struct kobject *kobj,
const struct attribute_group *grp);
+int sysfs_populate_group(struct dentry *dentry,
+ const struct attribute_group *grp);
+void sysfs_depopulate_group(struct dentry *dentry,
+ const struct attribute_group *grp);
int sysfs_add_file_to_group(struct kobject *kobj,
const struct attribute *attr, const char *group);
void sysfs_remove_file_from_group(struct kobject *kobj,
--
1.6.3.3

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-20 05:51:59

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 0/3] Allow sysfs to be dynamically populated

On Mon, Oct 19, 2009 at 11:47:40PM -0600, Matthew Wilcox wrote:
> The second and third patches offer progressively more flexibility to
> dynamically construct the contents of sysfs directories. The second
> patch delays the allocation of sysfs_dirents until the contents of
> the directory is requested. This functionality is only used by the
> attribute_groups in that patch, but could (in principle) be extended to
> other directory providers.
>
> The third patch extends this to permit attribute groups to be dynamically
> constructed before the directory they're in is populated. I also add a
> user for this; the much-requested ability to display the MSI-X vectors
> in use by a device.
>
> Both patches also deconstruct the directories / attribute groups when the
> dentry of the directory goes out of cache.

Oh, I meant to say, this is enough to get my laptop to boot. I couldn't
swear to the non-existance of races, or the missing of features ... lots
of care needed with this patch, IMO.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-20 08:14:45

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 3/3] Expose MSI-X interrupts through a dynamically generated sysfs directory

On Tue, Oct 20, 2009 at 1:50 PM, Matthew Wilcox <[email protected]> wrote:
>
> From: Matthew Wilcox <[email protected]>
> Date: Mon, 19 Oct 2009 01:35:41 -0400
> Subject: [PATCH 3/3] Expose MSI-X interrupts through a dynamically generated sysfs directory
>
> Introduce the ability to dynamically generate the attributes (which are
> then added to sysfs).  Add a user in the form of the PCI MSI code, which
> was why I started on this in the first place.
> ---
>  drivers/pci/msi.c     |   77 ++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/sysfs/group.c      |    7 ++++-
>  include/linux/pci.h   |    2 +
>  include/linux/sysfs.h |    9 ++++++
>  4 files changed, 90 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index f9cf317..e0971e6 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -275,6 +275,9 @@ static void free_msi_irqs(struct pci_dev *dev)
>  {
>        struct msi_desc *entry, *tmp;
>
> +       if (dev->msix_dir.name)
> +               sysfs_remove_group(&dev->dev.kobj, &dev->msix_dir);
> +
>        list_for_each_entry(entry, &dev->msi_list, list) {
>                int i, nvec;
>                if (!entry->irq)
> @@ -447,13 +450,12 @@ static void __iomem *msix_map_region(struct pci_dev *dev, unsigned pos,
>  }
>
>  static int msix_setup_entries(struct pci_dev *dev, unsigned pos,
> -                               void __iomem *base, struct msix_entry *entries,
> -                               int nvec)
> +                               void __iomem *base, struct msix_entry *entries)
>  {
>        struct msi_desc *entry;
>        int i;
>
> -       for (i = 0; i < nvec; i++) {
> +       for (i = 0; i < dev->nr_irqs; i++) {
>                entry = alloc_msi_entry(dev);
>                if (!entry) {
>                        if (!i)
> @@ -495,6 +497,64 @@ static void msix_program_entries(struct pci_dev *dev,
>        }
>  }
>
> +struct pci_msix_attribute {
> +       struct device_attribute da;
> +       unsigned irq;
> +       char name[8];   /* current max is 5: "2047\0" */
> +};
> +
> +static ssize_t
> +pci_msix_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +       struct pci_msix_attribute *msix_attr;
> +       msix_attr = container_of(attr, struct pci_msix_attribute, da);
> +       return snprintf(buf, PAGE_SIZE, "%u\n", msix_attr->irq);
> +}
> +
> +#define kobj_to_pci_dev(obj) to_pci_dev(container_of(obj, struct device, kobj))


You define this, but no one uses it?

> +
> +static int msix_populate(struct dentry *dentry, struct attribute_group *grp)
> +{
> +       struct pci_dev *pdev = container_of(grp, struct pci_dev, msix_dir);
> +       unsigned i, nr_irqs = pdev->nr_irqs;
> +       struct pci_msix_attribute *attr;
> +       struct attribute **array;
> +       struct msi_desc *desc;
> +
> +       array = kmalloc((nr_irqs + 1) * sizeof(void *), GFP_KERNEL);
> +       if (!array)
> +               return -ENOMEM;
> +       attr = kmalloc(nr_irqs * sizeof(*attr), GFP_KERNEL);
> +       if (!attr)
> +               return -ENOMEM;

Here leaks memory allocated above.

> +
> +       for (i = 0; i < nr_irqs; i++)
> +               array[i] = &attr[i].da.attr;
> +       array[i] = NULL;
> +       grp->attrs = array;
> +
> +       list_for_each_entry(desc, &pdev->msi_list, list) {
> +               attr->irq = desc->irq;
> +               snprintf(attr->name, sizeof(attr->name), "%u",
> +                                               desc->msi_attrib.entry_nr);
> +               attr->da.attr.name = (const char *)&attr->name;
> +               attr->da.attr.mode = 0444;
> +               attr->da.show = pci_msix_show;
> +               attr++;
> +       }
> +
> +       return sysfs_populate_group(dentry, grp);
> +}
> +
> +static void msix_depopulate(struct dentry *dentry, struct attribute_group *grp)
> +{
> +       sysfs_depopulate_group(dentry, grp);
> +
> +       kfree(grp->attrs[0]);
> +       kfree(grp->attrs);
> +       grp->attrs = NULL;
> +}
> +
>  /**
>  * msix_capability_init - configure device's MSI-X capability
>  * @dev: pointer to the pci_dev data structure of MSI-X device function
> @@ -524,10 +584,19 @@ static int msix_capability_init(struct pci_dev *dev,
>        if (!base)
>                return -ENOMEM;
>
> -       ret = msix_setup_entries(dev, pos, base, entries, nvec);
> +       dev->nr_irqs = nvec;
> +
> +       dev->msix_dir.name = "irqs";
> +       dev->msix_dir.populate = msix_populate;
> +       dev->msix_dir.depopulate = msix_depopulate;
> +       ret = sysfs_create_group(&dev->dev.kobj, &dev->msix_dir);
>        if (ret)
>                return ret;
>
> +       ret = msix_setup_entries(dev, pos, base, entries);
> +       if (ret)
> +               goto error;
> +
>        ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
>        if (ret)
>                goto error;
> diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
> index 37ac584..c97139f 100644
> --- a/fs/sysfs/group.c
> +++ b/fs/sysfs/group.c
> @@ -102,6 +102,8 @@ static
>  int group_populate(struct dentry *dentry, struct sysfs_dirent *sd)
>  {
>        struct attribute_group *grp = sd->s_dir.data;
> +       if (grp->populate)
> +               return grp->populate(dentry, grp);
>        return sysfs_populate_group(dentry, grp);
>  }
>
> @@ -109,7 +111,10 @@ static
>  void group_depopulate(struct dentry *dentry, struct sysfs_dirent *sd)
>  {
>        struct attribute_group *grp = sd->s_dir.data;
> -       sysfs_depopulate_group(dentry, grp);
> +       if (grp->depopulate)
> +               grp->depopulate(dentry, grp);
> +       else
> +               sysfs_depopulate_group(dentry, grp);
>  }
>
>  static void remove_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f5c7cd3..47c7fc6 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -291,6 +291,8 @@ struct pci_dev {
>        struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
>  #ifdef CONFIG_PCI_MSI
>        struct list_head msi_list;
> +       struct attribute_group msix_dir;
> +       unsigned short nr_irqs;
>  #endif
>        struct pci_vpd *vpd;
>  #ifdef CONFIG_PCI_IOV
> diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
> index 9d68fed..cf9d200 100644
> --- a/include/linux/sysfs.h
> +++ b/include/linux/sysfs.h
> @@ -19,6 +19,7 @@
>
>  struct kobject;
>  struct module;
> +struct dentry;
>
>  /* FIXME
>  * The *owner field is no longer used.
> @@ -36,6 +37,10 @@ struct attribute_group {
>        mode_t                  (*is_visible)(struct kobject *,
>                                              struct attribute *, int);
>        struct attribute        **attrs;
> +       int                     (*populate)(struct dentry *,
> +                                               struct attribute_group *);
> +       void                    (*depopulate)(struct dentry *,
> +                                               struct attribute_group *);
>  };
>
>
> @@ -115,6 +120,10 @@ int sysfs_update_group(struct kobject *kobj,
>                       const struct attribute_group *grp);
>  void sysfs_remove_group(struct kobject *kobj,
>                        const struct attribute_group *grp);
> +int sysfs_populate_group(struct dentry *dentry,
> +                       const struct attribute_group *grp);
> +void sysfs_depopulate_group(struct dentry *dentry,
> +                       const struct attribute_group *grp);
>  int sysfs_add_file_to_group(struct kobject *kobj,
>                        const struct attribute *attr, const char *group);
>  void sysfs_remove_file_from_group(struct kobject *kobj,
> --
> 1.6.3.3
>
> --
> Matthew Wilcox                          Intel Open Source Technology Centre
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours.  We can't possibly take such
> a retrograde step."
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

2009-10-20 08:26:00

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 3/3] Expose MSI-X interrupts through a dynamically generated sysfs directory


I'll note that one of the complaints at the kernel summit has been
that people have lost the art of trimming mail they're replying to.
There's a proposal to filter mail like the one you sent as spam.

On Tue, Oct 20, 2009 at 04:14:44PM +0800, Am??rico Wang wrote:
> On Tue, Oct 20, 2009 at 1:50 PM, Matthew Wilcox <[email protected]> wrote:
> > +#define kobj_to_pci_dev(obj) to_pci_dev(container_of(obj, struct device, kobj))
>
> You define this, but no one uses it?

Yeah, left over from an earlier version of the patch. Thanks, I'll
remove it.

> > + ?? ?? ?? array = kmalloc((nr_irqs + 1) * sizeof(void *), GFP_KERNEL);
> > + ?? ?? ?? if (!array)
> > + ?? ?? ?? ?? ?? ?? ?? return -ENOMEM;
> > + ?? ?? ?? attr = kmalloc(nr_irqs * sizeof(*attr), GFP_KERNEL);
> > + ?? ?? ?? if (!attr)
> > + ?? ?? ?? ?? ?? ?? ?? return -ENOMEM;
>
> Here leaks memory allocated above.

Oh, yeah, duh, thanks.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-27 17:57:50

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 3/3] Expose MSI-X interrupts through a dynamically generated sysfs directory

On Tue, Oct 20, 2009 at 02:26:02AM -0600, Matthew Wilcox wrote:
> On Tue, Oct 20, 2009 at 04:14:44PM +0800, Am??rico Wang wrote:
> > On Tue, Oct 20, 2009 at 1:50 PM, Matthew Wilcox <[email protected]> wrote:
> > > +#define kobj_to_pci_dev(obj) to_pci_dev(container_of(obj, struct device, kobj))
> >
> > You define this, but no one uses it?
>
> Yeah, left over from an earlier version of the patch. Thanks, I'll
> remove it.
>
> > > + ?? ?? ?? array = kmalloc((nr_irqs + 1) * sizeof(void *), GFP_KERNEL);
> > > + ?? ?? ?? if (!array)
> > > + ?? ?? ?? ?? ?? ?? ?? return -ENOMEM;
> > > + ?? ?? ?? attr = kmalloc(nr_irqs * sizeof(*attr), GFP_KERNEL);
> > > + ?? ?? ?? if (!attr)
> > > + ?? ?? ?? ?? ?? ?? ?? return -ENOMEM;
> >
> > Here leaks memory allocated above.
>
> Oh, yeah, duh, thanks.

Care to resend this series with these fixes?

thanks,

greg k-h

2009-10-29 15:57:56

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/3] Fix updating of named attribute groups

Matthew Wilcox wrote:
> + if (update) {
> + sd = sysfs_get_dirent(kobj->sd, grp->name);
> + if (!sd)
> + return -ENOENT;
> + } else {
> + error = sysfs_create_subdir(kobj, grp->name, &sd);
> + if (error)
> + return error;
> + sysfs_get(sd);
> + }
> + } else {
> + sd = sysfs_get(kobj->sd);
> + }

nitpick: the last braces aren't necessary.

--
tejun

2009-10-29 16:19:06

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 1/3] Fix updating of named attribute groups

On Thu, Oct 29, 2009 at 04:59:02PM +0100, Tejun Heo wrote:
> Matthew Wilcox wrote:
> > + if (update) {
> > + sd = sysfs_get_dirent(kobj->sd, grp->name);
> > + if (!sd)
> > + return -ENOENT;
> > + } else {
> > + error = sysfs_create_subdir(kobj, grp->name, &sd);
> > + if (error)
> > + return error;
> > + sysfs_get(sd);
> > + }
> > + } else {
> > + sd = sysfs_get(kobj->sd);
> > + }
>
> nitpick: the last braces aren't necessary.

But are more aesthetically pleasing.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-29 16:18:49

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

Hello,

> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index af4c4e7..13843fa 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -17,6 +17,9 @@ struct sysfs_elem_dir {
> struct kobject *kobj;
> /* children list starts here and goes through sd->s_sibling */
> struct sysfs_dirent *children;
> + int (*populate)(struct dentry *, struct sysfs_dirent *);
> + void (*depopulate)(struct dentry *, struct sysfs_dirent *);
> + void *data;

This will increase the size of struct sysfs_dirent by three pointers
which is considerable. Bloating the size of sysfs_dirent can waste
large amount of memory on machines with a lot of disks.

The implementation looks quite scary to me. Is this the only way to
do this? It it because trying to create individual entries for msix
will end up creating too many sysfs entries? If so, how many are we
talking about?

Thanks.

--
tejun

2009-10-29 16:21:42

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

On Thu, Oct 29, 2009 at 05:20:00PM +0100, Tejun Heo wrote:
> This will increase the size of struct sysfs_dirent by three pointers
> which is considerable. Bloating the size of sysfs_dirent can waste
> large amount of memory on machines with a lot of disks.

No it won't. It's in a union with sysfs_inode_attrs which contains a
struct iattr, which is at least 52 bytes.

> The implementation looks quite scary to me. Is this the only way to
> do this? It it because trying to create individual entries for msix
> will end up creating too many sysfs entries? If so, how many are we
> talking about?

While that's the original motivation, shrinking the amount of memory
taken by sysfs overall is a worthwhile achievement, don't you think?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-29 16:22:52

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/3] Fix updating of named attribute groups

Hello,

Matthew Wilcox wrote:
>>> + } else {
>>> + sd = sysfs_get(kobj->sd);
>>> + }
>> nitpick: the last braces aren't necessary.
>
> But are more aesthetically pleasing.

Hmmm... indeed, CodingStyle says

Do not unnecessarily use braces where a single statement will do.

if (condition)
action();

This does not apply if one branch of a conditional statement is a single
statement. Use braces in both branches.

if (condition) {
do_this();
do_that();
} else {
otherwise();
}

So, please ignore my previous nitpick.

Thanks.

--
tejun

2009-10-29 16:27:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

Matthew Wilcox wrote:
> On Thu, Oct 29, 2009 at 05:20:00PM +0100, Tejun Heo wrote:
>> This will increase the size of struct sysfs_dirent by three pointers
>> which is considerable. Bloating the size of sysfs_dirent can waste
>> large amount of memory on machines with a lot of disks.
>
> No it won't. It's in a union with sysfs_inode_attrs which contains a
> struct iattr, which is at least 52 bytes.

struct sysfs_dirent {
atomic_t s_count;
atomic_t s_active;
struct sysfs_dirent *s_parent;
struct sysfs_dirent *s_sibling;
const char *s_name;

union {
struct sysfs_elem_dir s_dir;
struct sysfs_elem_symlink s_symlink;
struct sysfs_elem_attr s_attr;
struct sysfs_elem_bin_attr s_bin_attr;
};

unsigned int s_flags;
ino_t s_ino;
umode_t s_mode;
struct sysfs_inode_attrs *s_iattr;
^^
};

>> The implementation looks quite scary to me. Is this the only way to
>> do this? It it because trying to create individual entries for msix
>> will end up creating too many sysfs entries? If so, how many are we
>> talking about?
>
> While that's the original motivation, shrinking the amount of memory
> taken by sysfs overall is a worthwhile achievement, don't you think?

It feels a bit too convoluted to me. sysfs is already pretty
convoluted and adding yet more convolution would require pretty good
justification, so I'm curious about the numbers.

Thanks.

--
tejun

2009-10-29 19:24:12

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

On Thu, Oct 29, 2009 at 05:28:14PM +0100, Tejun Heo wrote:
> struct sysfs_dirent {
> atomic_t s_count;
> atomic_t s_active;
> struct sysfs_dirent *s_parent;
> struct sysfs_dirent *s_sibling;
> const char *s_name;
>
> union {
> struct sysfs_elem_dir s_dir;
> struct sysfs_elem_symlink s_symlink;
> struct sysfs_elem_attr s_attr;
> struct sysfs_elem_bin_attr s_bin_attr;
> };
>
> unsigned int s_flags;
> ino_t s_ino;
> umode_t s_mode;
> struct sysfs_inode_attrs *s_iattr;
> ^^

Oh, ouch. That does change the calculus somewhat.

> It feels a bit too convoluted to me. sysfs is already pretty
> convoluted and adding yet more convolution would require pretty good
> justification, so I'm curious about the numbers.

It is convoluted. The advantage of this is that we get to create many
fewer dirents. I wonder if we can do away with the dirents entirely, and
have dentries constructed dynamically instead.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-30 10:16:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

Hello, Matthew.

Matthew Wilcox wrote:
>> It feels a bit too convoluted to me. sysfs is already pretty
>> convoluted and adding yet more convolution would require pretty good
>> justification, so I'm curious about the numbers.
>
> It is convoluted. The advantage of this is that we get to create many
> fewer dirents. I wonder if we can do away with the dirents entirely, and
> have dentries constructed dynamically instead.

I think it's a bit misdirected. It's probably better to try to reduce
the size of struct sysfs_dirent by tightly packing them and moving out
dynamic part of the data structure into a separate one which is only
allocated while the node is being accessed. The thing is that large
number of the nodes would require struct sysfs_dirent anyway, so it
would be much more benficial and less convoluted to diet the whole
thing.

In this particular case, the trade off is actually much worse because
sysfs_dirent is being increased but the only one which is seeing any
kind of memory usage drop is the new msi-x code. Given that there can
be only a handful of msi-x controllers even in a fairly large system,
I don't think memory usage will be reduced in any meaningful way even
on affected systems and if you think about large systems with
thousands and tens of thousands block devices, the bloat in
sysfs_dirent will waste a lot of memory.

Thanks.

--
tejun

2009-10-30 11:14:50

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

On Fri, Oct 30, 2009 at 11:17:22AM +0100, Tejun Heo wrote:
> Matthew Wilcox wrote:
> > It is convoluted. The advantage of this is that we get to create many
> > fewer dirents. I wonder if we can do away with the dirents entirely, and
> > have dentries constructed dynamically instead.
>
> I think it's a bit misdirected. It's probably better to try to reduce
> the size of struct sysfs_dirent by tightly packing them and moving out
> dynamic part of the data structure into a separate one which is only
> allocated while the node is being accessed. The thing is that large
> number of the nodes would require struct sysfs_dirent anyway, so it
> would be much more benficial and less convoluted to diet the whole
> thing.
>
> In this particular case, the trade off is actually much worse because
> sysfs_dirent is being increased but the only one which is seeing any
> kind of memory usage drop is the new msi-x code. Given that there can
> be only a handful of msi-x controllers even in a fairly large system,
> I don't think memory usage will be reduced in any meaningful way even
> on affected systems and if you think about large systems with
> thousands and tens of thousands block devices, the bloat in
> sysfs_dirent will waste a lot of memory.

Every attribute sees a drop in memory usage. I think I saw about 30%
fewer dirents created on boot with this laptop. Why not try this patch
on one of your machines and see the difference?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-10-30 16:05:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/3] Sysfs: Allow directories to be populated dynamically

Hello,

Matthew Wilcox wrote:
>> In this particular case, the trade off is actually much worse because
>> sysfs_dirent is being increased but the only one which is seeing any
>> kind of memory usage drop is the new msi-x code. Given that there can
>> be only a handful of msi-x controllers even in a fairly large system,
>> I don't think memory usage will be reduced in any meaningful way even
>> on affected systems and if you think about large systems with
>> thousands and tens of thousands block devices, the bloat in
>> sysfs_dirent will waste a lot of memory.
>
> Every attribute sees a drop in memory usage. I think I saw about 30%
> fewer dirents created on boot with this laptop. Why not try this patch
> on one of your machines and see the difference?

Heh... Sorry, I'm still travelling so testing is a bit difficult but
even if you reduce the number of dirents by 30% if you add three
pointers worth of memory to each sysfs_dirent, I don't think it will
amount to any meaningful amount of memory usage drop and given that
sysfs_dirent can be reduced further in size with some tricks, I think
that would be the better way to reduce memory footprint.

Thanks.

--
tejun