2012-08-16 18:22:53

by Aristeu Rozanski

[permalink] [raw]
Subject: [PATCH v6 3/4] cgroup: add xattr support

From: Li Zefan <[email protected]>

This is one of the items in the plumber's wish list.

For use cases:

>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart

v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support

Cc: Li Zefan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Lennart Poettering <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Aristeu Rozanski <[email protected]>

---
include/linux/cgroup.h | 13 ++++--
kernel/cgroup.c | 100 +++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 103 insertions(+), 10 deletions(-)

Index: github/include/linux/cgroup.h
===================================================================
--- github.orig/include/linux/cgroup.h 2012-08-16 10:24:50.000000000 -0400
+++ github/include/linux/cgroup.h 2012-08-16 10:27:53.975223786 -0400
@@ -17,6 +17,7 @@
#include <linux/rwsem.h>
#include <linux/idr.h>
#include <linux/workqueue.h>
+#include <linux/xattr.h>

#ifdef CONFIG_CGROUPS

@@ -216,6 +217,9 @@
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
+
+ /* directory xattrs */
+ struct simple_xattrs xattrs;
};

/*
@@ -309,6 +313,9 @@
/* CFTYPE_* flags */
unsigned int flags;

+ /* file xattrs */
+ struct simple_xattrs xattrs;
+
int (*open)(struct inode *inode, struct file *file);
ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
struct file *file,
@@ -394,7 +401,7 @@
*/
struct cftype_set {
struct list_head node; /* chained at subsys->cftsets */
- const struct cftype *cfts;
+ struct cftype *cfts;
};

struct cgroup_scanner {
@@ -406,8 +413,8 @@
void *data;
};

-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);

int cgroup_is_removed(const struct cgroup *cgrp);

Index: github/kernel/cgroup.c
===================================================================
--- github.orig/kernel/cgroup.c 2012-08-16 10:27:45.000000000 -0400
+++ github/kernel/cgroup.c 2012-08-16 11:10:37.470765933 -0400
@@ -276,7 +276,8 @@

/* bits in struct cgroupfs_root flags field */
enum {
- ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+ ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+ ROOT_XATTR, /* supports extended attributes */
};

static int cgroup_is_releasable(const struct cgroup *cgrp)
@@ -913,15 +914,19 @@
*/
BUG_ON(!list_empty(&cgrp->pidlists));

+ simple_xattrs_free(&cgrp->xattrs);
+
kfree_rcu(cgrp, rcu_head);
} else {
struct cfent *cfe = __d_cfe(dentry);
struct cgroup *cgrp = dentry->d_parent->d_fsdata;
+ struct cftype *cft = cfe->type;

WARN_ONCE(!list_empty(&cfe->node) &&
cgrp != &cgrp->root->top_cgroup,
"cfe still linked for %s\n", cfe->type->name);
kfree(cfe);
+ simple_xattrs_free(&cft->xattrs);
}
iput(inode);
}
@@ -1140,6 +1145,8 @@
seq_printf(seq, ",%s", ss->name);
if (test_bit(ROOT_NOPREFIX, &root->flags))
seq_puts(seq, ",noprefix");
+ if (test_bit(ROOT_XATTR, &root->flags))
+ seq_puts(seq, ",xattr");
if (strlen(root->release_agent_path))
seq_printf(seq, ",release_agent=%s", root->release_agent_path);
if (clone_children(&root->top_cgroup))
@@ -1208,6 +1215,10 @@
opts->clone_children = true;
continue;
}
+ if (!strcmp(token, "xattr")) {
+ set_bit(ROOT_XATTR, &opts->flags);
+ continue;
+ }
if (!strncmp(token, "release_agent=", 14)) {
/* Specifying two release agents is forbidden */
if (opts->release_agent)
@@ -1425,6 +1436,7 @@
mutex_init(&cgrp->pidlist_mutex);
INIT_LIST_HEAD(&cgrp->event_list);
spin_lock_init(&cgrp->event_list_lock);
+ simple_xattrs_init(&cgrp->xattrs);
}

static void init_cgroup_root(struct cgroupfs_root *root)
@@ -1769,6 +1781,8 @@
mutex_unlock(&cgroup_root_mutex);
mutex_unlock(&cgroup_mutex);

+ simple_xattrs_free(&cgrp->xattrs);
+
kill_litter_super(sb);
cgroup_drop_root(root);
}
@@ -2575,6 +2589,64 @@
return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
}

+static struct simple_xattrs *__d_xattrs(struct dentry *dentry)
+{
+ if (S_ISDIR(dentry->d_inode->i_mode))
+ return &__d_cgrp(dentry)->xattrs;
+ else
+ return &__d_cft(dentry)->xattrs;
+}
+
+static inline int xattr_enabled(struct dentry *dentry)
+{
+ struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
+ return test_bit(ROOT_XATTR, &root->flags);
+}
+
+static bool is_valid_xattr(const char *name)
+{
+ if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) ||
+ !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN))
+ return true;
+ return false;
+}
+
+static int cgroup_setxattr(struct dentry *dentry, const char *name,
+ const void *val, size_t size, int flags)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ if (!is_valid_xattr(name))
+ return -EINVAL;
+ return simple_xattr_set(__d_xattrs(dentry), name, val, size, flags);
+}
+
+static int cgroup_removexattr(struct dentry *dentry, const char *name)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ if (!is_valid_xattr(name))
+ return -EINVAL;
+ return simple_xattr_remove(__d_xattrs(dentry), name);
+}
+
+static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name,
+ void *buf, size_t size)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ if (!is_valid_xattr(name))
+ return -EINVAL;
+ return simple_xattr_get(__d_xattrs(dentry), name, buf, size);
+}
+
+static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ return simple_xattr_list(__d_xattrs(dentry), buf, size);
+}
+
static const struct file_operations cgroup_file_operations = {
.read = cgroup_file_read,
.write = cgroup_file_write,
@@ -2583,11 +2655,22 @@
.release = cgroup_file_release,
};

+static const struct inode_operations cgroup_file_inode_operations = {
+ .setxattr = cgroup_setxattr,
+ .getxattr = cgroup_getxattr,
+ .listxattr = cgroup_listxattr,
+ .removexattr = cgroup_removexattr,
+};
+
static const struct inode_operations cgroup_dir_inode_operations = {
.lookup = cgroup_lookup,
.mkdir = cgroup_mkdir,
.rmdir = cgroup_rmdir,
.rename = cgroup_rename,
+ .setxattr = cgroup_setxattr,
+ .getxattr = cgroup_getxattr,
+ .listxattr = cgroup_listxattr,
+ .removexattr = cgroup_removexattr,
};

static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
@@ -2635,6 +2718,7 @@
} else if (S_ISREG(mode)) {
inode->i_size = 0;
inode->i_fop = &cgroup_file_operations;
+ inode->i_op = &cgroup_file_inode_operations;
}
d_instantiate(dentry, inode);
dget(dentry); /* Extra count - pin the dentry in core */
@@ -2695,7 +2779,7 @@
}

static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
- const struct cftype *cft)
+ struct cftype *cft)
{
struct dentry *dir = cgrp->dentry;
struct cgroup *parent = __d_cgrp(dir);
@@ -2705,6 +2789,8 @@
umode_t mode;
char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };

+ simple_xattrs_init(&cft->xattrs);
+
/* does @cft->flags tell us to skip creation on @cgrp? */
if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
return 0;
@@ -2745,9 +2831,9 @@
}

static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
- const struct cftype cfts[], bool is_add)
+ struct cftype cfts[], bool is_add)
{
- const struct cftype *cft;
+ struct cftype *cft;
int err, ret = 0;

for (cft = cfts; cft->name[0] != '\0'; cft++) {
@@ -2781,7 +2867,7 @@
}

static void cgroup_cfts_commit(struct cgroup_subsys *ss,
- const struct cftype *cfts, bool is_add)
+ struct cftype *cfts, bool is_add)
__releases(&cgroup_mutex) __releases(&cgroup_cft_mutex)
{
LIST_HEAD(pending);
@@ -2832,7 +2918,7 @@
* function currently returns 0 as long as @cfts registration is successful
* even if some file creation attempts on existing cgroups fail.
*/
-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
struct cftype_set *set;

@@ -2862,7 +2948,7 @@
* Returns 0 on successful unregistration, -ENOENT if @cfts is not
* registered with @ss.
*/
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
struct cftype_set *set;


2012-08-16 20:00:16

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] cgroup: add xattr support

On Thu, Aug 16, 2012 at 01:44:56PM -0400, [email protected] wrote:
> From: Li Zefan <[email protected]>
>
> This is one of the items in the plumber's wish list.
>
> For use cases:
>
> >> What would the use case be for this?
> >
> > Attaching meta information to services, in an easily discoverable
> > way. For example, in systemd we create one cgroup for each service, and
> > could then store data like the main pid of the specific service as an
> > xattr on the cgroup itself. That way we'd have almost all service state
> > in the cgroupfs, which would make it possible to terminate systemd and
> > later restart it without losing any state information. But there's more:
> > for example, some very peculiar services cannot be terminated on
> > shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> > services in question could just mark that on their cgroup, by setting an
> > xattr. On the more desktopy side of things there are other
> > possibilities: for example there are plans defining what an application
> > is along the lines of a cgroup (i.e. an app being a collection of
> > processes). With xattrs one could then attach an icon or human readable
> > program name on the cgroup.
> >
> > The key idea is that this would allow attaching runtime meta information
> > to cgroups and everything they model (services, apps, vms), that doesn't
> > need any complex userspace infrastructure, has good access control
> > (i.e. because the file system enforces that anyway, and there's the
> > "trusted." xattr namespace), notifications (inotify), and can easily be
> > shared among applications.
> >
> > Lennart
>
> v6:
> - remove user xattr namespace, only allow trusted and security
> v5:
> - check for capabilities before setting/removing xattrs
> v4:
> - no changes
> v3:
> - instead of config option, use mount option to enable xattr support
>
> Cc: Li Zefan <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Hillf Danton <[email protected]>
> Cc: Lennart Poettering <[email protected]>
> Signed-off-by: Li Zefan <[email protected]>
> Signed-off-by: Aristeu Rozanski <[email protected]>

I'm not against this but unsure whether using kmem is enough for the
suggested use case. Lennart, would this suit systemd? How much
metadata are we talking about?

Thanks.

--
tejun

2012-08-21 21:44:32

by Lennart Poettering

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] cgroup: add xattr support

Heya,

(sorry for the late reply)

On 16.08.2012 22:00, Tejun Heo wrote:
> On Thu, Aug 16, 2012 at 01:44:56PM -0400, [email protected] wrote:

>>> Attaching meta information to services, in an easily discoverable
>>> way. For example, in systemd we create one cgroup for each service, and
>>> could then store data like the main pid of the specific service as an
>>> xattr on the cgroup itself. That way we'd have almost all service state
>>> in the cgroupfs, which would make it possible to terminate systemd and
>>> later restart it without losing any state information. But there's more:
>>> for example, some very peculiar services cannot be terminated on
>>> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
>>> services in question could just mark that on their cgroup, by setting an
>>> xattr. On the more desktopy side of things there are other
>>> possibilities: for example there are plans defining what an application
>>> is along the lines of a cgroup (i.e. an app being a collection of
>>> processes). With xattrs one could then attach an icon or human readable
>>> program name on the cgroup.
>>>
>>> The key idea is that this would allow attaching runtime meta information
>>> to cgroups and everything they model (services, apps, vms), that doesn't
>>> need any complex userspace infrastructure, has good access control
>>> (i.e. because the file system enforces that anyway, and there's the
>>> "trusted." xattr namespace), notifications (inotify), and can easily be
>>> shared among applications.

>
> I'm not against this but unsure whether using kmem is enough for the
> suggested use case. Lennart, would this suit systemd? How much
> metadata are we talking about?

Just small things, like values, PIDs, i.e. a few 100 bytes or so per
cgroup should be more than sufficient for our needs.

Lennart

2012-08-21 21:49:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] cgroup: add xattr support

Hello,

On Tue, Aug 21, 2012 at 11:43:44PM +0200, Lennart Poettering wrote:
> >I'm not against this but unsure whether using kmem is enough for the
> >suggested use case. Lennart, would this suit systemd? How much
> >metadata are we talking about?
>
> Just small things, like values, PIDs, i.e. a few 100 bytes or so per
> cgroup should be more than sufficient for our needs.

Alright, then. I think there's gonna be one more round to address
Hugh's comments. Hugh, how should this be routed? Is there some git
branch that tmpfs changes can go in so that cgroup tree can pull?

Thanks.

--
tejun

2012-08-21 23:30:43

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] cgroup: add xattr support

On Tue, 21 Aug 2012, Tejun Heo wrote:
> On Tue, Aug 21, 2012 at 11:43:44PM +0200, Lennart Poettering wrote:
> > >I'm not against this but unsure whether using kmem is enough for the
> > >suggested use case. Lennart, would this suit systemd? How much
> > >metadata are we talking about?
> >
> > Just small things, like values, PIDs, i.e. a few 100 bytes or so per
> > cgroup should be more than sufficient for our needs.

That is reasonable.

>
> Alright, then. I think there's gonna be one more round to address
> Hugh's comments. Hugh, how should this be routed? Is there some git
> branch that tmpfs changes can go in so that cgroup tree can pull?

No git tree, but we can easily handle it in one of two ways.

include/linux/shmem_fs.h and mm/shmem.c usually go to Linus from Andrew
from his mmotm tree (which includes and is included in linux-next,
by some magic escaping infinite recursion).

Are we expecting Aristeu+Zefan's simple_xattr patches to go into 3.7?
I don't have anything planned for shmem.c for 3.7 beyond a bugfix,
which shouldn't interact with the simple_xattr changes at all
(I could remove info->lock, but will not do so this time around,
precisely so as not to interfere with those patches).

So it should be perfectly workable for you to take Aristeu+Zefan's
shmem patches into your cgroup tree, then any further mods from
mmotm will get layered on top.

But if you prefer to leave shmem.c changes to Andrew, then it would
also be perfectly workable for Aristeu to split the 1/4 into two:
one for you which updates fs/xattr.c and include/linux/xattr.h with
simple_xattr code stolen from mm/shmem.c and include/linux/shmem_fs.h;
and one for Andrew which updates mm/shmem.c and include/linux/shmem_fs.h
to delete its shmem_xattr stuff and use simple_xattr interfaces instead.

Either approach is fine with me.

Hugh

2012-08-23 19:44:32

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] cgroup: add xattr support

Hello, Hugh.

On Tue, Aug 21, 2012 at 04:29:53PM -0700, Hugh Dickins wrote:
> Are we expecting Aristeu+Zefan's simple_xattr patches to go into 3.7?

Yeah, probably.

> I don't have anything planned for shmem.c for 3.7 beyond a bugfix,
> which shouldn't interact with the simple_xattr changes at all
> (I could remove info->lock, but will not do so this time around,
> precisely so as not to interfere with those patches).
>
> So it should be perfectly workable for you to take Aristeu+Zefan's
> shmem patches into your cgroup tree, then any further mods from
> mmotm will get layered on top.

I think this approach would be simpler. Once Aristeu posts the
updated version, I'll route the whole series through cgroup/for-3.7.

Thanks.

--
tejun

2012-08-23 19:58:40

by Aristeu Rozanski

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] cgroup: add xattr support

On Thu, Aug 23, 2012 at 12:44:23PM -0700, Tejun Heo wrote:
> Hello, Hugh.
>
> On Tue, Aug 21, 2012 at 04:29:53PM -0700, Hugh Dickins wrote:
> > Are we expecting Aristeu+Zefan's simple_xattr patches to go into 3.7?
>
> Yeah, probably.
>
> > I don't have anything planned for shmem.c for 3.7 beyond a bugfix,
> > which shouldn't interact with the simple_xattr changes at all
> > (I could remove info->lock, but will not do so this time around,
> > precisely so as not to interfere with those patches).
> >
> > So it should be perfectly workable for you to take Aristeu+Zefan's
> > shmem patches into your cgroup tree, then any further mods from
> > mmotm will get layered on top.
>
> I think this approach would be simpler. Once Aristeu posts the
> updated version, I'll route the whole series through cgroup/for-3.7.

I'm about to submit it, just doing last build and testing round. Merged
back the changes Hugh asked in patch 1.

Also found why the list reinitialization was needed in the past; one of
the old iterations had a bug in the remount code and was cleaning the
xattrs out of the cgroup directory but not actually removing the
directory, thus new xattrs got added to a list of freed xattrs.

So I believe we're good to go.

--
Aristeu

2012-08-24 00:02:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] cgroup: add xattr support

Lennart Poettering <[email protected]> writes:

> Heya,
>
> (sorry for the late reply)
>
> On 16.08.2012 22:00, Tejun Heo wrote:
>> On Thu, Aug 16, 2012 at 01:44:56PM -0400, [email protected] wrote:
>
>>>> Attaching meta information to services, in an easily discoverable
>>>> way. For example, in systemd we create one cgroup for each service, and
>>>> could then store data like the main pid of the specific service as an
>>>> xattr on the cgroup itself. That way we'd have almost all service state
>>>> in the cgroupfs, which would make it possible to terminate systemd and
>>>> later restart it without losing any state information. But there's more:
>>>> for example, some very peculiar services cannot be terminated on
>>>> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
>>>> services in question could just mark that on their cgroup, by setting an
>>>> xattr. On the more desktopy side of things there are other
>>>> possibilities: for example there are plans defining what an application
>>>> is along the lines of a cgroup (i.e. an app being a collection of
>>>> processes). With xattrs one could then attach an icon or human readable
>>>> program name on the cgroup.
>>>>
>>>> The key idea is that this would allow attaching runtime meta information
>>>> to cgroups and everything they model (services, apps, vms), that doesn't
>>>> need any complex userspace infrastructure, has good access control
>>>> (i.e. because the file system enforces that anyway, and there's the
>>>> "trusted." xattr namespace), notifications (inotify), and can easily be
>>>> shared among applications.
>
>>
>> I'm not against this but unsure whether using kmem is enough for the
>> suggested use case. Lennart, would this suit systemd? How much
>> metadata are we talking about?
>
> Just small things, like values, PIDs, i.e. a few 100 bytes or so per cgroup
> should be more than sufficient for our needs.

I have a really silly question. Why is storing these things in xattrs
in a cgroup better than simply implementing a file in a cgroup?

It is most definitely going to be a real pain to discover as unix tools
do not support xattrs well.

Furthermore I am having nasty visiions that storing pids is going to
start breaking cgroups the way storing pids already breaks futexes.
Which pid namespace is that pid relative to?

Eric