2015-11-17 19:40:52

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET] netfilter, cgroup: implement xt_cgroup2 match

Hello,

In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbound. As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pull in configuration knobs from other subsystems so
that cgroup membership test can be avoided.

net_cls and net_prio controllers are examples of the latter. They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.

Both net_cls and net_prio aren't properly hierarchical. Both inherit
configuration from the parent on creation but there's no interaction
afterwards. An ancestor doesn't restrict the behavior in its subtree
in anyway and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration
implemented at the system level. net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.

While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.

Unfortunately, this ends up adding another cgroup related field to
struct sock - sock->sk_cgroup. I tried to think of a way to overload
the fields for net_cls and net_prio but couldn't come up with a sane
way to do that. In the long term, it should be possible to disable
net_cls and net_prio.

This patchset includes the following five patches.

0001-cgroup-record-ancestor-IDs-and-reimplement-cgroup_is.patch
0002-kernfs-implement-kernfs_walk_and_get.patch
0003-cgroup-implement-cgroup_get_from_path-and-expose-cgr.patch
0004-sock-cgroup-add-sock-sk_cgroup.patch
0005-netfilter-implement-xt_cgroup2-match.patch

0001-0003 are prepatory patches in kernfs and cgroup. 0004 adds
sock->sk_cgroup to track socket to cgroup association. 0005
implements the new xt_cgroup2 match.

This patchset is on top of v4.4-rc1 and also available in the
following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-xt_cgroup2

I'll post iptables extension as a reply. diffstat follows. Thanks.

fs/kernfs/dir.c | 48 +++++++++++++++
include/linux/cgroup-defs.h | 14 ++++
include/linux/cgroup.h | 33 ++++++++++
include/linux/kernfs.h | 12 +++
include/net/sock.h | 4 +
include/uapi/linux/netfilter/xt_cgroup2.h | 14 ++++
kernel/cgroup.c | 95 +++++++++++++++++++++---------
net/core/sock.c | 2
net/netfilter/Kconfig | 9 ++
net/netfilter/Makefile | 1
net/netfilter/xt_cgroup2.c | 75 +++++++++++++++++++++++
11 files changed, 278 insertions(+), 29 deletions(-)

--
tejun


2015-11-17 19:40:56

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/5] cgroup: record ancestor IDs and reimplement cgroup_is_descendant() using it

cgroup_is_descendant() currently walks up the hierarchy and compares
each ancestor to the cgroup in question. While enough for cgroup core
usages, this can't be used in hot paths to test cgroup membership.
This patch adds cgroup->ancestor_ids[] which records the IDs of all
ancestors including self and cgroup->level for the nesting level.

This allows testing whether a given cgroup is a descendant of another
in three finite steps - testing whether the two belong to the same
hierarchy, whether the descendant candidate is at the same or a higher
level than the ancestor and comparing the recorded ancestor_id at the
matching level. cgroup_is_descendant() is accordingly reimplmented
and made inline.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup-defs.h | 14 ++++++++++++++
include/linux/cgroup.h | 18 +++++++++++++++++-
kernel/cgroup.c | 32 ++++++++++----------------------
3 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 60d44b2..504d859 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -235,6 +235,14 @@ struct cgroup {
int id;

/*
+ * The depth this cgroup is at. The root is at depth zero and each
+ * step down the hierarchy increments the level. This along with
+ * ancestor_ids[] can determine whether a given cgroup is a
+ * descendant of another without traversing the hierarchy.
+ */
+ int level;
+
+ /*
* Each non-empty css_set associated with this cgroup contributes
* one to populated_cnt. All children with non-zero popuplated_cnt
* of their own contribute one. The count is zero iff there's no
@@ -289,6 +297,9 @@ struct cgroup {

/* used to schedule release agent */
struct work_struct release_agent_work;
+
+ /* ids of the ancestors at each level including self */
+ int ancestor_ids[];
};

/*
@@ -308,6 +319,9 @@ struct cgroup_root {
/* The root cgroup. Root is destroyed on its release. */
struct cgroup cgrp;

+ /* for cgrp->ancestor_ids[0] */
+ int cgrp_ancestor_id_storage;
+
/* Number of cgroups in the hierarchy, used only for /proc/cgroups */
atomic_t nr_cgrps;

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 22e3754..b5ee2c4 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -81,7 +81,6 @@ struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgroup,
struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
struct cgroup_subsys *ss);

-bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor);
int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);

@@ -459,6 +458,23 @@ static inline struct cgroup *task_cgroup(struct task_struct *task,
return task_css(task, subsys_id)->cgroup;
}

+/**
+ * cgroup_is_descendant - test ancestry
+ * @cgrp: the cgroup to be tested
+ * @ancestor: possible ancestor of @cgrp
+ *
+ * Test whether @cgrp is a descendant of @ancestor. It also returns %true
+ * if @cgrp == @ancestor. This function is safe to call as long as @cgrp
+ * and @ancestor are accessible.
+ */
+static inline bool cgroup_is_descendant(struct cgroup *cgrp,
+ struct cgroup *ancestor)
+{
+ if (cgrp->root != ancestor->root || cgrp->level < ancestor->level)
+ return false;
+ return cgrp->ancestor_ids[ancestor->level] == ancestor->id;
+}
+
/* no synchronization, the result can only be used as a hint */
static inline bool cgroup_is_populated(struct cgroup *cgrp)
{
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f1603c1..3190040 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -459,25 +459,6 @@ struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
}
EXPORT_SYMBOL_GPL(of_css);

-/**
- * cgroup_is_descendant - test ancestry
- * @cgrp: the cgroup to be tested
- * @ancestor: possible ancestor of @cgrp
- *
- * Test whether @cgrp is a descendant of @ancestor. It also returns %true
- * if @cgrp == @ancestor. This function is safe to call as long as @cgrp
- * and @ancestor are accessible.
- */
-bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor)
-{
- while (cgrp) {
- if (cgrp == ancestor)
- return true;
- cgrp = cgroup_parent(cgrp);
- }
- return false;
-}
-
static int notify_on_release(const struct cgroup *cgrp)
{
return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
@@ -1903,6 +1884,7 @@ static int cgroup_setup_root(struct cgroup_root *root, unsigned long ss_mask)
if (ret < 0)
goto out;
root_cgrp->id = ret;
+ root_cgrp->ancestor_ids[0] = ret;

ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, 0,
GFP_KERNEL);
@@ -4846,11 +4828,11 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
umode_t mode)
{
- struct cgroup *parent, *cgrp;
+ struct cgroup *parent, *cgrp, *tcgrp;
struct cgroup_root *root;
struct cgroup_subsys *ss;
struct kernfs_node *kn;
- int ssid, ret;
+ int level, ssid, ret;

/* Do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable.
*/
@@ -4861,9 +4843,11 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
if (!parent)
return -ENODEV;
root = parent->root;
+ level = parent->level + 1;

/* allocate the cgroup and its ID, 0 is reserved for the root */
- cgrp = kzalloc(sizeof(*cgrp), GFP_KERNEL);
+ cgrp = kzalloc(sizeof(*cgrp) +
+ sizeof(cgrp->ancestor_ids[0]) * (level + 1), GFP_KERNEL);
if (!cgrp) {
ret = -ENOMEM;
goto out_unlock;
@@ -4887,6 +4871,10 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,

cgrp->self.parent = &parent->self;
cgrp->root = root;
+ cgrp->level = level;
+
+ for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp))
+ cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;

if (notify_on_release(parent))
set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
--
2.5.0

2015-11-17 19:40:58

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/5] kernfs: implement kernfs_walk_and_get()

Implement kernfs_walk_and_get() which is similar to
kernfs_find_and_get() but can walk a path instead of just a name.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
fs/kernfs/dir.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/kernfs.h | 12 ++++++++++++
2 files changed, 60 insertions(+)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..95dc415 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -694,6 +694,31 @@ static struct kernfs_node *kernfs_find_ns(struct kernfs_node *parent,
return NULL;
}

+static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent,
+ const unsigned char *path,
+ const void *ns)
+{
+ static char path_buf[PATH_MAX]; /* protected by kernfs_mutex */
+ int len = strlen(path);
+ char *p, *name;
+
+ lockdep_assert_held(&kernfs_mutex);
+
+ if (len >= PATH_MAX)
+ return NULL;
+
+ memcpy(path_buf, path, len + 1);
+ p = path_buf;
+
+ while ((name = strsep(&p, "/")) && parent) {
+ if (*name == '\0')
+ continue;
+ parent = kernfs_find_ns(parent, name, ns);
+ }
+
+ return parent;
+}
+
/**
* kernfs_find_and_get_ns - find and get kernfs_node with the given name
* @parent: kernfs_node to search under
@@ -719,6 +744,29 @@ struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent,
EXPORT_SYMBOL_GPL(kernfs_find_and_get_ns);

/**
+ * kernfs_walk_and_get_ns - find and get kernfs_node with the given path
+ * @parent: kernfs_node to search under
+ * @path: path to look for
+ * @ns: the namespace tag to use
+ *
+ * Look for kernfs_node with path @path under @parent and get a reference
+ * if found. This function may sleep and returns pointer to the found
+ * kernfs_node on success, %NULL on failure.
+ */
+struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent,
+ const char *path, const void *ns)
+{
+ struct kernfs_node *kn;
+
+ mutex_lock(&kernfs_mutex);
+ kn = kernfs_walk_ns(parent, path, ns);
+ kernfs_get(kn);
+ mutex_unlock(&kernfs_mutex);
+
+ return kn;
+}
+
+/**
* kernfs_create_root - create a new kernfs hierarchy
* @scops: optional syscall operations for the hierarchy
* @flags: KERNFS_ROOT_* flags
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..af51df3 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -274,6 +274,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn);
struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent,
const char *name, const void *ns);
+struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent,
+ const char *path, const void *ns);
void kernfs_get(struct kernfs_node *kn);
void kernfs_put(struct kernfs_node *kn);

@@ -350,6 +352,10 @@ static inline struct kernfs_node *
kernfs_find_and_get_ns(struct kernfs_node *parent, const char *name,
const void *ns)
{ return NULL; }
+static inline struct kernfs_node *
+kernfs_walk_and_get_ns(struct kernfs_node *parent, const char *path,
+ const void *ns)
+{ return NULL; }

static inline void kernfs_get(struct kernfs_node *kn) { }
static inline void kernfs_put(struct kernfs_node *kn) { }
@@ -431,6 +437,12 @@ kernfs_find_and_get(struct kernfs_node *kn, const char *name)
}

static inline struct kernfs_node *
+kernfs_walk_and_get(struct kernfs_node *kn, const char *path)
+{
+ return kernfs_walk_and_get_ns(kn, path, NULL);
+}
+
+static inline struct kernfs_node *
kernfs_create_dir(struct kernfs_node *parent, const char *name, umode_t mode,
void *priv)
{
--
2.5.0

2015-11-17 19:43:33

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/5] cgroup: implement cgroup_get_from_path() and expose cgroup_put()

Implement cgroup_get_path() using kernfs_walk_and_get() which obtains
a default hierarchy cgroup from its path. This will be used to allow
cgroup path based matching from outside cgroup proper -
e.g. networking and perf.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup.h | 7 +++++++
kernel/cgroup.c | 38 +++++++++++++++++++++++++++++++++-----
2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b5ee2c4..4c3ffab 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -81,6 +81,8 @@ struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgroup,
struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
struct cgroup_subsys *ss);

+struct cgroup *cgroup_get_from_path(const char *path);
+
int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);

@@ -351,6 +353,11 @@ static inline void css_put_many(struct cgroup_subsys_state *css, unsigned int n)
percpu_ref_put_many(&css->refcnt, n);
}

+static inline void cgroup_put(struct cgroup *cgrp)
+{
+ css_put(&cgrp->self);
+}
+
/**
* task_css_set_check - obtain a task's css_set with extra access conditions
* @task: the task to obtain css_set for
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3190040..49947c1 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -434,11 +434,6 @@ static bool cgroup_tryget(struct cgroup *cgrp)
return css_tryget(&cgrp->self);
}

-static void cgroup_put(struct cgroup *cgrp)
-{
- css_put(&cgrp->self);
-}
-
struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
{
struct cgroup *cgrp = of->kn->parent->priv;
@@ -5753,6 +5748,39 @@ struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss)
return id > 0 ? idr_find(&ss->css_idr, id) : NULL;
}

+/**
+ * cgroup_get_from_path - lookup and get a cgroup from its default hierarchy path
+ * @path: path on the default hierarchy
+ *
+ * Find the cgroup at @path on the default hierarchy, increment its
+ * reference count and return it. Returns pointer to the found cgroup on
+ * success, ERR_PTR(-ENOENT) if @path doens't exist and ERR_PTR(-ENOTDIR)
+ * if @path points to a non-directory.
+ */
+struct cgroup *cgroup_get_from_path(const char *path)
+{
+ struct kernfs_node *kn;
+ struct cgroup *cgrp;
+
+ mutex_lock(&cgroup_mutex);
+
+ kn = kernfs_walk_and_get(cgrp_dfl_root.cgrp.kn, path);
+ if (kn) {
+ if (kernfs_type(kn) == KERNFS_DIR) {
+ cgrp = kn->priv;
+ cgroup_get(cgrp);
+ } else {
+ cgrp = ERR_PTR(-ENOTDIR);
+ }
+ kernfs_put(kn);
+ } else {
+ cgrp = ERR_PTR(-ENOENT);
+ }
+
+ mutex_unlock(&cgroup_mutex);
+ return cgrp;
+}
+
#ifdef CONFIG_CGROUP_DEBUG
static struct cgroup_subsys_state *
debug_css_alloc(struct cgroup_subsys_state *parent_css)
--
2.5.0

2015-11-17 19:41:54

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbound. As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pull in configuration knobs from other subsystems so
that cgroup membership test can be avoided.

net_cls and net_prio controllers are examples of the latter. They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.

Both net_cls and net_prio aren't properly hierarchical. Both inherit
configuration from the parent on creation but there's no interaction
afterwards. An ancestor doesn't restrict the behavior in its subtree
in anyway and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration
implemented at the system level. net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.

While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.

In preparation, this patch adds sock->sk_cgroup which points to the
associated cgroup. A sock is associated on creation and stays
associated to the same cgroup until freed; unfortunately, this ends up
adding another cgroup field to struct sock on top of sk_cgrp_prioidx
and sk_classid. I tried to think of a way to somehow overload the
existing fields but couldn't come up with a reasonable one. For the
longer term, the fields can be rearranged so that disabling prio and
cls controllers reduce the size of the struct.

This patch doesn't make use of the added field yet. The following
patch will implement netfilter match for cgroup2 membership.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Daniel Wagner <[email protected]>
CC: Neil Horman <[email protected]>
---
include/linux/cgroup.h | 8 ++++++++
include/net/sock.h | 4 ++++
kernel/cgroup.c | 25 ++++++++++++++++++++++++-
net/core/sock.c | 2 ++
4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 4c3ffab..2a6d7c4 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -20,6 +20,8 @@

#include <linux/cgroup-defs.h>

+struct sock;
+
#ifdef CONFIG_CGROUPS

/*
@@ -108,6 +110,9 @@ void cgroup_free(struct task_struct *p);
int cgroup_init_early(void);
int cgroup_init(void);

+void cgroup_sk_alloc(struct sock *sk);
+void cgroup_sk_free(struct sock *sk);
+
/*
* Iteration helpers and macros.
*/
@@ -576,6 +581,9 @@ static inline void cgroup_free(struct task_struct *p) {}
static inline int cgroup_init_early(void) { return 0; }
static inline int cgroup_init(void) { return 0; }

+static inline void cgroup_sk_alloc(struct sock *sk) {}
+static inline void cgroup_sk_free(struct sock *sk) {}
+
#endif /* !CONFIG_CGROUPS */

#endif /* _LINUX_CGROUP_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index bbf7c2c..6c5d195 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -310,6 +310,7 @@ struct cg_proto;
* @sk_security: used by security modules
* @sk_mark: generic packet mark
* @sk_classid: this socket's cgroup classid
+ * @sk_cgroup: the v2 cgroup this socket is associated with
* @sk_cgrp: this socket's cgroup-specific proto data
* @sk_write_pending: a write to stream socket waits to start
* @sk_state_change: callback to indicate change in the state of the sock
@@ -447,6 +448,9 @@ struct sock {
#ifdef CONFIG_CGROUP_NET_CLASSID
u32 sk_classid;
#endif
+#ifdef CONFIG_CGROUPS
+ struct cgroup *sk_cgroup;
+#endif
struct cg_proto *sk_cgrp;
void (*sk_state_change)(struct sock *sk);
void (*sk_data_ready)(struct sock *sk);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 49947c1..f26533b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,8 +57,8 @@
#include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
#include <linux/kthread.h>
#include <linux/delay.h>
-
#include <linux/atomic.h>
+#include <net/sock.h>

/*
* pidlists linger the following amount before being destroyed. The goal
@@ -5781,6 +5781,29 @@ struct cgroup *cgroup_get_from_path(const char *path)
return cgrp;
}

+void cgroup_sk_alloc(struct sock *sk)
+{
+ rcu_read_lock();
+
+ while (true) {
+ struct css_set *cset;
+
+ cset = task_css_set(current);
+ if (likely(cgroup_tryget(cset->dfl_cgrp))) {
+ sk->sk_cgroup = cset->dfl_cgrp;
+ break;
+ }
+ cpu_relax();
+ }
+
+ rcu_read_unlock();
+}
+
+void cgroup_sk_free(struct sock *sk)
+{
+ cgroup_put(sk->sk_cgroup);
+}
+
#ifdef CONFIG_CGROUP_DEBUG
static struct cgroup_subsys_state *
debug_css_alloc(struct cgroup_subsys_state *parent_css)
diff --git a/net/core/sock.c b/net/core/sock.c
index 1e4dd54..7c34bba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1363,6 +1363,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
if (!try_module_get(prot->owner))
goto out_free_sec;
sk_tx_queue_clear(sk);
+ cgroup_sk_alloc(sk);
}

return sk;
@@ -1385,6 +1386,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
owner = prot->owner;
slab = prot->slab;

+ cgroup_sk_free(sk);
security_sk_free(sk);
if (slab != NULL)
kmem_cache_free(slab, sk);
--
2.5.0

2015-11-17 19:41:04

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 5/5] netfilter: implement xt_cgroup2 match

This patch implements xt_cgroup2 which matches cgroup2 membership of
the associated socket. The match is recursive and invertible.

For rationales on introducing another cgroup based match, please refer
to a preceding commit "sock, cgroup: add sock->sk_cgroup".

Signed-off-by: Tejun Heo <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Daniel Wagner <[email protected]>
CC: Neil Horman <[email protected]>
---
include/uapi/linux/netfilter/xt_cgroup2.h | 14 ++++++
net/netfilter/Kconfig | 9 ++++
net/netfilter/Makefile | 1 +
net/netfilter/xt_cgroup2.c | 75 +++++++++++++++++++++++++++++++
4 files changed, 99 insertions(+)
create mode 100644 include/uapi/linux/netfilter/xt_cgroup2.h
create mode 100644 net/netfilter/xt_cgroup2.c

diff --git a/include/uapi/linux/netfilter/xt_cgroup2.h b/include/uapi/linux/netfilter/xt_cgroup2.h
new file mode 100644
index 0000000..8726d31
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_cgroup2.h
@@ -0,0 +1,14 @@
+#ifndef _XT_CGROUP2_H
+#define _XT_CGROUP2_H
+
+#include <linux/types.h>
+
+struct xt_cgroup2_info {
+ char path[PATH_MAX];
+ __u8 invert;
+
+ /* kernel internal data */
+ void *priv;
+};
+
+#endif /* _XT_CGROUP2_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index e22349e..55d3afe 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -974,6 +974,15 @@ config NETFILTER_XT_MATCH_BPF

To compile it as a module, choose M here. If unsure, say N.

+config NETFILTER_XT_MATCH_CGROUP2
+ tristate '"cgroup2" match support'
+ depends on NETFILTER_ADVANCED
+ depends on CGROUPS
+ help
+ cgroup2 matching allows you to match locally generated and
+ early demuxed packets based on the v2 cgroup the socket is
+ associated with on creation.
+
config NETFILTER_XT_MATCH_CGROUP
tristate '"control group" match support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 7638c36..86cee05 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -152,6 +152,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
obj-$(CONFIG_NETFILTER_XT_MATCH_NFACCT) += xt_nfacct.o
obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP2) += xt_cgroup2.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o
obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
diff --git a/net/netfilter/xt_cgroup2.c b/net/netfilter/xt_cgroup2.c
new file mode 100644
index 0000000..cf99524
--- /dev/null
+++ b/net/netfilter/xt_cgroup2.c
@@ -0,0 +1,75 @@
+#include <linux/skbuff.h>
+#include <linux/module.h>
+#include <linux/cgroup.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_cgroup2.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Tejun Heo <[email protected]>");
+MODULE_DESCRIPTION("Xtables: cgroup2 socket ownership matching");
+MODULE_ALIAS("ipt_cgroup2");
+MODULE_ALIAS("ip6t_cgroup2");
+
+static int cgroup2_mt_check(const struct xt_mtchk_param *par)
+{
+ struct xt_cgroup2_info *info = par->matchinfo;
+ struct cgroup *cgrp;
+
+ if (info->invert & ~1)
+ return -EINVAL;
+
+ cgrp = cgroup_get_from_path(info->path);
+ if (IS_ERR(cgrp)) {
+ pr_info("xt_cgroup2: invalid path, errno=%ld\n", PTR_ERR(cgrp));
+ return -EINVAL;
+ }
+ info->priv = cgrp;
+
+ return 0;
+}
+
+static bool cgroup2_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ const struct xt_cgroup2_info *info = par->matchinfo;
+ struct cgroup *ancestor = info->priv;
+
+ if (!skb->sk || !sk_fullsock(skb->sk))
+ return false;
+
+ return cgroup_is_descendant(skb->sk->sk_cgroup, ancestor) ^ info->invert;
+}
+
+static void cgroup2_mt_destroy(const struct xt_mtdtor_param *par)
+{
+ struct xt_cgroup2_info *info = par->matchinfo;
+
+ cgroup_put(info->priv);
+}
+
+static struct xt_match cgroup2_mt_reg __read_mostly = {
+ .name = "cgroup2",
+ .revision = 0,
+ .family = NFPROTO_UNSPEC,
+ .checkentry = cgroup2_mt_check,
+ .match = cgroup2_mt,
+ .matchsize = sizeof(struct xt_cgroup2_info),
+ .destroy = cgroup2_mt_destroy,
+ .me = THIS_MODULE,
+ .hooks = (1 << NF_INET_LOCAL_OUT) |
+ (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_IN),
+};
+
+static int __init cgroup2_mt_init(void)
+{
+ return xt_register_match(&cgroup2_mt_reg);
+}
+
+static void __exit cgroup2_mt_exit(void)
+{
+ xt_unregister_match(&cgroup2_mt_reg);
+}
+
+module_init(cgroup2_mt_init);
+module_exit(cgroup2_mt_exit);
--
2.5.0

2015-11-17 19:42:29

by Tejun Heo

[permalink] [raw]
Subject: [PATCH iptables] libxt_cgroup2: add support for cgroup2 path matching

This patch adds the extension for the xt_cgroup2 which matches packets
based on the v2 cgroup path of the associated socket.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Daniel Borkmann <[email protected]>
---
extensions/libxt_cgroup2.c | 70 +++++++++++++++++++++++++++++++++++
extensions/libxt_cgroup2.man | 24 ++++++++++++
include/linux/netfilter/xt_cgroup2.h | 14 +++++++
3 files changed, 108 insertions(+)

--- /dev/null
+++ b/extensions/libxt_cgroup2.c
@@ -0,0 +1,70 @@
+#include <stdio.h>
+#include <mntent.h>
+#include <xtables.h>
+#include <linux/netfilter/xt_cgroup2.h>
+#include <linux/limits.h>
+
+enum {
+ O_CGROUP2_PATH,
+};
+
+static void cgroup2_help(void)
+{
+ printf(
+"cgroup2 match options:\n"
+"[!] --path path path relative to cgroup2 root, recursive match\n");
+}
+
+static const struct xt_option_entry cgroup2_opts[] = {
+ {
+ .name = "path",
+ .id = O_CGROUP2_PATH,
+ .type = XTTYPE_STRING,
+ .flags = XTOPT_INVERT | XTOPT_MAND | XTOPT_PUT,
+ XTOPT_POINTER(struct xt_cgroup2_info, path)
+ },
+ XTOPT_TABLEEND,
+};
+
+static void cgroup2_parse(struct xt_option_call *cb)
+{
+ struct xt_cgroup2_info *info = cb->data;
+
+ xtables_option_parse(cb);
+
+ if (cb->invert)
+ info->invert = true;
+}
+
+static void cgroup2_print(const void *ip, const struct xt_entry_match *match,
+ int numeric)
+{
+ const struct xt_cgroup2_info *info = (void *)match->data;
+
+ printf(" cgroup2 %s%s", info->invert ? "! ":"", info->path);
+}
+
+static void cgroup2_save(const void *ip, const struct xt_entry_match *match)
+{
+ const struct xt_cgroup2_info *info = (void *)match->data;
+
+ printf("%s --path %s", info->invert ? " !" : "", info->path);
+}
+
+static struct xtables_match cgroup2_match = {
+ .family = NFPROTO_UNSPEC,
+ .name = "cgroup2",
+ .version = XTABLES_VERSION,
+ .size = XT_ALIGN(sizeof(struct xt_cgroup2_info)),
+ .userspacesize = XT_ALIGN(sizeof(struct xt_cgroup2_info)),
+ .help = cgroup2_help,
+ .print = cgroup2_print,
+ .save = cgroup2_save,
+ .x6_parse = cgroup2_parse,
+ .x6_options = cgroup2_opts,
+};
+
+void _init(void)
+{
+ xtables_register_match(&cgroup2_match);
+}
--- /dev/null
+++ b/extensions/libxt_cgroup2.man
@@ -0,0 +1,24 @@
+.TP
+[\fB!\fP] \fB\-\-path\fP \fIpath\fP
+Match cgroup2 membership.
+
+Each socket is associated with the v2 cgroup of the creating process.
+This matches packets coming from or going to all sockets in the
+sub-hierarchy of the specified path. The path should be relative to
+the root of the cgroup2 hierarchy. Can be used in the OUTPUT and
+INPUT chains to assign particular firewall policies for aggregated
+processes on the system. This allows for more fine-grained firewall
+policies that only match for a subset of the system's processes.
+
+\fBIMPORTANT\fP: when being used in the INPUT chain, the cgroup2
+matcher is currently only of limited functionality, meaning it
+will only match on packets that are processed for local sockets
+through early socket demuxing. Therefore, general usage on the
+INPUT chain is disadviced unless the implications are well
+understood.
+.PP
+Example:
+.IP
+iptables \-A OUTPUT \-p tcp \-\-sport 80 \-m cgroup2 ! \-\-path service/http-server \-j DROP
+.PP
+Available since Linux 4.5.
--- /dev/null
+++ b/include/linux/netfilter/xt_cgroup2.h
@@ -0,0 +1,14 @@
+#ifndef _XT_CGROUP2_H
+#define _XT_CGROUP2_H
+
+#include <linux/types.h>
+
+struct xt_cgroup2_info {
+ char path[PATH_MAX];
+ __u8 invert;
+
+ /* kernel internal data */
+ void *priv;
+};
+
+#endif /* _XT_CGROUP2_H */

2015-11-17 21:20:52

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 2/5] kernfs: implement kernfs_walk_and_get()

From: Tejun Heo <[email protected]>
Date: Tue, 17 Nov 2015 14:40:37 -0500

> + static char path_buf[PATH_MAX]; /* protected by kernfs_mutex */
> + int len = strlen(path);
...
> + if (len >= PATH_MAX)
> + return NULL;
> +
> + memcpy(path_buf, path, len + 1);

static char path_buf[PATH_MAX]; /* protected by kernfs_mutex */
int len = strlcpy(path_buf, path, PATH_MAX);
...
if (len >= PATH_MAX)
return NULL;

2015-11-17 21:22:49

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/5] kernfs: implement kernfs_walk_and_get()

On Tue, Nov 17, 2015 at 04:20:40PM -0500, David Miller wrote:
> From: Tejun Heo <[email protected]>
> Date: Tue, 17 Nov 2015 14:40:37 -0500
>
> > + static char path_buf[PATH_MAX]; /* protected by kernfs_mutex */
> > + int len = strlen(path);
> ...
> > + if (len >= PATH_MAX)
> > + return NULL;
> > +
> > + memcpy(path_buf, path, len + 1);
>
> static char path_buf[PATH_MAX]; /* protected by kernfs_mutex */
> int len = strlcpy(path_buf, path, PATH_MAX);
> ...
> if (len >= PATH_MAX)
> return NULL;

That's a lot better. Will update.

Thanks.

--
tejun

2015-11-17 21:25:59

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

From: Tejun Heo <[email protected]>
Date: Tue, 17 Nov 2015 14:40:39 -0500

> In preparation, this patch adds sock->sk_cgroup which points to the
> associated cgroup. A sock is associated on creation and stays
> associated to the same cgroup until freed; unfortunately, this ends up
> adding another cgroup field to struct sock on top of sk_cgrp_prioidx
> and sk_classid. I tried to think of a way to somehow overload the
> existing fields but couldn't come up with a reasonable one.

sk->sk_cgrp_prioidx is simply sk->sk_cgroup->id, is it not?

We really need to consolidate this before we stuff even more members
into the socket for control group support, sorry.

2015-11-17 21:31:33

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

Hello, David.

On Tue, Nov 17, 2015 at 04:25:54PM -0500, David Miller wrote:
> > In preparation, this patch adds sock->sk_cgroup which points to the
> > associated cgroup. A sock is associated on creation and stays
> > associated to the same cgroup until freed; unfortunately, this ends up
> > adding another cgroup field to struct sock on top of sk_cgrp_prioidx
> > and sk_classid. I tried to think of a way to somehow overload the
> > existing fields but couldn't come up with a reasonable one.
>
> sk->sk_cgrp_prioidx is simply sk->sk_cgroup->id, is it not?

Unfortunately, sk->sk_cgrp_prioidx is an arbitrary value which can be
configured through "net_cls.classid". :(

> We really need to consolidate this before we stuff even more members
> into the socket for control group support, sorry.

Yeah, it is messy. I'll see if I can come up with a non-crazy way to
combine the other two fields with ->sk_cgroup.

Thanks.

--
tejun

2015-11-17 21:46:54

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

Hi Tejun,

On 11/17/2015 08:40 PM, Tejun Heo wrote:
...
> While it is possible to solve these issues from controller side by
> implementing hierarchical allowable ranges in both controllers, it
> would involve quite a bit of complexity in the controllers and further
> obfuscate network configuration as it becomes even more difficult to
> tell what's actually being configured looking from the network side.
> While not much can be done for v1 at this point, as membership
> handling is sane on cgroup v2, it'd be better to make cgroup matching
> behave like other network matches and classifiers than introducing
> further complications.
>
> In preparation, this patch adds sock->sk_cgroup which points to the
> associated cgroup. A sock is associated on creation and stays
> associated to the same cgroup until freed; unfortunately, this ends up
> adding another cgroup field to struct sock on top of sk_cgrp_prioidx
> and sk_classid. I tried to think of a way to somehow overload the
> existing fields but couldn't come up with a reasonable one. For the
> longer term, the fields can be rearranged so that disabling prio and
> cls controllers reduce the size of the struct.

Do you see a way forward where the new behavior could be enabled f.e.
as an extra mount option (that long-term would be made default, while
deprecating the current behavior) on net_cls et al? There are various
more users at least on the net_cls side (nft and tc as well). Would be
really great, if sk_cgroup could abstract that somehow away for all of
them w/o adding a second version to all users.

Best,
Daniel

2015-11-17 21:46:43

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

From: Tejun Heo <[email protected]>
Date: Tue, 17 Nov 2015 16:31:26 -0500

> I'll see if I can come up with a non-crazy way to combine the other
> two fields with ->sk_cgroup.

Thank you.

2015-11-17 21:48:59

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

On 11/17/2015 10:31 PM, Tejun Heo wrote:
> Hello, David.
>
> On Tue, Nov 17, 2015 at 04:25:54PM -0500, David Miller wrote:
>>> In preparation, this patch adds sock->sk_cgroup which points to the
>>> associated cgroup. A sock is associated on creation and stays
>>> associated to the same cgroup until freed; unfortunately, this ends up
>>> adding another cgroup field to struct sock on top of sk_cgrp_prioidx
>>> and sk_classid. I tried to think of a way to somehow overload the
>>> existing fields but couldn't come up with a reasonable one.
>>
>> sk->sk_cgrp_prioidx is simply sk->sk_cgroup->id, is it not?
>
> Unfortunately, sk->sk_cgrp_prioidx is an arbitrary value which can be
> configured through "net_cls.classid". :(

Hmm, isn't net_prio independent of net_cls?

>> We really need to consolidate this before we stuff even more members
>> into the socket for control group support, sorry.
>
> Yeah, it is messy. I'll see if I can come up with a non-crazy way to
> combine the other two fields with ->sk_cgroup.
>
> Thanks.
>

2015-11-17 22:18:05

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

Hello, Daniel.

On Tue, Nov 17, 2015 at 10:48:46PM +0100, Daniel Borkmann wrote:
> >Unfortunately, sk->sk_cgrp_prioidx is an arbitrary value which can be
> >configured through "net_cls.classid". :(
>
> Hmm, isn't net_prio independent of net_cls?

Ah, yeah, I was mixing up the two but the story is the same for both.
net_prio configures the prioidx and net_cls the classid and the two
may or may not (but more likely not) be on the same hierarchy and
definitely not on the cgroup2 hierarchy, so you can't really access
them through a single ->sk_cgroup pointer. After all, that's why we
ended up with these controllers.

Thanks.

--
tejun

2015-11-17 22:21:34

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/5] sock, cgroup: add sock->sk_cgroup

Hello, Daniel.

On Tue, Nov 17, 2015 at 10:46:30PM +0100, Daniel Borkmann wrote:
> Do you see a way forward where the new behavior could be enabled f.e.
> as an extra mount option (that long-term would be made default, while
> deprecating the current behavior) on net_cls et al? There are various
> more users at least on the net_cls side (nft and tc as well). Would be
> really great, if sk_cgroup could abstract that somehow away for all of
> them w/o adding a second version to all users.

I'm not really sure I'm following what you mean but no matter how we
go about it net_cls and prio won't be broken. I likely will end up
making the two sets mutually exclusive once configured run-time but
there's no reason to use both at the same time anyway.

Thanks.

--
tejun

2015-11-17 22:49:00

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH 2/5] kernfs: implement kernfs_walk_and_get()


On Tuesday 2015-11-17 22:20, David Miller wrote:
>> + static char path_buf[PATH_MAX]; /* protected by kernfs_mutex */
>> + int len = strlen(path);
> ...
>> + if (len >= PATH_MAX)
>> + return NULL;
>> +
>> + memcpy(path_buf, path, len + 1);
>
> static char path_buf[PATH_MAX]; /* protected by kernfs_mutex */
> int len = strlcpy(path_buf, path, PATH_MAX);
> ...
> if (len >= PATH_MAX)
> return NULL;

if (len < 0 || len >= PATH_MAX)

strlcpy returns a size_t, which, when coerced into an int, could lead to
negative numbers. In that sense, "size_t len" probably seems like an even
better bet yet.

2015-11-17 22:54:58

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH 1/5] cgroup: record ancestor IDs and reimplement cgroup_is_descendant() using it


On Tuesday 2015-11-17 20:40, Tejun Heo wrote:
>+static inline bool cgroup_is_descendant(struct cgroup *cgrp,
>+ struct cgroup *ancestor)

(const struct group *cgrp, const struct group *ancestor)

>+{
>+ if (cgrp->root != ancestor->root || cgrp->level < ancestor->level)
>+ return false;
>+ return cgrp->ancestor_ids[ancestor->level] == ancestor->id;
>+}

2015-11-17 22:56:05

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH 5/5] netfilter: implement xt_cgroup2 match


On Tuesday 2015-11-17 20:40, Tejun Heo wrote:
>@@ -0,0 +1,14 @@
>+#ifndef _XT_CGROUP2_H
>+#define _XT_CGROUP2_H
>+
>+#include <linux/types.h>
>+
>+struct xt_cgroup2_info {
>+ char path[PATH_MAX];
>+ __u8 invert;

Should <linux/limits.h> be included? (For PATH_MAX)

>+ /* kernel internal data */
>+ void *priv;
>+};

void *priv __attribute__((aligned(8)));

>+static bool cgroup2_mt(const struct sk_buff *skb, struct xt_action_param *par)
>+{
>+ const struct xt_cgroup2_info *info = par->matchinfo;
>+ struct cgroup *ancestor = info->priv;

There is no modification planned on the cgroup, so this too can be const struct
cgroup * if-and-when cgroup_is_descendant is made to take const ptrs as well.

>+ if (!skb->sk || !sk_fullsock(skb->sk))
>+ return false;
>+
>+ return cgroup_is_descendant(skb->sk->sk_cgroup, ancestor) ^ info->invert;
>+}

2015-11-17 23:02:08

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH iptables] libxt_cgroup2: add support for cgroup2 path matching


On Tuesday 2015-11-17 20:42, Tejun Heo wrote:
>+static void cgroup2_save(const void *ip, const struct xt_entry_match *match)
>+{
>+ const struct xt_cgroup2_info *info = (void *)match->data;
>+
>+ printf("%s --path %s", info->invert ? " !" : "", info->path);
>+}

Can cgroup path names contain anything fancy, like spaces, backslashes, etc.?
If so, xtables_save_string() will be needed here.

>+static struct xtables_match cgroup2_match = {
>+ .family = NFPROTO_UNSPEC,
>+ .name = "cgroup2",
>+ .version = XTABLES_VERSION,
>+ .size = XT_ALIGN(sizeof(struct xt_cgroup2_info)),
>+ .userspacesize = XT_ALIGN(sizeof(struct xt_cgroup2_info)),

userspacesize must not include xt_cgroup2_info.priv.
Change to offsetof(...), cf. other .c modules which do this.


>+++ b/extensions/libxt_cgroup2.man
>@@ -0,0 +1,24 @@
>+.TP
>+[\fB!\fP] \fB\-\-path\fP \fIpath\fP
>+Match cgroup2 membership.
>+
>+Each socket is associated with the v2 cgroup of the creating process.
>+This matches packets coming from or going to all sockets in the
>+sub-hierarchy of the specified path. The path should be relative to
>+the root of the cgroup2 hierarchy. Can be used in the OUTPUT and
>+INPUT chains to assign particular firewall policies for aggregated
>+processes on the system. This allows for more fine-grained firewall
>+policies that only match for a subset of the system's processes.
>+
>+\fBIMPORTANT\fP: when being used in the INPUT chain, the cgroup2
>+matcher is currently only of limited functionality, meaning it
>+will only match on packets that are processed for local sockets
>+through early socket demuxing. Therefore, general usage on the
>+INPUT chain is disadviced unless the implications are well

is disadviced (sic) -> is not advised

2015-11-17 23:03:43

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/5] cgroup: record ancestor IDs and reimplement cgroup_is_descendant() using it

On Tue, Nov 17, 2015 at 11:54:54PM +0100, Jan Engelhardt wrote:
>
> On Tuesday 2015-11-17 20:40, Tejun Heo wrote:
> >+static inline bool cgroup_is_descendant(struct cgroup *cgrp,
> >+ struct cgroup *ancestor)
>
> (const struct group *cgrp, const struct group *ancestor)

I'm not sure about using const on complex data structures. It often
ends up causing more mess than being helpful. It's never consistent
and often can't be - e.g. you throw in an iteration macro and then
realize that there need to be two separate variants of iteration
macros for consts and !consts after propagating const through the call
chains and then end up doing forced casts.

Thanks.

--
tejun

2015-11-17 23:09:30

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH iptables] libxt_cgroup2: add support for cgroup2 path matching

Hello,

On Wed, Nov 18, 2015 at 12:02:01AM +0100, Jan Engelhardt wrote:
> On Tuesday 2015-11-17 20:42, Tejun Heo wrote:
> >+static void cgroup2_save(const void *ip, const struct xt_entry_match *match)
> >+{
> >+ const struct xt_cgroup2_info *info = (void *)match->data;
> >+
> >+ printf("%s --path %s", info->invert ? " !" : "", info->path);
> >+}
>
> Can cgroup path names contain anything fancy, like spaces, backslashes, etc.?
> If so, xtables_save_string() will be needed here.

Will update.

> >+static struct xtables_match cgroup2_match = {
> >+ .family = NFPROTO_UNSPEC,
> >+ .name = "cgroup2",
> >+ .version = XTABLES_VERSION,
> >+ .size = XT_ALIGN(sizeof(struct xt_cgroup2_info)),
> >+ .userspacesize = XT_ALIGN(sizeof(struct xt_cgroup2_info)),
>
> userspacesize must not include xt_cgroup2_info.priv.
> Change to offsetof(...), cf. other .c modules which do this.

Ah, okay.

...
> >+\fBIMPORTANT\fP: when being used in the INPUT chain, the cgroup2
> >+matcher is currently only of limited functionality, meaning it
> >+will only match on packets that are processed for local sockets
> >+through early socket demuxing. Therefore, general usage on the
> >+INPUT chain is disadviced unless the implications are well
>
> is disadviced (sic) -> is not advised

Will update. That's copied verbatim from xt_cgroup tho. Will create
a patch to update that too.

Thanks.

--
tejun