2020-07-02 00:28:19

by Matt Bennett

Subject: [PATCH 0/5] RFC: connector: Add network namespace awareness

Previously the connector functionality could only be used by processes running in the
default network namespace. This meant that any process that uses the connector functionality
could not operate correctly when run inside a container. This is a draft patch series that
attempts to allow this functionality outside of the default network namespace.

I see this has been discussed previously [1], but am not sure how my changes relate to all
of the topics discussed there and/or if there are any unintended side effects from my draft
changes.

Thanks.

[1] https://marc.info/?l=linux-kernel&m=150806196728365&w=2
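
For context, a minimal userspace listener for proc events looks roughly
like the sketch below (illustrative only, error handling omitted). Today
such a process only receives events when it runs in the initial network
namespace:

#include <linux/connector.h>
#include <linux/cn_proc.h>
#include <linux/netlink.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* connector socket joined to the proc-event multicast group */
	int sk = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = CN_IDX_PROC,
		.nl_pid = getpid(),
	};
	struct __attribute__((aligned(NLMSG_ALIGNTO))) {
		struct nlmsghdr nl_hdr;
		struct cn_msg cn_msg;
		enum proc_cn_mcast_op op;
	} req;

	bind(sk, (struct sockaddr *)&sa, sizeof(sa));

	/* ask cn_proc to start multicasting PROC_EVENT_* messages */
	memset(&req, 0, sizeof(req));
	req.nl_hdr.nlmsg_len = sizeof(req);
	req.nl_hdr.nlmsg_pid = getpid();
	req.nl_hdr.nlmsg_type = NLMSG_DONE;
	req.cn_msg.id.idx = CN_IDX_PROC;
	req.cn_msg.id.val = CN_VAL_PROC;
	req.cn_msg.len = sizeof(enum proc_cn_mcast_op);
	req.op = PROC_CN_MCAST_LISTEN;
	send(sk, &req, sizeof(req), 0);

	/* ... recv() struct proc_event messages from here on ... */
	return 0;
}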

Matt Bennett (5):
connector: Use task pid helpers
connector: Use 'current_user_ns' function
connector: Ensure callback entry is released
connector: Prepare for supporting multiple namespaces
connector: Create connector per namespace

Documentation/driver-api/connector.rst | 6 +-
drivers/connector/cn_proc.c | 110 +++++++-------
drivers/connector/cn_queue.c | 9 +-
drivers/connector/connector.c | 192 ++++++++++++++++++++-----
drivers/hv/hv_fcopy.c | 1 +
drivers/hv/hv_utils_transport.c | 6 +-
drivers/md/dm-log-userspace-transfer.c | 6 +-
drivers/video/fbdev/uvesafb.c | 8 +-
drivers/w1/w1_netlink.c | 19 +--
include/linux/connector.h | 38 +++--
include/net/net_namespace.h | 4 +
kernel/exit.c | 2 +-
samples/connector/cn_test.c | 6 +-
13 files changed, 286 insertions(+), 121 deletions(-)

--
2.27.0


2020-07-02 00:28:27

by Matt Bennett

Subject: [PATCH 2/5] connector: Use 'current_user_ns' function

In preparation for supporting the connector outside of the default
network namespace, we switch to using this function now. As the connector
is still only supported in the default namespace, this change is a no-op.
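
For reference, the helper reports a kernel-internal kuid as seen from a
given user namespace; an illustrative fragment (not part of this patch):

	/* cred->uid as mapped into the caller's user namespace; ids
	 * without a mapping become the overflow uid (usually 65534) */
	uid_t uid = from_kuid_munged(current_user_ns(), cred->uid);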

Signed-off-by: Matt Bennett <[email protected]>
---
drivers/connector/cn_proc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 36a7823c56ec..d90aea555a21 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -139,11 +139,11 @@ void proc_id_connector(struct task_struct *task, int which_id)
rcu_read_lock();
cred = __task_cred(task);
if (which_id == PROC_EVENT_UID) {
- ev->event_data.id.r.ruid = from_kuid_munged(&init_user_ns, cred->uid);
- ev->event_data.id.e.euid = from_kuid_munged(&init_user_ns, cred->euid);
+ ev->event_data.id.r.ruid = from_kuid_munged(current_user_ns(), cred->uid);
+ ev->event_data.id.e.euid = from_kuid_munged(current_user_ns(), cred->euid);
} else if (which_id == PROC_EVENT_GID) {
- ev->event_data.id.r.rgid = from_kgid_munged(&init_user_ns, cred->gid);
- ev->event_data.id.e.egid = from_kgid_munged(&init_user_ns, cred->egid);
+ ev->event_data.id.r.rgid = from_kgid_munged(current_user_ns(), cred->gid);
+ ev->event_data.id.e.egid = from_kgid_munged(current_user_ns(), cred->egid);
} else {
rcu_read_unlock();
return;
@@ -362,7 +362,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
return;

/* Can only change if privileged. */
- if (!__netlink_ns_capable(nsp, &init_user_ns, CAP_NET_ADMIN)) {
+ if (!__netlink_ns_capable(nsp, current_user_ns(), CAP_NET_ADMIN)) {
err = EPERM;
goto out;
}
--
2.27.0

2020-07-02 00:28:31

by Matt Bennett

Subject: [PATCH 5/5] connector: Create connector per namespace

Store the connector instance per network namespace. In doing so, the
connector functionality becomes available outside the default
namespace.
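
In-kernel senders then resolve the instance from a struct net rather
than a global, along these lines (illustrative fragment, not the full
change):

	struct net *net = current->nsproxy->net_ns;

	/* use the cn_dev embedded in this namespace's struct net */
	ret = cn_netlink_send(net, msg, 0, CN_IDX_PROC, GFP_NOWAIT);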

Signed-off-by: Matt Bennett <[email protected]>
---
drivers/connector/cn_proc.c | 49 ++++++----
drivers/connector/connector.c | 171 ++++++++++++++++++++++++++++------
drivers/hv/hv_fcopy.c | 1 +
include/linux/connector.h | 14 ++-
include/net/net_namespace.h | 4 +
kernel/exit.c | 2 +-
6 files changed, 190 insertions(+), 51 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 9202be177a30..661d921fd146 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -17,6 +17,7 @@
#include <linux/atomic.h>
#include <linux/pid_namespace.h>
#include <net/net_namespace.h>
+#include <linux/netlink.h>

#include <linux/cn_proc.h>
#include <linux/local_lock.h>
@@ -37,7 +38,6 @@ static inline struct cn_msg *buffer_to_cn_msg(__u8 *buffer)
return (struct cn_msg *)(buffer + 4);
}

-static atomic_t proc_event_num_listeners = ATOMIC_INIT(0);
static struct cb_id cn_proc_event_id = { CN_IDX_PROC, CN_VAL_PROC };

/* local_event.count is used as the sequence number of the netlink message */
@@ -51,6 +51,9 @@ static DEFINE_PER_CPU(struct local_event, local_event) = {

static inline void send_msg(struct cn_msg *msg)
{
+ int ret = 0;
+ struct net *net = current->nsproxy->net_ns;
+
local_lock(&local_event.lock);

msg->seq = __this_cpu_inc_return(local_event.count) - 1;
@@ -62,7 +65,9 @@ static inline void send_msg(struct cn_msg *msg)
*
* If cn_netlink_send() fails, the data is not sent.
*/
- cn_netlink_send(&init_net, msg, 0, CN_IDX_PROC, GFP_NOWAIT);
+ ret = cn_netlink_send(net, msg, 0, CN_IDX_PROC, GFP_NOWAIT);
+ if (ret == -ESRCH && netlink_has_listeners(net->cdev.nls, CN_IDX_PROC) == 0)
+ atomic_set(&(net->cdev.proc_event_num_listeners), 0);

local_unlock(&local_event.lock);
}
@@ -73,8 +78,9 @@ void proc_fork_connector(struct task_struct *task)
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
struct task_struct *parent;
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -102,8 +108,9 @@ void proc_exec_connector(struct task_struct *task)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -127,8 +134,9 @@ void proc_id_connector(struct task_struct *task, int which_id)
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
const struct cred *cred;
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -164,8 +172,9 @@ void proc_sid_connector(struct task_struct *task)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -188,8 +197,9 @@ void proc_ptrace_connector(struct task_struct *task, int ptrace_id)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -220,8 +230,9 @@ void proc_comm_connector(struct task_struct *task)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -246,8 +257,9 @@ void proc_coredump_connector(struct task_struct *task)
struct proc_event *ev;
struct task_struct *parent;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -279,8 +291,9 @@ void proc_exit_connector(struct task_struct *task)
struct proc_event *ev;
struct task_struct *parent;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -321,8 +334,9 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+ struct net *net = current->nsproxy->net_ns;

- if (atomic_read(&proc_event_num_listeners) < 1)
+ if (atomic_read(&(net->cdev.proc_event_num_listeners)) < 1)
return;

msg = buffer_to_cn_msg(buffer);
@@ -353,13 +367,10 @@ static void cn_proc_mcast_ctl(struct net *net, struct cn_msg *msg,
if (msg->len != sizeof(*mc_op))
return;

- /*
- * Events are reported with respect to the initial pid
- * and user namespaces so ignore requestors from
- * other namespaces.
+ /*
+ * Events are reported with respect to network namespaces.
*/
- if ((current_user_ns() != &init_user_ns) ||
- (task_active_pid_ns(current) != &init_pid_ns))
+ if (current->nsproxy->net_ns != net)
return;

/* Can only change if privileged. */
@@ -371,10 +382,10 @@ static void cn_proc_mcast_ctl(struct net *net, struct cn_msg *msg,
mc_op = (enum proc_cn_mcast_op *)msg->data;
switch (*mc_op) {
case PROC_CN_MCAST_LISTEN:
- atomic_inc(&proc_event_num_listeners);
+ atomic_inc(&(net->cdev.proc_event_num_listeners));
break;
case PROC_CN_MCAST_IGNORE:
- atomic_dec(&proc_event_num_listeners);
+ atomic_dec(&(net->cdev.proc_event_num_listeners));
break;
default:
err = EINVAL;
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 82fcaa4d8be3..30efcf39751f 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -26,9 +26,7 @@ MODULE_AUTHOR("Evgeniy Polyakov <[email protected]>");
MODULE_DESCRIPTION("Generic userspace <-> kernelspace connector.");
MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_CONNECTOR);

-static struct cn_dev cdev;
-
-static int cn_already_initialized;
+static DEFINE_MUTEX(cn_mutex);

/*
* Sends mult (multiple) cn_msg at a time.
@@ -66,10 +64,13 @@ int cn_netlink_send_mult(struct net *net, struct cn_msg *msg, u16 len,
struct sk_buff *skb;
struct nlmsghdr *nlh;
struct cn_msg *data;
- struct cn_dev *dev = &cdev;
+ struct cn_dev *dev = &(net->cdev);
u32 group = 0;
int found = 0;

+ if (!msg || len < 0)
+ return -EINVAL;
+
if (portid || __group) {
group = __group;
} else {
@@ -133,7 +134,7 @@ static int cn_call_callback(struct net *net, struct sk_buff *skb)
{
struct nlmsghdr *nlh;
struct cn_callback_entry *i, *cbq = NULL;
- struct cn_dev *dev = &cdev;
+ struct cn_dev *dev = &net->cdev;
struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb));
struct netlink_skb_parms *nsp = &NETLINK_CB(skb);
int err = -ENODEV;
@@ -168,7 +169,7 @@ static int cn_call_callback(struct net *net, struct sk_buff *skb)
*
* It checks skb, netlink header and msg sizes, and calls callback helper.
*/
-static void cn_rx_skb(struct sk_buff *skb)
+static void __cn_rx_skb(struct sk_buff *skb)
{
struct nlmsghdr *nlh;
int len, err;
@@ -190,6 +191,13 @@ static void cn_rx_skb(struct sk_buff *skb)
}
}

+static void cn_rx_skb(struct sk_buff *skb)
+{
+ mutex_lock(&cn_mutex);
+ __cn_rx_skb(skb);
+ mutex_unlock(&cn_mutex);
+}
+
/*
* Callback add routing - adds callback with given ID and name.
* If there is registered callback with the same ID it will not be added.
@@ -200,20 +208,47 @@ int cn_add_callback(struct cb_id *id, const char *name,
void (*callback)(struct net *, struct cn_msg *,
struct netlink_skb_parms *))
{
- int err;
- struct cn_dev *dev = &cdev;
-
- if (!cn_already_initialized)
- return -EAGAIN;
+ int err = -EINVAL;
+ struct net *net = NULL;
+ struct cn_dev *dev = NULL;

- err = cn_queue_add_callback(dev->cbdev, name, id, callback);
- if (err)
+ if (!id || !name || !callback)
return err;

- return 0;
+ down_read(&net_rwsem);
+ for_each_net(net) {
+ dev = &net->cdev;
+ err = cn_queue_add_callback(dev->cbdev, name, id, callback);
+ if (err)
+ break;
+ }
+
+ if (err) {
+ for_each_net(net) {
+ dev = &net->cdev;
+ cn_queue_del_callback(dev->cbdev, id);
+ }
+ }
+ up_read(&net_rwsem);
+
+ return err;
}
EXPORT_SYMBOL_GPL(cn_add_callback);

+int cn_add_callback_one(struct net *net, struct cb_id *id, const char *name,
+ void (*callback)(struct net *, struct cn_msg *,
+ struct netlink_skb_parms *))
+{
+ struct cn_dev *dev = NULL;
+
+ if (!net || !id || !name || !callback)
+ return -EINVAL;
+
+ dev = &(net->cdev);
+ return cn_queue_add_callback(dev->cbdev, name, id, callback);
+}
+EXPORT_SYMBOL_GPL(cn_add_callback_one);
+
/*
* Callback remove routing - removes callback
* with given ID.
@@ -224,15 +259,25 @@ EXPORT_SYMBOL_GPL(cn_add_callback);
*/
void cn_del_callback(struct cb_id *id)
{
- struct cn_dev *dev = &cdev;
+ struct net *net = NULL;
+ struct cn_dev *dev = NULL;
+
+ if (!id)
+ return;

- cn_queue_del_callback(dev->cbdev, id);
+ down_read(&net_rwsem);
+ for_each_net(net) {
+ dev = &net->cdev;
+ cn_queue_del_callback(dev->cbdev, id);
+ }
+ up_read(&net_rwsem);
}
EXPORT_SYMBOL_GPL(cn_del_callback);

static int __maybe_unused cn_proc_show(struct seq_file *m, void *v)
{
- struct cn_queue_dev *dev = cdev.cbdev;
+ struct net *net = seq_file_single_net(m);
+ struct cn_queue_dev *dev = net->cdev.cbdev;
struct cn_callback_entry *cbq;

seq_printf(m, "Name ID\n");
@@ -251,15 +296,62 @@ static int __maybe_unused cn_proc_show(struct seq_file *m, void *v)
return 0;
}

-static int cn_init(void)
+static int init_cn_net(struct net *net)
+{
+ int ret = 0;
+ struct cn_dev *init_dev = &(init_net.cdev);
+ struct cn_queue_dev *cbdev = init_dev->cbdev;
+
+ struct cn_callback_entry *cbq = NULL;
+ struct cn_callback_entry_ex *cbq_ex = NULL;
+ struct cn_callback_entry_ex *tmp = NULL;
+ LIST_HEAD(head);
+
+ if (!net)
+ return -EINVAL;
+
+ spin_lock_bh(&cbdev->queue_lock);
+ list_for_each_entry(cbq, &cbdev->queue_list, callback_entry) {
+ cbq_ex = kmalloc(sizeof(*cbq_ex), GFP_ATOMIC);
+ if (!cbq_ex) {
+ ret = -ENOMEM;
+ break;
+ }
+ INIT_LIST_HEAD(&(cbq_ex->list));
+
+ memcpy(&cbq_ex->id, &(cbq->id.id), sizeof(struct cb_id));
+ memcpy(cbq_ex->name, &(cbq->id.name), CN_CBQ_NAMELEN);
+ cbq_ex->callback = cbq->callback;
+
+ list_add_tail(&(cbq_ex->list), &head);
+ }
+ spin_unlock_bh(&cbdev->queue_lock);
+
+ if (ret < 0) {
+ list_for_each_entry_safe(cbq_ex, tmp, &head, list) {
+ kfree(cbq_ex);
+ }
+ } else {
+ list_for_each_entry_safe(cbq_ex, tmp, &head, list) {
+ cn_add_callback_one(net, &(cbq_ex->id), cbq_ex->name,
+ cbq_ex->callback);
+ kfree(cbq_ex);
+ }
+ }
+
+ return ret;
+}
+
+static int __net_init cn_init(struct net *net)
{
- struct cn_dev *dev = &cdev;
+ int ret = 0;
+ struct cn_dev *dev = &net->cdev;
struct netlink_kernel_cfg cfg = {
.groups = CN_NETLINK_USERS + 0xf,
.input = cn_rx_skb,
};

- dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR, &cfg);
+ dev->nls = netlink_kernel_create(net, NETLINK_CONNECTOR, &cfg);
if (!dev->nls)
return -EIO;

@@ -268,25 +360,44 @@ static int cn_init(void)
netlink_kernel_release(dev->nls);
return -EINVAL;
}
+ atomic_set(&(dev->proc_event_num_listeners), 0);

- cn_already_initialized = 1;
+ ret = init_cn_net(net);
+ if (ret < 0) {
+ cn_queue_free_dev(dev->cbdev);
+ netlink_kernel_release(dev->nls);
+ return ret;
+ }

- proc_create_single("connector", S_IRUGO, init_net.proc_net, cn_proc_show);
+ proc_create_net_single("connector", 0444, net->proc_net, cn_proc_show,
+ NULL);

return 0;
}

-static void cn_fini(void)
+static void __net_exit cn_fini(struct net *net)
{
- struct cn_dev *dev = &cdev;
-
- cn_already_initialized = 0;
-
- remove_proc_entry("connector", init_net.proc_net);
+ struct cn_dev *dev = &net->cdev;

+ remove_proc_entry("connector", net->proc_net);
cn_queue_free_dev(dev->cbdev);
netlink_kernel_release(dev->nls);
}

-subsys_initcall(cn_init);
-module_exit(cn_fini);
+static struct pernet_operations cn_netlink_net_ops = {
+ .init = cn_init,
+ .exit = cn_fini,
+};
+
+static int __init connector_init(void)
+{
+ return register_pernet_subsys(&cn_netlink_net_ops);
+}
+
+static void __exit connector_exit(void)
+{
+ unregister_pernet_subsys(&cn_netlink_net_ops);
+}
+
+subsys_initcall(connector_init);
+module_exit(connector_exit);
diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index 5040d7e0cd9e..a7151296af5c 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -13,6 +13,7 @@
#include <linux/workqueue.h>
#include <linux/hyperv.h>
#include <linux/sched.h>
+#include <linux/slab.h>
#include <asm/hyperv-tlfs.h>

#include "hyperv_vmbus.h"
diff --git a/include/linux/connector.h b/include/linux/connector.h
index 8e9385eb18f8..17febd6946ce 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -14,11 +14,14 @@
#include <linux/list.h>
#include <linux/workqueue.h>

-#include <net/sock.h>
#include <uapi/linux/connector.h>

#define CN_CBQ_NAMELEN 32

+struct net;
+struct sock;
+struct netlink_skb_parms;
+
struct cn_queue_dev {
atomic_t refcnt;
unsigned char name[CN_CBQ_NAMELEN];
@@ -46,11 +49,20 @@ struct cn_callback_entry {
u32 seq, group;
};

+struct cn_callback_entry_ex {
+ struct list_head list;
+ struct cb_id id;
+ unsigned char name[CN_CBQ_NAMELEN];
+ void (*callback)(struct net *net, struct cn_msg *cn_msg,
+ struct netlink_skb_parms *nsp);
+};
+
struct cn_dev {
struct cb_id id;

u32 seq, groups;
struct sock *nls;
+ atomic_t proc_event_num_listeners;

struct cn_queue_dev *cbdev;
};
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 2ee5901bec7a..312972fb2dcc 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -38,6 +38,7 @@
#include <linux/idr.h>
#include <linux/skbuff.h>
#include <linux/notifier.h>
+#include <linux/connector.h>

struct user_namespace;
struct proc_dir_entry;
@@ -187,6 +188,9 @@ struct net {
#endif
#if IS_ENABLED(CONFIG_CRYPTO_USER)
struct sock *crypto_nlsk;
+#endif
+#if IS_ENABLED(CONFIG_CONNECTOR)
+ struct cn_dev cdev;
#endif
struct sock *diag_nlsk;
} __randomize_layout;
diff --git a/kernel/exit.c b/kernel/exit.c
index 727150f28103..976fd6032024 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -788,6 +788,7 @@ void __noreturn do_exit(long code)

tsk->exit_code = code;
taskstats_exit(tsk, group_dead);
+ proc_exit_connector(tsk);

exit_mm();

@@ -824,7 +825,6 @@ void __noreturn do_exit(long code)

exit_tasks_rcu_start();
exit_notify(tsk, group_dead);
- proc_exit_connector(tsk);
mpol_put_task_policy(tsk);
#ifdef CONFIG_FUTEX
if (unlikely(current->pi_state_cache))
--
2.27.0

2020-07-02 00:28:37

by Matt Bennett

Subject: [PATCH 4/5] connector: Prepare for supporting multiple namespaces

Extend the existing function definitions and call sites to start
passing the network namespace. For now we still only pass the
default namespace.
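
After this change an in-kernel callback looks roughly as follows (the
function name is a hypothetical example; all callers continue to pass
&init_net):

static void example_cn_callback(struct net *net, struct cn_msg *msg,
				struct netlink_skb_parms *nsp)
{
	/* 'net' is the namespace the request arrived on; a later
	 * patch uses it instead of assuming init_net */
	cn_netlink_send(net, msg, 0, 0, GFP_KERNEL);
}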

Signed-off-by: Matt Bennett <[email protected]>
---
Documentation/driver-api/connector.rst | 6 +++---
drivers/connector/cn_proc.c | 5 +++--
drivers/connector/cn_queue.c | 5 +++--
drivers/connector/connector.c | 21 ++++++++++++---------
drivers/hv/hv_utils_transport.c | 6 ++++--
drivers/md/dm-log-userspace-transfer.c | 6 ++++--
drivers/video/fbdev/uvesafb.c | 8 +++++---
drivers/w1/w1_netlink.c | 19 +++++++++++--------
include/linux/connector.h | 24 ++++++++++++++++--------
samples/connector/cn_test.c | 6 ++++--
10 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/Documentation/driver-api/connector.rst b/Documentation/driver-api/connector.rst
index c100c7482289..4fb1f73d76ad 100644
--- a/Documentation/driver-api/connector.rst
+++ b/Documentation/driver-api/connector.rst
@@ -25,9 +25,9 @@ handling, etc... The Connector driver allows any kernelspace agents to use
netlink based networking for inter-process communication in a significantly
easier way::

- int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
- void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask);
- void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask);
+ int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct net *, struct cn_msg *, struct netlink_skb_parms *));
+ void cn_netlink_send_multi(struct net *net, struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask);
+ void cn_netlink_send(struct net *net, struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask);

struct cb_id
{
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index d90aea555a21..9202be177a30 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -16,6 +16,7 @@
#include <linux/ptrace.h>
#include <linux/atomic.h>
#include <linux/pid_namespace.h>
+#include <net/net_namespace.h>

#include <linux/cn_proc.h>
#include <linux/local_lock.h>
@@ -61,7 +62,7 @@ static inline void send_msg(struct cn_msg *msg)
*
* If cn_netlink_send() fails, the data is not sent.
*/
- cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT);
+ cn_netlink_send(&init_net, msg, 0, CN_IDX_PROC, GFP_NOWAIT);

local_unlock(&local_event.lock);
}
@@ -343,7 +344,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
* cn_proc_mcast_ctl
* @data: message sent from userspace via the connector
*/
-static void cn_proc_mcast_ctl(struct cn_msg *msg,
+static void cn_proc_mcast_ctl(struct net *net, struct cn_msg *msg,
struct netlink_skb_parms *nsp)
{
enum proc_cn_mcast_op *mc_op = NULL;
diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index a82ceeb37f26..22fdd2b149af 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -16,11 +16,12 @@
#include <linux/suspend.h>
#include <linux/connector.h>
#include <linux/delay.h>
+#include <net/net_namespace.h>

static struct cn_callback_entry *
cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const char *name,
struct cb_id *id,
- void (*callback)(struct cn_msg *,
+ void (*callback)(struct net *, struct cn_msg *,
struct netlink_skb_parms *))
{
struct cn_callback_entry *cbq;
@@ -58,7 +59,7 @@ int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)

int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
struct cb_id *id,
- void (*callback)(struct cn_msg *,
+ void (*callback)(struct net *, struct cn_msg *,
struct netlink_skb_parms *))
{
struct cn_callback_entry *cbq, *__cbq;
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 2d22d6bf52f2..82fcaa4d8be3 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -58,8 +58,8 @@ static int cn_already_initialized;
* The message is sent to, the portid if given, the group if given, both if
* both, or if both are zero then the group is looked up and sent there.
*/
-int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
- gfp_t gfp_mask)
+int cn_netlink_send_mult(struct net *net, struct cn_msg *msg, u16 len,
+ u32 portid, u32 __group, gfp_t gfp_mask)
{
struct cn_callback_entry *__cbq;
unsigned int size;
@@ -118,17 +118,18 @@ int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
EXPORT_SYMBOL_GPL(cn_netlink_send_mult);

/* same as cn_netlink_send_mult except msg->len is used for len */
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
- gfp_t gfp_mask)
+int cn_netlink_send(struct net *net, struct cn_msg *msg, u32 portid,
+ u32 __group, gfp_t gfp_mask)
{
- return cn_netlink_send_mult(msg, msg->len, portid, __group, gfp_mask);
+ return cn_netlink_send_mult(net, msg, msg->len, portid, __group,
+ gfp_mask);
}
EXPORT_SYMBOL_GPL(cn_netlink_send);

/*
* Callback helper - queues work and setup destructor for given data.
*/
-static int cn_call_callback(struct sk_buff *skb)
+static int cn_call_callback(struct net *net, struct sk_buff *skb)
{
struct nlmsghdr *nlh;
struct cn_callback_entry *i, *cbq = NULL;
@@ -153,7 +154,7 @@ static int cn_call_callback(struct sk_buff *skb)
spin_unlock_bh(&dev->cbdev->queue_lock);

if (cbq != NULL) {
- cbq->callback(msg, nsp);
+ cbq->callback(net, msg, nsp);
kfree_skb(skb);
cn_queue_release_callback(cbq);
err = 0;
@@ -172,6 +173,8 @@ static void cn_rx_skb(struct sk_buff *skb)
struct nlmsghdr *nlh;
int len, err;

+ struct net *net = sock_net(skb->sk);
+
if (skb->len >= NLMSG_HDRLEN) {
nlh = nlmsg_hdr(skb);
len = nlmsg_len(nlh);
@@ -181,7 +184,7 @@ static void cn_rx_skb(struct sk_buff *skb)
len > CONNECTOR_MAX_MSG_SIZE)
return;

- err = cn_call_callback(skb_get(skb));
+ err = cn_call_callback(net, skb_get(skb));
if (err < 0)
kfree_skb(skb);
}
@@ -194,7 +197,7 @@ static void cn_rx_skb(struct sk_buff *skb)
* May sleep.
*/
int cn_add_callback(struct cb_id *id, const char *name,
- void (*callback)(struct cn_msg *,
+ void (*callback)(struct net *, struct cn_msg *,
struct netlink_skb_parms *))
{
int err;
diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index eb2833d2b5d0..1a67efe59e91 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -8,6 +8,7 @@
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/poll.h>
+#include <net/net_namespace.h>

#include "hyperv_vmbus.h"
#include "hv_utils_transport.h"
@@ -181,7 +182,8 @@ static int hvt_op_release(struct inode *inode, struct file *file)
return 0;
}

-static void hvt_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
+static void hvt_cn_callback(struct net *net, struct cn_msg *msg,
+ struct netlink_skb_parms *nsp)
{
struct hvutil_transport *hvt, *hvt_found = NULL;

@@ -231,7 +233,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, void *msg, int len,
cn_msg->id.val = hvt->cn_id.val;
cn_msg->len = len;
memcpy(cn_msg->data, msg, len);
- ret = cn_netlink_send(cn_msg, 0, 0, GFP_ATOMIC);
+ ret = cn_netlink_send(&init_net, cn_msg, 0, 0, GFP_ATOMIC);
kfree(cn_msg);
/*
* We don't know when netlink messages are delivered but unlike
diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index fdf8ec304f8d..0e835acf14da 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -12,6 +12,7 @@
#include <linux/connector.h>
#include <linux/device-mapper.h>
#include <linux/dm-log-userspace.h>
+#include <net/net_namespace.h>

#include "dm-log-userspace-transfer.h"

@@ -66,7 +67,7 @@ static int dm_ulog_sendto_server(struct dm_ulog_request *tfr)
msg->seq = tfr->seq;
msg->len = sizeof(struct dm_ulog_request) + tfr->data_size;

- r = cn_netlink_send(msg, 0, 0, gfp_any());
+ r = cn_netlink_send(&init_net, msg, 0, 0, gfp_any());

return r;
}
@@ -130,7 +131,8 @@ static int fill_pkg(struct cn_msg *msg, struct dm_ulog_request *tfr)
* This is the connector callback that delivers data
* that was sent from userspace.
*/
-static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
+static void cn_ulog_callback(struct net *net, struct cn_msg *msg,
+ struct netlink_skb_parms *nsp)
{
struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);

diff --git a/drivers/video/fbdev/uvesafb.c b/drivers/video/fbdev/uvesafb.c
index def14ac0ebe1..f9b6ed7b97f2 100644
--- a/drivers/video/fbdev/uvesafb.c
+++ b/drivers/video/fbdev/uvesafb.c
@@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <video/edid.h>
#include <video/uvesafb.h>
+#include <net/net_namespace.h>
#ifdef CONFIG_X86
#include <video/vga.h>
#endif
@@ -69,7 +70,8 @@ static DEFINE_MUTEX(uvfb_lock);
* find the kernel part of the task struct, copy the registers and
* the buffer contents and then complete the task.
*/
-static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
+static void uvesafb_cn_callback(struct net *net, struct cn_msg *msg,
+ struct netlink_skb_parms *nsp)
{
struct uvesafb_task *utask;
struct uvesafb_ktask *task;
@@ -194,7 +196,7 @@ static int uvesafb_exec(struct uvesafb_ktask *task)
uvfb_tasks[seq] = task;
mutex_unlock(&uvfb_lock);

- err = cn_netlink_send(m, 0, 0, GFP_KERNEL);
+ err = cn_netlink_send(&init_net, m, 0, 0, GFP_KERNEL);
if (err == -ESRCH) {
/*
* Try to start the userspace helper if sending
@@ -206,7 +208,7 @@ static int uvesafb_exec(struct uvesafb_ktask *task)
pr_err("make sure that the v86d helper is installed and executable\n");
} else {
v86d_started = 1;
- err = cn_netlink_send(m, 0, 0, gfp_any());
+ err = cn_netlink_send(&init_net, m, 0, 0, gfp_any());
if (err == -ENOBUFS)
err = 0;
}
diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index fa490aa4407c..246844b61613 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -7,6 +7,7 @@
#include <linux/skbuff.h>
#include <linux/netlink.h>
#include <linux/connector.h>
+#include <net/net_namespace.h>

#include "w1_internal.h"
#include "w1_netlink.h"
@@ -64,8 +65,8 @@ static void w1_unref_block(struct w1_cb_block *block)
if (atomic_sub_return(1, &block->refcnt) == 0) {
u16 len = w1_reply_len(block);
if (len) {
- cn_netlink_send_mult(block->first_cn, len,
- block->portid, 0, GFP_KERNEL);
+ cn_netlink_send_mult(&init_net, block->first_cn, len,
+ block->portid, 0, GFP_KERNEL);
}
kfree(block);
}
@@ -83,7 +84,8 @@ static void w1_reply_make_space(struct w1_cb_block *block, u16 space)
{
u16 len = w1_reply_len(block);
if (len + space >= block->maxlen) {
- cn_netlink_send_mult(block->first_cn, len, block->portid, 0, GFP_KERNEL);
+ cn_netlink_send_mult(&init_net, block->first_cn, len,
+ block->portid, 0, GFP_KERNEL);
block->first_cn->len = 0;
block->cn = NULL;
block->msg = NULL;
@@ -201,7 +203,7 @@ static void w1_netlink_send_error(struct cn_msg *cn, struct w1_netlink_msg *msg,
packet.cn.len = sizeof(packet.msg);
packet.msg.len = 0;
packet.msg.status = (u8)-error;
- cn_netlink_send(&packet.cn, portid, 0, GFP_KERNEL);
+ cn_netlink_send(&init_net, &packet.cn, portid, 0, GFP_KERNEL);
}

/**
@@ -228,7 +230,7 @@ void w1_netlink_send(struct w1_master *dev, struct w1_netlink_msg *msg)
memcpy(&packet.msg, msg, sizeof(*msg));
packet.msg.len = 0;

- cn_netlink_send(&packet.cn, 0, 0, GFP_KERNEL);
+ cn_netlink_send(&init_net, &packet.cn, 0, 0, GFP_KERNEL);
}

static void w1_send_slave(struct w1_master *dev, u64 rn)
@@ -421,7 +423,7 @@ static int w1_process_command_root(struct cn_msg *req_cn, u32 portid)
mutex_lock(&w1_mlock);
list_for_each_entry(dev, &w1_masters, w1_master_entry) {
if (cn->len + sizeof(*id) > PAGE_SIZE - sizeof(struct cn_msg)) {
- cn_netlink_send(cn, portid, 0, GFP_KERNEL);
+ cn_netlink_send(&init_net, cn, portid, 0, GFP_KERNEL);
cn->len = sizeof(struct w1_netlink_msg);
msg->len = 0;
id = (u32 *)msg->data;
@@ -432,7 +434,7 @@ static int w1_process_command_root(struct cn_msg *req_cn, u32 portid)
cn->len += sizeof(*id);
id++;
}
- cn_netlink_send(cn, portid, 0, GFP_KERNEL);
+ cn_netlink_send(&init_net, cn, portid, 0, GFP_KERNEL);
mutex_unlock(&w1_mlock);

kfree(cn);
@@ -532,7 +534,8 @@ static void w1_list_count_cmds(struct w1_netlink_msg *msg, int *cmd_count,
}
}

-static void w1_cn_callback(struct cn_msg *cn, struct netlink_skb_parms *nsp)
+static void w1_cn_callback(struct net *net, struct cn_msg *cn,
+ struct netlink_skb_parms *nsp)
{
struct w1_netlink_msg *msg = (struct w1_netlink_msg *)(cn + 1);
struct w1_slave *sl;
diff --git a/include/linux/connector.h b/include/linux/connector.h
index cb732643471b..8e9385eb18f8 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -40,7 +40,8 @@ struct cn_callback_entry {
struct cn_queue_dev *pdev;

struct cn_callback_id id;
- void (*callback) (struct cn_msg *, struct netlink_skb_parms *);
+ void (*callback)(struct net *, struct cn_msg *,
+ struct netlink_skb_parms *);

u32 seq, group;
};
@@ -62,10 +63,12 @@ struct cn_dev {
* in-kernel users.
* @name: connector's callback symbolic name.
* @callback: connector's callback.
- * parameters are %cn_msg and the sender's credentials
+ * parameters are network namespace, %cn_msg and
+ * the sender's credentials
*/
int cn_add_callback(struct cb_id *id, const char *name,
- void (*callback)(struct cn_msg *, struct netlink_skb_parms *));
+ void (*callback)(struct net *, struct cn_msg *,
+ struct netlink_skb_parms *));
/**
* cn_del_callback() - Unregisters new callback with connector core.
*
@@ -75,8 +78,9 @@ void cn_del_callback(struct cb_id *id);


/**
- * cn_netlink_send_mult - Sends message to the specified groups.
+ * cn_netlink_send_mult - Sends messages to the specified groups.
*
+ * @net: network namespace
* @msg: message header(with attached data).
* @len: Number of @msg to be sent.
* @portid: destination port.
@@ -96,11 +100,13 @@ void cn_del_callback(struct cb_id *id);
*
* If there are no listeners for given group %-ESRCH can be returned.
*/
-int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 group, gfp_t gfp_mask);
+int cn_netlink_send_mult(struct net *net, struct cn_msg *msg, u16 len,
+ u32 portid, u32 group, gfp_t gfp_mask);

/**
- * cn_netlink_send_mult - Sends message to the specified groups.
+ * cn_netlink_send - Sends message to the specified groups.
*
+ * @net: network namespace
* @msg: message header(with attached data).
* @portid: destination port.
* If non-zero the message will be sent to the given port,
@@ -119,11 +125,13 @@ int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 group, gfp
*
* If there are no listeners for given group %-ESRCH can be returned.
*/
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 group, gfp_t gfp_mask);
+int cn_netlink_send(struct net *net, struct cn_msg *msg, u32 portid, u32 group,
+ gfp_t gfp_mask);

int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
struct cb_id *id,
- void (*callback)(struct cn_msg *, struct netlink_skb_parms *));
+ void (*callback)(struct net *, struct cn_msg *,
+ struct netlink_skb_parms *));
void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id);
void cn_queue_release_callback(struct cn_callback_entry *);

diff --git a/samples/connector/cn_test.c b/samples/connector/cn_test.c
index 0958a171d048..9eaf40bbd714 100644
--- a/samples/connector/cn_test.c
+++ b/samples/connector/cn_test.c
@@ -16,13 +16,15 @@
#include <linux/timer.h>

#include <linux/connector.h>
+#include <net/net_namespace.h>

static struct cb_id cn_test_id = { CN_NETLINK_USERS + 3, 0x456 };
static char cn_test_name[] = "cn_test";
static struct sock *nls;
static struct timer_list cn_test_timer;

-static void cn_test_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
+static void cn_test_callback(struct net *net, struct cn_msg *msg,
+ struct netlink_skb_parms *nsp)
{
pr_info("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n",
__func__, jiffies, msg->id.idx, msg->id.val,
@@ -132,7 +134,7 @@ static void cn_test_timer_func(struct timer_list *unused)

memcpy(m + 1, data, m->len);

- cn_netlink_send(m, 0, 0, GFP_ATOMIC);
+ cn_netlink_send(&init_net, m, 0, 0, GFP_ATOMIC);
kfree(m);
}

--
2.27.0

2020-07-02 00:29:32

by Matt Bennett

Subject: [PATCH 1/5] connector: Use task pid helpers

In preparation for supporting the connector outside of the default
network namespace, we switch to using these helpers now. As the connector
is still only supported in the default namespace, this change is a no-op.
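
For clarity, the helpers differ from the raw task fields as follows
(illustrative fragment):

	pid_t global = task_pid_nr(task);  /* id in the initial pid namespace */
	pid_t local = task_pid_vnr(task);  /* id in current's pid namespace */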

Signed-off-by: Matt Bennett <[email protected]>
---
drivers/connector/cn_proc.c | 48 ++++++++++++++++++-------------------
1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 646ad385e490..36a7823c56ec 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -83,11 +83,11 @@ void proc_fork_connector(struct task_struct *task)
ev->what = PROC_EVENT_FORK;
rcu_read_lock();
parent = rcu_dereference(task->real_parent);
- ev->event_data.fork.parent_pid = parent->pid;
- ev->event_data.fork.parent_tgid = parent->tgid;
+ ev->event_data.fork.parent_pid = task_pid_vnr(parent);
+ ev->event_data.fork.parent_tgid = task_tgid_vnr(parent);
rcu_read_unlock();
- ev->event_data.fork.child_pid = task->pid;
- ev->event_data.fork.child_tgid = task->tgid;
+ ev->event_data.fork.child_pid = task_pid_vnr(task);
+ ev->event_data.fork.child_tgid = task_tgid_vnr(task);

memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
@@ -110,8 +110,8 @@ void proc_exec_connector(struct task_struct *task)
memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_EXEC;
- ev->event_data.exec.process_pid = task->pid;
- ev->event_data.exec.process_tgid = task->tgid;
+ ev->event_data.exec.process_pid = task_pid_vnr(task);
+ ev->event_data.exec.process_tgid = task_tgid_vnr(task);

memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
@@ -134,8 +134,8 @@ void proc_id_connector(struct task_struct *task, int which_id)
ev = (struct proc_event *)msg->data;
memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->what = which_id;
- ev->event_data.id.process_pid = task->pid;
- ev->event_data.id.process_tgid = task->tgid;
+ ev->event_data.id.process_pid = task_pid_vnr(task);
+ ev->event_data.id.process_tgid = task_tgid_vnr(task);
rcu_read_lock();
cred = __task_cred(task);
if (which_id == PROC_EVENT_UID) {
@@ -172,8 +172,8 @@ void proc_sid_connector(struct task_struct *task)
memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_SID;
- ev->event_data.sid.process_pid = task->pid;
- ev->event_data.sid.process_tgid = task->tgid;
+ ev->event_data.sid.process_pid = task_pid_vnr(task);
+ ev->event_data.sid.process_tgid = task_tgid_vnr(task);

memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
@@ -196,11 +196,11 @@ void proc_ptrace_connector(struct task_struct *task, int ptrace_id)
memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_PTRACE;
- ev->event_data.ptrace.process_pid = task->pid;
- ev->event_data.ptrace.process_tgid = task->tgid;
+ ev->event_data.ptrace.process_pid = task_pid_vnr(task);
+ ev->event_data.ptrace.process_tgid = task_tgid_vnr(task);
if (ptrace_id == PTRACE_ATTACH) {
- ev->event_data.ptrace.tracer_pid = current->pid;
- ev->event_data.ptrace.tracer_tgid = current->tgid;
+ ev->event_data.ptrace.tracer_pid = task_pid_vnr(current);
+ ev->event_data.ptrace.tracer_tgid = task_tgid_vnr(current);
} else if (ptrace_id == PTRACE_DETACH) {
ev->event_data.ptrace.tracer_pid = 0;
ev->event_data.ptrace.tracer_tgid = 0;
@@ -228,8 +228,8 @@ void proc_comm_connector(struct task_struct *task)
memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_COMM;
- ev->event_data.comm.process_pid = task->pid;
- ev->event_data.comm.process_tgid = task->tgid;
+ ev->event_data.comm.process_pid = task_pid_vnr(task);
+ ev->event_data.comm.process_tgid = task_tgid_vnr(task);
get_task_comm(ev->event_data.comm.comm, task);

memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
@@ -254,14 +254,14 @@ void proc_coredump_connector(struct task_struct *task)
memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_COREDUMP;
- ev->event_data.coredump.process_pid = task->pid;
- ev->event_data.coredump.process_tgid = task->tgid;
+ ev->event_data.coredump.process_pid = task_pid_vnr(task);
+ ev->event_data.coredump.process_tgid = task_tgid_vnr(task);

rcu_read_lock();
if (pid_alive(task)) {
parent = rcu_dereference(task->real_parent);
- ev->event_data.coredump.parent_pid = parent->pid;
- ev->event_data.coredump.parent_tgid = parent->tgid;
+ ev->event_data.coredump.parent_pid = task_pid_vnr(parent);
+ ev->event_data.coredump.parent_tgid = task_tgid_vnr(parent);
}
rcu_read_unlock();

@@ -287,16 +287,16 @@ void proc_exit_connector(struct task_struct *task)
memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->timestamp_ns = ktime_get_ns();
ev->what = PROC_EVENT_EXIT;
- ev->event_data.exit.process_pid = task->pid;
- ev->event_data.exit.process_tgid = task->tgid;
+ ev->event_data.exit.process_pid = task_pid_vnr(task);
+ ev->event_data.exit.process_tgid = task_tgid_vnr(task);
ev->event_data.exit.exit_code = task->exit_code;
ev->event_data.exit.exit_signal = task->exit_signal;

rcu_read_lock();
if (pid_alive(task)) {
parent = rcu_dereference(task->real_parent);
- ev->event_data.exit.parent_pid = parent->pid;
- ev->event_data.exit.parent_tgid = parent->tgid;
+ ev->event_data.exit.parent_pid = task_pid_vnr(parent);
+ ev->event_data.exit.parent_tgid = task_tgid_vnr(parent);
}
rcu_read_unlock();

--
2.27.0

2020-07-02 00:30:53

by Matt Bennett

Subject: [PATCH 3/5] connector: Ensure callback entry is released

Currently the callback entry itself appears to be leaked: when the queue
device is freed, entries are unlinked from the list but never released.
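
For reference, the release helper now called on each entry is quoted
below from drivers/connector/cn_queue.c (comments added here for
clarity):

void cn_queue_release_callback(struct cn_callback_entry *cbq)
{
	/* drop this entry's reference; only the last put frees it */
	if (!atomic_dec_and_test(&cbq->refcnt))
		return;

	/* release the reference the entry held on its queue device */
	atomic_dec(&cbq->pdev->refcnt);
	kfree(cbq);
}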

Signed-off-by: Matt Bennett <[email protected]>
---
drivers/connector/cn_queue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index 49295052ba8b..a82ceeb37f26 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -132,8 +132,10 @@ void cn_queue_free_dev(struct cn_queue_dev *dev)
struct cn_callback_entry *cbq, *n;

spin_lock_bh(&dev->queue_lock);
- list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry)
+ list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry) {
list_del(&cbq->callback_entry);
+ cn_queue_release_callback(cbq);
+ }
spin_unlock_bh(&dev->queue_lock);

while (atomic_read(&dev->refcnt)) {
--
2.27.0

2020-07-02 05:56:25

by kernel test robot

Subject: Re: [PATCH 5/5] connector: Create connector per namespace

Hi Matt,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on ipvs/master]
[also build test WARNING on dm/for-next linux/master linus/master v5.8-rc3 next-20200701]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Matt-Bennett/RFC-connector-Add-network-namespace-awareness/20200702-083030
base: https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: powerpc-pmac32_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

drivers/connector/connector.c: In function 'cn_netlink_send_mult':
>> drivers/connector/connector.c:71:18: warning: comparison is always false due to limited range of data type [-Wtype-limits]
71 | if (!msg || len < 0)
| ^
drivers/connector/connector.c: At top level:
drivers/connector/connector.c:238:5: warning: no previous prototype for 'cn_add_callback_one' [-Wmissing-prototypes]
238 | int cn_add_callback_one(struct net *net, struct cb_id *id, const char *name,
| ^~~~~~~~~~~~~~~~~~~

vim +71 drivers/connector/connector.c

30
31 /*
32 * Sends mult (multiple) cn_msg at a time.
33 *
34 * msg->seq and msg->ack are used to determine message genealogy.
35 * When someone sends message it puts there locally unique sequence
36 * and random acknowledge numbers. Sequence number may be copied into
37 * nlmsghdr->nlmsg_seq too.
38 *
39 * Sequence number is incremented with each message to be sent.
40 *
41 * If we expect a reply to our message then the sequence number in
42 * received message MUST be the same as in original message, and
43 * acknowledge number MUST be the same + 1.
44 *
45 * If we receive a message and its sequence number is not equal to the
46 * one we are expecting then it is a new message.
47 *
48 * If we receive a message and its sequence number is the same as one
49 * we are expecting but it's acknowledgement number is not equal to
50 * the acknowledgement number in the original message + 1, then it is
51 * a new message.
52 *
53 * If msg->len != len, then additional cn_msg messages are expected following
54 * the first msg.
55 *
56 * The message is sent to, the portid if given, the group if given, both if
57 * both, or if both are zero then the group is looked up and sent there.
58 */
59 int cn_netlink_send_mult(struct net *net, struct cn_msg *msg, u16 len,
60 u32 portid, u32 __group, gfp_t gfp_mask)
61 {
62 struct cn_callback_entry *__cbq;
63 unsigned int size;
64 struct sk_buff *skb;
65 struct nlmsghdr *nlh;
66 struct cn_msg *data;
67 struct cn_dev *dev = &(net->cdev);
68 u32 group = 0;
69 int found = 0;
70
> 71 if (!msg || len < 0)
72 return -EINVAL;
73
74 if (portid || __group) {
75 group = __group;
76 } else {
77 spin_lock_bh(&dev->cbdev->queue_lock);
78 list_for_each_entry(__cbq, &dev->cbdev->queue_list,
79 callback_entry) {
80 if (cn_cb_equal(&__cbq->id.id, &msg->id)) {
81 found = 1;
82 group = __cbq->group;
83 break;
84 }
85 }
86 spin_unlock_bh(&dev->cbdev->queue_lock);
87
88 if (!found)
89 return -ENODEV;
90 }
91
92 if (!portid && !netlink_has_listeners(dev->nls, group))
93 return -ESRCH;
94
95 size = sizeof(*msg) + len;
96
97 skb = nlmsg_new(size, gfp_mask);
98 if (!skb)
99 return -ENOMEM;
100
101 nlh = nlmsg_put(skb, 0, msg->seq, NLMSG_DONE, size, 0);
102 if (!nlh) {
103 kfree_skb(skb);
104 return -EMSGSIZE;
105 }
106
107 data = nlmsg_data(nlh);
108
109 memcpy(data, msg, size);
110
111 NETLINK_CB(skb).dst_group = group;
112
113 if (group)
114 return netlink_broadcast(dev->nls, skb, portid, group,
115 gfp_mask);
116 return netlink_unicast(dev->nls, skb, portid,
117 !gfpflags_allow_blocking(gfp_mask));
118 }
119 EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
120
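
Since 'len' is declared u16 the comparison can never be true; a sketch
of the obvious correction (illustrative, untested) is:

	if (!msg)	/* 'len' is u16, so a 'len < 0' check is dead code */
		return -EINVAL;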

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-07-02 06:54:09

by kernel test robot

Subject: Re: [PATCH 5/5] connector: Create connector per namespace

Hi Matt,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on ipvs/master]
[also build test WARNING on dm/for-next linux/master linus/master v5.8-rc3 next-20200701]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Matt-Bennett/RFC-connector-Add-network-namespace-awareness/20200702-083030
base: https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: s390-randconfig-r016-20200701 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from drivers/connector/connector.c:9:
drivers/connector/connector.c: In function 'cn_netlink_send_mult':
drivers/connector/connector.c:71:18: warning: comparison is always false due to limited range of data type [-Wtype-limits]
71 | if (!msg || len < 0)
| ^
include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
| ^~~~
>> drivers/connector/connector.c:71:2: note: in expansion of macro 'if'
71 | if (!msg || len < 0)
| ^~
drivers/connector/connector.c:71:18: warning: comparison is always false due to limited range of data type [-Wtype-limits]
71 | if (!msg || len < 0)
| ^
include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
| ^~~~
>> drivers/connector/connector.c:71:2: note: in expansion of macro 'if'
71 | if (!msg || len < 0)
| ^~
drivers/connector/connector.c:71:18: warning: comparison is always false due to limited range of data type [-Wtype-limits]
71 | if (!msg || len < 0)
| ^
include/linux/compiler.h:69:3: note: in definition of macro '__trace_if_value'
69 | (cond) ? \
| ^~~~
include/linux/compiler.h:56:28: note: in expansion of macro '__trace_if_var'
56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
| ^~~~~~~~~~~~~~
>> drivers/connector/connector.c:71:2: note: in expansion of macro 'if'
71 | if (!msg || len < 0)
| ^~
drivers/connector/connector.c: At top level:
drivers/connector/connector.c:238:5: warning: no previous prototype for 'cn_add_callback_one' [-Wmissing-prototypes]
238 | int cn_add_callback_one(struct net *net, struct cb_id *id, const char *name,
| ^~~~~~~~~~~~~~~~~~~

vim +/if +71 drivers/connector/connector.c

30
31 /*
32 * Sends mult (multiple) cn_msg at a time.
33 *
34 * msg->seq and msg->ack are used to determine message genealogy.
35 * When someone sends message it puts there locally unique sequence
36 * and random acknowledge numbers. Sequence number may be copied into
37 * nlmsghdr->nlmsg_seq too.
38 *
39 * Sequence number is incremented with each message to be sent.
40 *
41 * If we expect a reply to our message then the sequence number in
42 * received message MUST be the same as in original message, and
43 * acknowledge number MUST be the same + 1.
44 *
45 * If we receive a message and its sequence number is not equal to the
46 * one we are expecting then it is a new message.
47 *
48 * If we receive a message and its sequence number is the same as one
49 * we are expecting but it's acknowledgement number is not equal to
50 * the acknowledgement number in the original message + 1, then it is
51 * a new message.
52 *
53 * If msg->len != len, then additional cn_msg messages are expected following
54 * the first msg.
55 *
56 * The message is sent to, the portid if given, the group if given, both if
57 * both, or if both are zero then the group is looked up and sent there.
58 */
59 int cn_netlink_send_mult(struct net *net, struct cn_msg *msg, u16 len,
60 u32 portid, u32 __group, gfp_t gfp_mask)
61 {
62 struct cn_callback_entry *__cbq;
63 unsigned int size;
64 struct sk_buff *skb;
65 struct nlmsghdr *nlh;
66 struct cn_msg *data;
67 struct cn_dev *dev = &(net->cdev);
68 u32 group = 0;
69 int found = 0;
70
> 71 if (!msg || len < 0)
72 return -EINVAL;
73
74 if (portid || __group) {
75 group = __group;
76 } else {
77 spin_lock_bh(&dev->cbdev->queue_lock);
78 list_for_each_entry(__cbq, &dev->cbdev->queue_list,
79 callback_entry) {
80 if (cn_cb_equal(&__cbq->id.id, &msg->id)) {
81 found = 1;
82 group = __cbq->group;
83 break;
84 }
85 }
86 spin_unlock_bh(&dev->cbdev->queue_lock);
87
88 if (!found)
89 return -ENODEV;
90 }
91
92 if (!portid && !netlink_has_listeners(dev->nls, group))
93 return -ESRCH;
94
95 size = sizeof(*msg) + len;
96
97 skb = nlmsg_new(size, gfp_mask);
98 if (!skb)
99 return -ENOMEM;
100
101 nlh = nlmsg_put(skb, 0, msg->seq, NLMSG_DONE, size, 0);
102 if (!nlh) {
103 kfree_skb(skb);
104 return -EMSGSIZE;
105 }
106
107 data = nlmsg_data(nlh);
108
109 memcpy(data, msg, size);
110
111 NETLINK_CB(skb).dst_group = group;
112
113 if (group)
114 return netlink_broadcast(dev->nls, skb, portid, group,
115 gfp_mask);
116 return netlink_unicast(dev->nls, skb, portid,
117 !gfpflags_allow_blocking(gfp_mask));
118 }
119 EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
120

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-07-02 13:23:08

by Eric W. Biederman

Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

Matt Bennett <[email protected]> writes:

> Previously the connector functionality could only be used by processes running in the
> default network namespace. This meant that any process that uses the connector functionality
> could not operate correctly when run inside a container. This is a draft patch series that
> attempts to allow this functionality outside of the default network namespace.
>
> I see this has been discussed previously [1], but am not sure how my changes relate to all
> of the topics discussed there and/or if there are any unintended side effects from my draft
> changes.

Is there a piece of software that uses connector that you want to get
working in containers?

I am curious what the motivation is because up until now there has been
nothing very interesting using this functionality. So it hasn't been
worth anyone's time to make the necessary changes to the code.

Eric


> Thanks.
>
> [1] https://marc.info/?l=linux-kernel&m=150806196728365&w=2
>
> Matt Bennett (5):
> connector: Use task pid helpers
> connector: Use 'current_user_ns' function
> connector: Ensure callback entry is released
> connector: Prepare for supporting multiple namespaces
> connector: Create connector per namespace
>
> Documentation/driver-api/connector.rst | 6 +-
> drivers/connector/cn_proc.c | 110 +++++++-------
> drivers/connector/cn_queue.c | 9 +-
> drivers/connector/connector.c | 192 ++++++++++++++++++++-----
> drivers/hv/hv_fcopy.c | 1 +
> drivers/hv/hv_utils_transport.c | 6 +-
> drivers/md/dm-log-userspace-transfer.c | 6 +-
> drivers/video/fbdev/uvesafb.c | 8 +-
> drivers/w1/w1_netlink.c | 19 +--
> include/linux/connector.h | 38 +++--
> include/net/net_namespace.h | 4 +
> kernel/exit.c | 2 +-
> samples/connector/cn_test.c | 6 +-
> 13 files changed, 286 insertions(+), 121 deletions(-)

2020-07-02 14:38:12

by Dan Carpenter

Subject: [kbuild] Re: [PATCH 5/5] connector: Create connector per namespace


2020-07-02 19:07:13

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

Matt Bennett <[email protected]> writes:

> Previously the connector functionality could only be used by processes running in the
> default network namespace. This meant that any process that uses the connector functionality
> could not operate correctly when run inside a container. This is a draft patch series that
> attempts to allow this functionality outside of the default network namespace.
>
> I see this has been discussed previously [1], but am not sure how my changes relate to all
> of the topics discussed there and/or if there are any unintended side
> effects from my draft

From a quick skim, this patchset does not look like it approaches a
correct conversion to code that works in multiple namespaces.

I will take the changes to proc_id_connector as an example.
You report the values in the caller's current namespaces.

Which means an unprivileged user can create a user namespace and get
connector to report whichever ids they want to users in another
namespace. AKA lie.
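
Concretely, all that takes is roughly (userspace sketch, illustrative
only):

	/* create an unprivileged user namespace */
	unshare(CLONE_NEWUSER);
	/* writing "0 1000 1" to /proc/self/uid_map makes uid 1000
	 * report as uid 0 from within the new namespace */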

So this appears to make connector completely unreliable.

Eric



> changes.
>
> Thanks.
>
> [1] https://marc.info/?l=linux-kernel&m=150806196728365&w=2
>
> Matt Bennett (5):
> connector: Use task pid helpers
> connector: Use 'current_user_ns' function
> connector: Ensure callback entry is released
> connector: Prepare for supporting multiple namespaces
> connector: Create connector per namespace
>
> Documentation/driver-api/connector.rst | 6 +-
> drivers/connector/cn_proc.c | 110 +++++++-------
> drivers/connector/cn_queue.c | 9 +-
> drivers/connector/connector.c | 192 ++++++++++++++++++++-----
> drivers/hv/hv_fcopy.c | 1 +
> drivers/hv/hv_utils_transport.c | 6 +-
> drivers/md/dm-log-userspace-transfer.c | 6 +-
> drivers/video/fbdev/uvesafb.c | 8 +-
> drivers/w1/w1_netlink.c | 19 +--
> include/linux/connector.h | 38 +++--
> include/net/net_namespace.h | 4 +
> kernel/exit.c | 2 +-
> samples/connector/cn_test.c | 6 +-
> 13 files changed, 286 insertions(+), 121 deletions(-)

2020-07-02 19:11:48

by Christian Brauner

Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
> Matt Bennett <[email protected]> writes:
>
> > Previously the connector functionality could only be used by processes running in the
> > default network namespace. This meant that any process that uses the connector functionality
> > could not operate correctly when run inside a container. This is a draft patch series that
> > attempts to allow this functionality outside of the default network namespace.
> >
> > I see this has been discussed previously [1], but am not sure how my changes relate to all
> > of the topics discussed there and/or if there are any unintended side effects from my draft
> > changes.
>
> Is there a piece of software that uses connector that you want to get
> working in containers?
>
> I am curious what the motivation is because up until now there has been
> nothing very interesting using this functionality. So it hasn't been
> worth anyone's time to make the necessary changes to the code.

Imho, we should just state once and for all that the proc connector will
not be namespaced. This is such a corner-case thing and has been
non-namespaced for such a long time without consistent push for it to be
namespaced, combined with the fact that this needs quite some code to
make it work correctly, that I fear we end up buying more bugs than we're
selling features. And realistically, you and I will end up maintaining
this and I feel this is not worth the time(?). Maybe I'm being too
pessimistic though.

Christian

2020-07-02 22:45:38

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

On 2020-07-02, Christian Brauner <[email protected]> wrote:
> On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
> > Matt Bennett <[email protected]> writes:
> >
> > > Previously the connector functionality could only be used by processes running in the
> > > default network namespace. This meant that any process that uses the connector functionality
> > > could not operate correctly when run inside a container. This is a draft patch series that
> > > attempts to now allow this functionality outside of the default network namespace.
> > >
> > > I see this has been discussed previously [1], but am not sure how my changes relate to all
> > > of the topics discussed there and/or if there are any unintended side effects from my draft
> > > changes.
> >
> > Is there a piece of software that uses connector that you want to get
> > working in containers?
> >
> > I am curious what the motivation is because up until now there has been
> > nothing very interesting using this functionality. So it hasn't been
> > worth anyone's time to make the necessary changes to the code.
>
> Imho, we should just state once and for all that the proc connector will
> not be namespaced. This is such a corner-case thing and has been
> non-namespaced for such a long time without consistent push for it to be
> namespaced, combined with the fact that this needs quite some code to
> make it work correctly, that I fear we end up buying more bugs than we're
> selling features. And realistically, you and I will end up maintaining
> this and I feel this is not worth the time(?). Maybe I'm being too
> pessimistic though.

It would be nice to have the proc connector be namespaced, because it
would allow you to have init systems that don't depend on cgroups to
operate -- and it would allow us to have a subset of FreeBSD's kqueue
functionality that doesn't exist today under Linux. However, arguably
pidfds might be a better path forward toward implementing such events
these days -- and are maybe something we should look into.

All of that being said, I agree that doing this is going to be
particularly hairy and likely not worth the effort. In particular, the
proc connector is:

* Almost entirely unused (and largely unknown) by userspace.

* Fairly fundamentally broken right now (the "security feature" of
PROC_CN_MCAST_LISTEN doesn't work because once there is one listener,
anyone who opens a cn_proc socket can get all events on the system
-- and if the process which opened the socket dies without calling
PROC_CN_MCAST_IGNORE then that information is now always streaming).
So if we end up supporting this, we'll need to fix those bugs too.

* Is so deeply intertwined with netlink, and thus with network
namespaces (rather than pid namespaces), that getting it to
correctly handle shared-network-namespace cases is
going to be a nightmare. I agree with Eric that this patchset looks
like it doesn't approach the problem from the right angle (and
thinking about how you could fix it makes me a little nervous).

Not to mention that when I brought this up with the maintainer listed in
MAINTAINERS a few years ago (soon after I posted [1]), I was told that
they no longer maintain this code -- so whoever touches it next is the
new maintainer.

In 2017, I wrote that GNU Shepherd uses cn_proc; however, I'm pretty
sure (looking at the code now) that it wasn't true then and isn't true
now (Shepherd seems to just do basic pidfile liveness checks). So even
the niche example I used then doesn't actually use cn_proc.

[1]: https://lore.kernel.org/lkml/[email protected]/

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>



2020-07-05 22:36:59

by Matt Bennett

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

On Thu, 2020-07-02 at 13:59 -0500, Eric W. Biederman wrote:
> Matt Bennett <[email protected]> writes:
>
> > Previously the connector functionality could only be used by processes running in the
> > default network namespace. This meant that any process that uses the connector functionality
> > could not operate correctly when run inside a container. This is a draft patch series that
> > attempts to now allow this functionality outside of the default network namespace.
> >
> > I see this has been discussed previously [1], but am not sure how my changes relate to all
> > of the topics discussed there and/or if there are any unintended side
> > effects from my draft
>
> In a quick skim this patchset does not look like it approaches a correct
> conversion to having code that works in multiple namespaces.
>
> I will take the changes to proc_id_connector for example.
> You report the values in the caller's current namespaces.
>
> Which means an unprivileged user can create a user namespace and get
> connector to report whichever ids they want to users in another
> namespace. AKA lie.
>
> So this appears to make connector completely unreliable.
>
> Eric
>

Hi Eric,

Thank you for taking the time to review. I wrote these patches in an
attempt to show that I was willing to do the work myself rather than
simply asking for someone else to do it for me. The changes worked for
my use cases when I tested them, but I expected that some of the
changes would be incorrect and that I would need some guidance. I can
spend some time to really dig in and fully understand the changes I am
trying to make (I have limited kernel development experience) but
based on the rest of the discussion threads it seems that there is
likely no appetite to ever support namespaces with the connector.

Best regards,
Matt

2020-07-05 22:37:23

by Matt Bennett

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

On Thu, 2020-07-02 at 21:10 +0200, Christian Brauner wrote:
> On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
> > Matt Bennett <[email protected]> writes:
> >
> > > Previously the connector functionality could only be used by processes running in the
> > > default network namespace. This meant that any process that uses the connector functionality
> > > could not operate correctly when run inside a container. This is a draft patch series that
> > > attempts to now allow this functionality outside of the default network namespace.
> > >
> > > I see this has been discussed previously [1], but am not sure how my changes relate to all
> > > of the topics discussed there and/or if there are any unintended side effects from my draft
> > > changes.
> >
> > Is there a piece of software that uses connector that you want to get
> > working in containers?

We have an IPC system [1] where processes can register their socket
details (unix, tcp, tipc, ...) to a 'monitor' process. Processes can
then get notified when other processes they are interested in
start/stop their servers and use the registered details to connect to
them. Everything works unless a process crashes, in which case the
monitoring process never removes their details. Therefore the
monitoring process uses the connector functionality with
PROC_EVENT_EXIT to detect when a process crashes and removes the
details if it is a previously registered PID.

This was working for us until we tried to run our system in a container.
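
For reference, the monitor's listener is essentially the standard
cn_proc pattern -- a minimal sketch (error handling omitted; not the
exact code from [1]):

#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int sk = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = CN_IDX_PROC,
		.nl_pid    = getpid(),
	};
	bind(sk, (struct sockaddr *)&sa, sizeof(sa));

	/* Subscribe to proc events: nlmsghdr + cn_msg + mcast op. */
	struct __attribute__((packed)) {
		struct nlmsghdr nlh;
		struct cn_msg cn;
		enum proc_cn_mcast_op op;
	} req;
	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len = sizeof(req);
	req.nlh.nlmsg_type = NLMSG_DONE;
	req.nlh.nlmsg_pid = getpid();
	req.cn.id.idx = CN_IDX_PROC;
	req.cn.id.val = CN_VAL_PROC;
	req.cn.len = sizeof(req.op);
	req.op = PROC_CN_MCAST_LISTEN;
	send(sk, &req, sizeof(req), 0);

	for (;;) {
		char buf[4096];
		ssize_t n = recv(sk, buf, sizeof(buf), 0);
		if (n <= 0)
			break;
		struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
		struct cn_msg *cn = NLMSG_DATA(nlh);
		struct proc_event *ev = (struct proc_event *)cn->data;

		/* Every subscriber sees every exit on the system. */
		if (ev->what == PROC_EVENT_EXIT)
			printf("pid %d exited (code %u)\n",
			       ev->event_data.exit.process_pid,
			       ev->event_data.exit.exit_code);
	}
	return 0;
}

Note the subscription is system-wide: the listener is woken for every
exit on the box and has to filter for the PIDs it registered.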

> >
> > I am curious what the motivation is because up until now there has been
> > nothing very interesting using this functionality. So it hasn't been
> > worth anyone's time to make the necessary changes to the code.
>
> Imho, we should just state once and for all that the proc connector will
> not be namespaced. This is such a corner-case thing and has been
> non-namespaced for such a long time without consistent push for it to be
> namespaced, combined with the fact that this needs quite some code to
> make it work correctly, that I fear we end up buying more bugs than we're
> selling features. And realistically, you and I will end up maintaining
> this and I feel this is not worth the time(?). Maybe I'm being too
> pessimistic though.
>

Fair enough. I can certainly look for another way to detect process
crashes. Interestingly I found a patch set [2] on the mailing list
that attempts to solve the problem I wish to solve, but it doesn't
look like the patches were ever developed further. From reading the
discussion thread on that patch set it appears that I should be doing
some form of polling on the /proc files.

Best regards,
Matt

[1] https://github.com/alliedtelesis/cmsg/blob/master/cmsg/src/service_listener/netlink.c#L61
[2] https://lkml.org/lkml/2018/10/29/638

2020-07-13 18:38:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

Matt Bennett <[email protected]> writes:

> On Thu, 2020-07-02 at 21:10 +0200, Christian Brauner wrote:
>> On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
>> > Matt Bennett <[email protected]> writes:
>> >
>> > > Previously the connector functionality could only be used by processes running in the
>> > > default network namespace. This meant that any process that uses the connector functionality
>> > > could not operate correctly when run inside a container. This is a draft patch series that
>> > > attempts to now allow this functionality outside of the default network namespace.
>> > >
>> > > I see this has been discussed previously [1], but am not sure how my changes relate to all
>> > > of the topics discussed there and/or if there are any unintended side effects from my draft
>> > > changes.
>> >
>> > Is there a piece of software that uses connector that you want to get
>> > working in containers?
>
> We have an IPC system [1] where processes can register their socket
> details (unix, tcp, tipc, ...) to a 'monitor' process. Processes can
> then get notified when other processes they are interested in
> start/stop their servers and use the registered details to connect to
> them. Everything works unless a process crashes, in which case the
> monitoring process never removes their details. Therefore the
> monitoring process uses the connector functionality with
> PROC_EVENT_EXIT to detect when a process crashes and removes the
> details if it is a previously registered PID.
>
> This was working for us until we tried to run our system in a container.
>
>> >
>> > I am curious what the motivation is because up until now there has been
>> > nothing very interesting using this functionality. So it hasn't been
>> > worth anyone's time to make the necessary changes to the code.
>>
>> Imho, we should just state once and for all that the proc connector will
>> not be namespaced. This is such a corner-case thing and has been
>> non-namespaced for such a long time without consistent push for it to be
>> namespaced, combined with the fact that this needs quite some code to
>> make it work correctly, that I fear we end up buying more bugs than we're
>> selling features. And realistically, you and I will end up maintaining
>> this and I feel this is not worth the time(?). Maybe I'm being too
>> pessimistic though.
>>
>
> Fair enough. I can certainly look for another way to detect process
> crashes. Interestingly I found a patch set [2] on the mailing list
> that attempts to solve the problem I wish to solve, but it doesn't
> look like the patches were ever developed further. From reading the
> discussion thread on that patch set it appears that I should be doing
> some form of polling on the /proc files.

Recently Christian Brauner implemented pidfd, complete with a poll
operation that reports when a process terminates.

If you are willing to change your userspace code switching to pidfd
should be all that you need.
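
Something like this minimal sketch is all the userspace side needs
(pidfd_open(2) is Linux 5.3+; called via syscall(2) here since glibc
wrappers may not be available):

#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	pid_t pid;
	int pidfd;
	struct pollfd pfd;

	if (argc < 2)
		return 1;
	pid = (pid_t)atoi(argv[1]);
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0) {
		perror("pidfd_open");
		return 1;
	}

	pfd.fd = pidfd;
	pfd.events = POLLIN;
	poll(&pfd, 1, -1);	/* POLLIN fires once the process exits */
	printf("pid %d has terminated\n", pid);
	close(pidfd);
	return 0;
}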

Eric

2020-07-13 18:45:15

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

Matt Bennett <[email protected]> writes:

> On Thu, 2020-07-02 at 13:59 -0500, Eric W. Biederman wrote:
>> Matt Bennett <[email protected]> writes:
>>
>> > Previously the connector functionality could only be used by processes running in the
>> > default network namespace. This meant that any process that uses the connector functionality
>> > could not operate correctly when run inside a container. This is a draft patch series that
>> > attempts to now allow this functionality outside of the default network namespace.
>> >
>> > I see this has been discussed previously [1], but am not sure how my changes relate to all
>> > of the topics discussed there and/or if there are any unintended side
>> > effects from my draft
>>
>> In a quick skim this patchset does not look like it approaches a correct
>> conversion to having code that works in multiple namespaces.
>>
>> I will take the changes to proc_id_connector for example.
>> You report the values in the caller's current namespaces.
>>
>> Which means an unprivileged user can create a user namespace and get
>> connector to report whichever ids they want to users in another
>> namespace. AKA lie.
>>
>> So this appears to make connector completely unreliable.
>>
>> Eric
>>
>
> Hi Eric,
>
> Thank you for taking the time to review. I wrote these patches in an
> attempt to show that I was willing to do the work myself rather than
> simply asking for someone else to do it for me. The changes worked for
> my use cases when I tested them, but I expected that some of the
> changes would be incorrect and that I would need some guidance. I can
> spend some time to really dig in and fully understand the changes I am
> trying to make (I have limited kernel development experience) but
> based on the rest of the discussion threads it seems that there is
> likely no appetite to ever support namespaces with the connector.

Good approach to this.

My sense is that there are few enough uses of connector that, if you
don't mind changing your code so that it works in a container (and the
pidfd support appears to already provide what you need), that is
probably the path of least resistance.

I don't think maintaining connector support would be much more work
than it is now, if someone went through and did the work to carefully
convert the code. So if someone really wants to use connector we can
namespace the code.

Otherwise it probably makes sense to let the few users gradually stop
using connector so the code can eventually be removed.

Please check out the pidfd support and tell us how it meets your
needs. If there is something that connector really does better it would
be good to know.

Thank you,
Eric

2020-07-14 05:04:40

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

On 2020-07-13, Eric W. Biederman <[email protected]> wrote:
> Matt Bennett <[email protected]> writes:
>
> > On Thu, 2020-07-02 at 21:10 +0200, Christian Brauner wrote:
> >> On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
> >> > Matt Bennett <[email protected]> writes:
> >> >
> >> > > Previously the connector functionality could only be used by processes running in the
> >> > > default network namespace. This meant that any process that uses the connector functionality
> >> > > could not operate correctly when run inside a container. This is a draft patch series that
> >> > > attempts to now allow this functionality outside of the default network namespace.
> >> > >
> >> > > I see this has been discussed previously [1], but am not sure how my changes relate to all
> >> > > of the topics discussed there and/or if there are any unintended side effects from my draft
> >> > > changes.
> >> >
> >> > Is there a piece of software that uses connector that you want to get
> >> > working in containers?
> >
> > We have an IPC system [1] where processes can register their socket
> > details (unix, tcp, tipc, ...) to a 'monitor' process. Processes can
> > then get notified when other processes they are interested in
> > start/stop their servers and use the registered details to connect to
> > them. Everything works unless a process crashes, in which case the
> > monitoring process never removes their details. Therefore the
> > monitoring process uses the connector functionality with
> > PROC_EVENT_EXIT to detect when a process crashes and removes the
> > details if it is a previously registered PID.
> >
> > This was working for us until we tried to run our system in a container.
> >
> >> >
> >> > I am curious what the motivation is because up until now there has been
> >> > nothing very interesting using this functionality. So it hasn't been
> >> > worth anyone's time to make the necessary changes to the code.
> >>
> >> Imho, we should just state once and for all that the proc connector will
> >> not be namespaced. This is such a corner-case thing and has been
> >> non-namespaced for such a long time without consistent push for it to be
> >> namespaced, combined with the fact that this needs quite some code to
> >> make it work correctly, that I fear we end up buying more bugs than we're
> >> selling features. And realistically, you and I will end up maintaining
> >> this and I feel this is not worth the time(?). Maybe I'm being too
> >> pessimistic though.
> >>
> >
> > Fair enough. I can certainly look for another way to detect process
> > crashes. Interestingly I found a patch set [2] on the mailing list
> > that attempts to solve the problem I wish to solve, but it doesn't
> > look like the patches were ever developed further. From reading the
> > discussion thread on that patch set it appears that I should be doing
> > some form of polling on the /proc files.
>
> Recently Christian Brauner implemented pidfd complete with a poll
> operation that reports when a process terminates.
>
> If you are willing to change your userspace code switching to pidfd
> should be all that you need.

While this does solve the problem of getting exit notifications in
general, you cannot get the exit code. But if they don't care about that
then we can solve that problem another time. :D
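
One caveat worth noting: for a *child* process the exit status is
still reachable through the pidfd via waitid(P_PIDFD, ...), which is
Linux 5.4+; the missing piece is exit codes for unrelated processes.
A sketch, assuming the pidfd refers to a child:

#include <stdio.h>
#include <sys/wait.h>

#ifndef P_PIDFD
#define P_PIDFD 3	/* not yet in older glibc headers */
#endif

/* Assumes 'pidfd' refers to a child of the calling process. */
static void report_child_exit(int pidfd)
{
	siginfo_t si;

	if (waitid(P_PIDFD, pidfd, &si, WEXITED) == 0)
		printf("child exited, si_code=%d si_status=%d\n",
		       si.si_code, si.si_status);
}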

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>



2020-07-14 05:19:40

by Matt Bennett

[permalink] [raw]
Subject: Re: [PATCH 0/5] RFC: connector: Add network namespace awareness

On Tue, 2020-07-14 at 15:03 +1000, Aleksa Sarai wrote:
> On 2020-07-13, Eric W. Biederman <[email protected]> wrote:
> > Matt Bennett <[email protected]> writes:
> >
> > > On Thu, 2020-07-02 at 21:10 +0200, Christian Brauner wrote:
> > > > On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote:
> > > > > Matt Bennett <[email protected]> writes:
> > > > >
> > > > > > Previously the connector functionality could only be used by processes running in the
> > > > > > default network namespace. This meant that any process that uses the connector functionality
> > > > > > could not operate correctly when run inside a container. This is a draft patch series that
> > > > > > attempts to now allow this functionality outside of the default network namespace.
> > > > > >
> > > > > > I see this has been discussed previously [1], but am not sure how my changes relate to all
> > > > > > of the topics discussed there and/or if there are any unintended side effects from my draft
> > > > > > changes.
> > > > >
> > > > > Is there a piece of software that uses connector that you want to get
> > > > > working in containers?
> > >
> > > We have an IPC system [1] where processes can register their socket
> > > details (unix, tcp, tipc, ...) to a 'monitor' process. Processes can
> > > then get notified when other processes they are interested in
> > > start/stop their servers and use the registered details to connect to
> > > them. Everything works unless a process crashes, in which case the
> > > monitoring process never removes their details. Therefore the
> > > monitoring process uses the connector functionality with
> > > PROC_EVENT_EXIT to detect when a process crashes and removes the
> > > details if it is a previously registered PID.
> > >
> > > This was working for us until we tried to run our system in a container.
> > >
> > > > >
> > > > > I am curious what the motivation is because up until now there has been
> > > > > nothing very interesting using this functionality. So it hasn't been
> > > > > worth anyone's time to make the necessary changes to the code.
> > > >
> > > > Imho, we should just state once and for all that the proc connector will
> > > > not be namespaced. This is such a corner-case thing and has been
> > > > non-namespaced for such a long time without consistent push for it to be
> > > > namespaced, combined with the fact that this needs quite some code to
> > > > make it work correctly, that I fear we end up buying more bugs than we're
> > > > selling features. And realistically, you and I will end up maintaining
> > > > this and I feel this is not worth the time(?). Maybe I'm being too
> > > > pessimistic though.
> > > >
> > >
> > > Fair enough. I can certainly look for another way to detect process
> > > crashes. Interestingly I found a patch set [2] on the mailing list
> > > that attempts to solve the problem I wish to solve, but it doesn't
> > > look like the patches were ever developed further. From reading the
> > > discussion thread on that patch set it appears that I should be doing
> > > some form of polling on the /proc files.
> >
> > Recently Christian Brauner implemented pidfd complete with a poll
> > operation that reports when a process terminates.
> >
> > If you are willing to change your userspace code switching to pidfd
> > should be all that you need.
>
> While this does solve the problem of getting exit notifications in
> general, you cannot get the exit code. But if they don't care about that
> then we can solve that problem another time. :D
>

At first glance, using pidfd will do exactly what we need. Not being
able to get the exit code will not be an issue. In fact I think it
will be an improvement over the connector, as the listener will now
only be waiting for the PIDs we actually care about, rather than
getting woken up on every single process exit and having to check
whether it cares about each PID.
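
A sketch of that pattern -- one pidfd per registered PID, multiplexed
with epoll (sketch only; assumes Linux 5.3+ for pidfd_open):

#include <stdio.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Watch only the registered pids; each EPOLLIN event is one exit. */
static void watch_registered_pids(const pid_t *pids, int npids)
{
	int ep = epoll_create1(0);
	int i;

	for (i = 0; i < npids; i++) {
		int fd = (int)syscall(SYS_pidfd_open, pids[i], 0);
		struct epoll_event ev = {
			.events = EPOLLIN,
			.data.u32 = (unsigned int)pids[i],
		};

		epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
	}

	for (;;) {
		struct epoll_event evs[16];
		int n = epoll_wait(ep, evs, 16, -1);

		for (i = 0; i < n; i++)
			printf("registered pid %u exited\n", evs[i].data.u32);
		/* real code would EPOLL_CTL_DEL and close each pidfd here */
	}
}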

Many thanks Eric and others,
Matt