Subject: Re: [PATCH 3/4] proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net

Linus Torvalds <[email protected]> writes:

> On Thu, Sep 29, 2022 at 2:15 PM Al Viro <[email protected]> wrote:
>>
>> FWIW, what e.g. debian profile for dhclient has is
>> @{PROC}/@{pid}/net/dev r,
>>
>> Note that it's not
>> @{PROC}/net/dev r,
>
> Argh. Yeah, then a bind mount or a hardlink won't work either, you're
> right. I was assuming that any Apparmor rules allowed for just
> /proc/net.
>
> Oh well. I guess we're screwed any which way we turn.

I actually think there is a solution.

Instead of going to /proc/self/net -> /proc/tgid/net
or /proc/thread-self/net -> /proc/tgid/task/tid/net

We should be able to go to: /proc/tid/net

That directory does not show up in readdir, but the tid directories were
put in /proc because of how our pthread support evolved, and gdb came to
expect them to be there.

That should continue to work with the incomplete apparmor rules that
don't allow accessing /proc/tgid/task/tid/net for some reason.

Eric
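
For reference, the three candidate targets discussed above differ only in how the path under /proc is built from the thread-group ID (tgid) and the thread ID (tid). A minimal userspace sketch of the three shapes follows; the struct and the helper name `format_net_targets` are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration only: the three /proc/net targets discussed. */
struct net_targets {
	char via_self[64];        /* where /proc/self/net ends up */
	char via_thread_self[64]; /* where /proc/thread-self/net ends up */
	char via_tid[64];         /* the proposed hidden per-tid directory */
};

static void format_net_targets(struct net_targets *t,
			       unsigned tgid, unsigned tid)
{
	assert(t != NULL);
	snprintf(t->via_self, sizeof(t->via_self), "/proc/%u/net", tgid);
	snprintf(t->via_thread_self, sizeof(t->via_thread_self),
		 "/proc/%u/task/%u/net", tgid, tid);
	snprintf(t->via_tid, sizeof(t->via_tid), "/proc/%u/net", tid);
}
```

For the thread-group leader tgid and tid are equal, so the first and third paths coincide; they differ only for non-leader threads.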

2022-09-29 23:06:13

Subject: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

Since common apparmor policies don't allow access to /proc/tgid/task/tid/net,
point the code at /proc/tid/net instead.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---

I have only compile tested this. All of the boiler plate is a copy of
/proc/self and /proc/thread-self, so it should work.

Could David, or someone who cares and has access to the limited apparmor
configurations, test this to make certain it works?

fs/proc/base.c | 12 ++++++--
fs/proc/internal.h | 2 ++
fs/proc/proc_net.c | 68 ++++++++++++++++++++++++++++++++++++++++-
fs/proc/root.c | 7 ++++-
include/linux/proc_fs.h | 1 +
5 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 93f7e3d971e4..c205234f3822 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3479,7 +3479,7 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
return iter;
}

-#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 2)
+#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 3)

/* for the /proc/ directory itself, after non-process stuff has been done */
int proc_pid_readdir(struct file *file, struct dir_context *ctx)
@@ -3492,18 +3492,24 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
return 0;

- if (pos == TGID_OFFSET - 2) {
+ if (pos == TGID_OFFSET - 3) {
struct inode *inode = d_inode(fs_info->proc_self);
if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
return 0;
ctx->pos = pos = pos + 1;
}
- if (pos == TGID_OFFSET - 1) {
+ if (pos == TGID_OFFSET - 2) {
struct inode *inode = d_inode(fs_info->proc_thread_self);
if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
return 0;
ctx->pos = pos = pos + 1;
}
+ if (pos == TGID_OFFSET - 1) {
+ struct inode *inode = d_inode(fs_info->proc_net);
+ if (!dir_emit(ctx, "net", 11, inode->i_ino, DT_LNK))
+ return 0;
+ ctx->pos = pos = pos + 1;
+ }
iter.tgid = pos - TGID_OFFSET;
iter.task = NULL;
for (iter = next_tgid(ns, iter);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 06a80f78433d..9d13c24b80c8 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -232,8 +232,10 @@ extern const struct inode_operations proc_net_inode_operations;

#ifdef CONFIG_NET
extern int proc_net_init(void);
+extern int proc_setup_net_symlink(struct super_block *s);
#else
static inline int proc_net_init(void) { return 0; }
+static inline int proc_setup_net_symlink(struct super_block *s) { return 0; }
#endif

/*
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 856839b8ae8b..99335e800c1c 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -408,9 +408,75 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {
.exit = proc_net_ns_exit,
};

+/*
+ * /proc/net:
+ */
+static const char *proc_net_symlink_get_link(struct dentry *dentry,
+ struct inode *inode,
+ struct delayed_call *done)
+{
+ struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
+ pid_t tid = task_pid_nr_ns(current, ns);
+ char *name;
+
+ if (!tid)
+ return ERR_PTR(-ENOENT);
+ name = kmalloc(10 + 4 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
+ if (unlikely(!name))
+ return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
+ sprintf(name, "%u/net", tid);
+ set_delayed_call(done, kfree_link, name);
+ return name;
+}
+
+static const struct inode_operations proc_net_symlink_inode_operations = {
+ .get_link = proc_net_symlink_get_link,
+};
+
+static unsigned net_symlink_inum __ro_after_init;
+
+int proc_setup_net_symlink(struct super_block *s)
+{
+ struct inode *root_inode = d_inode(s->s_root);
+ struct proc_fs_info *fs_info = proc_sb_info(s);
+ struct dentry *net_symlink;
+ int ret = -ENOMEM;
+
+ inode_lock(root_inode);
+ net_symlink = d_alloc_name(s->s_root, "net");
+ if (net_symlink) {
+ struct inode *inode = new_inode(s);
+ if (inode) {
+ inode->i_ino = net_symlink_inum;
+ inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
+ inode->i_mode = S_IFLNK | S_IRWXUGO;
+ inode->i_uid = GLOBAL_ROOT_UID;
+ inode->i_gid = GLOBAL_ROOT_GID;
+ inode->i_op = &proc_net_symlink_inode_operations;
+ d_add(net_symlink, inode);
+ ret = 0;
+ } else {
+ dput(net_symlink);
+ }
+ }
+ inode_unlock(root_inode);
+
+ if (ret)
+ pr_err("proc_fill_super: can't allocate /proc/net\n");
+ else
+ fs_info->proc_net = net_symlink;
+
+ return ret;
+}
+
+void __init proc_net_symlink_init(void)
+{
+ proc_alloc_inum(&net_symlink_inum);
+}
+
int __init proc_net_init(void)
{
- proc_symlink("net", NULL, "self/net");
+ proc_net_symlink_init();

return register_pernet_subsys(&proc_net_ns_ops);
}
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 3c2ee3eb1138..6e57e9a4acf9 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -207,7 +207,11 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
if (ret) {
return ret;
}
- return proc_setup_thread_self(s);
+ ret = proc_setup_thread_self(s);
+ if (ret) {
+ return ret;
+ }
+ return proc_setup_net_symlink(s);
}

static int proc_reconfigure(struct fs_context *fc)
@@ -268,6 +272,7 @@ static void proc_kill_sb(struct super_block *sb)

dput(fs_info->proc_self);
dput(fs_info->proc_thread_self);
+ dput(fs_info->proc_net);

kill_anon_super(sb);
put_pid_ns(fs_info->pid_ns);
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 81d6e4ec2294..65f4ef15c8bf 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -62,6 +62,7 @@ struct proc_fs_info {
struct pid_namespace *pid_ns;
struct dentry *proc_self; /* For /proc/self */
struct dentry *proc_thread_self; /* For /proc/thread-self */
+ struct dentry *proc_net; /* For /proc/net */
kgid_t pid_gid;
enum proc_hidepid hide_pid;
enum proc_pidonly pidonly;
--
2.35.3
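
The `kmalloc(10 + 4 + 1, ...)` in the patch above appears to size the buffer for at most ten decimal digits of a 32-bit tid, the four characters of "/net", and the terminating NUL. A quick userspace check of that arithmetic, assuming a 32-bit `unsigned` (the helper name `net_link_bytes` is made up for illustration):

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Bytes needed to hold "<tid>/net" plus the trailing NUL. */
static int net_link_bytes(unsigned tid)
{
	/* snprintf() with a NULL buffer returns the length without the NUL. */
	int n = snprintf(NULL, 0, "%u/net", tid);

	assert(n > 0);
	return n + 1;
}
```

With a 32-bit `unsigned`, `net_link_bytes(UINT_MAX)` is 15, which matches the allocation size in the patch.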

2022-09-30 00:28:16

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

On Thu, Sep 29, 2022 at 05:48:29PM -0500, Eric W. Biederman wrote:

> +static const char *proc_net_symlink_get_link(struct dentry *dentry,
> + struct inode *inode,
> + struct delayed_call *done)
> +{
> + struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
> + pid_t tid = task_pid_nr_ns(current, ns);
> + char *name;
> +
> + if (!tid)
> + return ERR_PTR(-ENOENT);
> + name = kmalloc(10 + 4 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
> + if (unlikely(!name))
> + return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
> + sprintf(name, "%u/net", tid);
> + set_delayed_call(done, kfree_link, name);
> + return name;
> +}

Just to troll adobriyan a bit:

static const char *dynamic_get_link(struct delayed_call *done,
bool is_rcu,
const char *fmt, ...)
{
va_list args;
char *body;

va_start(args, fmt);
body = kvasprintf(is_rcu ? GFP_ATOMIC : GFP_KERNEL, fmt, args);
va_end(args);

if (unlikely(!body))
return is_rcu ? ERR_PTR(-ECHILD) : ERR_PTR(-ENOMEM);
set_delayed_call(done, kfree_link, body);
return body;
}

static const char *proc_net_symlink_get_link(struct dentry *dentry,
struct inode *inode,
struct delayed_call *done)
{
struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
pid_t tid = task_pid_nr_ns(current, ns);

if (!tid)
return ERR_PTR(-ENOENT);
return dynamic_get_link(done, !dentry, "%u/net", tid);
}

static const char *proc_self_get_link(struct dentry *dentry,
struct inode *inode,
struct delayed_call *done)
{
struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
pid_t tgid = task_tgid_nr_ns(current, ns);

if (!tgid)
return ERR_PTR(-ENOENT);
return dynamic_get_link(done, !dentry, "%u", tgid);
}

static const char *proc_thread_self_get_link(struct dentry *dentry,
struct inode *inode,
struct delayed_call *done)
{
struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
pid_t tgid = task_tgid_nr_ns(current, ns);
pid_t pid = task_pid_nr_ns(current, ns);

if (!pid)
return ERR_PTR(-ENOENT);
return dynamic_get_link(done, !dentry, "%u/task/%u", tgid, pid);
}

2022-09-30 04:29:15

by kernel test robot

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v6.0-rc7 next-20220929]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Eric-W-Biederman/proc-Update-proc-net-to-point-at-the-accessing-threads-network-namespace/20220930-065017
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 987a926c1d8a40e4256953b04771fbdb63bc7938
config: m68k-allyesconfig
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/5336f1902b4ba8a646f082f32fbb183850a13080
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Eric-W-Biederman/proc-Update-proc-net-to-point-at-the-accessing-threads-network-namespace/20220930-065017
git checkout 5336f1902b4ba8a646f082f32fbb183850a13080
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash fs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> fs/proc/proc_net.c:472:13: warning: no previous prototype for 'proc_net_symlink_init' [-Wmissing-prototypes]
472 | void __init proc_net_symlink_init(void)
| ^~~~~~~~~~~~~~~~~~~~~

vim +/proc_net_symlink_init +472 fs/proc/proc_net.c

471
> 472 void __init proc_net_symlink_init(void)
473 {
474 proc_alloc_inum(&net_symlink_inum);
475 }
476

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-09-30 06:15:31

by kernel test robot

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v6.0-rc7 next-20220929]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Eric-W-Biederman/proc-Update-proc-net-to-point-at-the-accessing-threads-network-namespace/20220930-065017
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 987a926c1d8a40e4256953b04771fbdb63bc7938
config: i386-randconfig-s001
compiler: gcc-11 (Debian 11.3.0-5) 11.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-39-gce1a6720-dirty
# https://github.com/intel-lab-lkp/linux/commit/5336f1902b4ba8a646f082f32fbb183850a13080
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Eric-W-Biederman/proc-Update-proc-net-to-point-at-the-accessing-threads-network-namespace/20220930-065017
git checkout 5336f1902b4ba8a646f082f32fbb183850a13080
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash fs/proc/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

sparse warnings: (new ones prefixed by >>)
>> fs/proc/proc_net.c:472:13: sparse: sparse: symbol 'proc_net_symlink_init' was not declared. Should it be static?

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-09-30 09:39:54

Subject: RE: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

From: Eric W. Biederman
> Sent: 29 September 2022 23:48
>
> Since common apparmor policies don't allow access to /proc/tgid/task/tid/net,
> point the code at /proc/tid/net instead.
>
> Link: https://lkml.kernel.org/r/[email protected]
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
>
> I have only compile tested this. All of the boiler plate is a copy of
> /proc/self and /proc/thread-self, so it should work.
>
> Could David, or someone who cares and has access to the limited apparmor
> configurations, test this to make certain it works?

It works with a minor 'cut & paste' fixup.
(Not nested inside a program that changes namespaces.)

Although if it is reasonable for /proc/net -> /proc/tid/net
why not just make /proc/thread-self -> /proc/tid
Then /proc/net can just be thread-self/net

I have wondered if the namespace lookup could be done as a 'special'
directory lookup for "net" rather than changing everything when the
namespace is changed.
I can imagine scenarios where a thread needs to keep changing
between two namespaces, at the moment I suspect that is rather
more expensive than a lookup and changing the reference counts.

Notwithstanding the apparmor issues, /proc/net could actually be
a symlink to (say) /proc/net_namespaces/namespace_name with
readlink returning the name based on the thread's actual namespace.

I've also had problems with accessing /sys/class/net for multiple
namespaces within the same thread (think of a system monitor process).
The simplest solution is to start the program with:
ip netns exec namespace program 3</sys/class/net
and then use openat(3, ...) to read items in the 'init' namespace.
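
The trick relies on a directory file descriptor staying bound to the directory it was opened on, so later openat() calls resolve relative to it whatever the process has switched to since. A minimal sketch, assuming the descriptor is inherited as fd 3 as in the command above; the helper name `read_relative` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len - 1 bytes of the file `rel`, resolved relative to the
 * already-open directory `dirfd`, into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_relative(int dirfd, const char *rel, char *buf, size_t len)
{
	ssize_t n;
	int fd;

	if (len == 0)
		return -1;
	fd = openat(dirfd, rel, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

A monitor started that way could then call, for example, `read_relative(3, "eth0/operstate", buf, sizeof(buf))`, where `eth0` is only a placeholder interface name.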

FWIW I'm pretty sure there is a sequence involving unshare() that
can get you out of a chroot - but I've not found it yet.

David

>
> fs/proc/base.c | 12 ++++++--
> fs/proc/internal.h | 2 ++
> fs/proc/proc_net.c | 68 ++++++++++++++++++++++++++++++++++++++++-
> fs/proc/root.c | 7 ++++-
> include/linux/proc_fs.h | 1 +
> 5 files changed, 85 insertions(+), 5 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 93f7e3d971e4..c205234f3822 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3479,7 +3479,7 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
> return iter;
> }
>
> -#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 2)
> +#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 3)
>
> /* for the /proc/ directory itself, after non-process stuff has been done */
> int proc_pid_readdir(struct file *file, struct dir_context *ctx)
> @@ -3492,18 +3492,24 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
> if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
> return 0;
>
> - if (pos == TGID_OFFSET - 2) {
> + if (pos == TGID_OFFSET - 3) {
> struct inode *inode = d_inode(fs_info->proc_self);
> if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
> return 0;
> ctx->pos = pos = pos + 1;
> }
> - if (pos == TGID_OFFSET - 1) {
> + if (pos == TGID_OFFSET - 2) {
> struct inode *inode = d_inode(fs_info->proc_thread_self);
> if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
> return 0;
> ctx->pos = pos = pos + 1;
> }
> + if (pos == TGID_OFFSET - 1) {
> + struct inode *inode = d_inode(fs_info->proc_net);
> + if (!dir_emit(ctx, "net", 11, inode->i_ino, DT_LNK))

The 11 is the length, so it needs to be 3 (the length of "net").
This block can also be put first - to reduce churn.

David
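
The third argument to dir_emit() is the length of the name, not counting the trailing NUL. One way to sidestep this kind of copy-and-paste slip is to derive the length from the string literal itself. The `LIT_LEN` and `ENTRY` macros and the table below are a hypothetical userspace illustration, not something the patch or the kernel uses.

```c
#include <assert.h>
#include <string.h>

/* Length of a string literal, excluding the terminating NUL. */
#define LIT_LEN(s) (sizeof(s) - 1)

/* Hypothetical stand-in for a directory entry name and its length. */
struct entry_name {
	const char *name;
	size_t len;
};

#define ENTRY(s) { (s), LIT_LEN(s) }

/* The three symlinks emitted at the top of /proc by the patched readdir. */
static const struct entry_name proc_root_links[] = {
	ENTRY("self"),        /* len 4 */
	ENTRY("thread-self"), /* len 11 */
	ENTRY("net"),         /* len 3 */
};
```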

> + return 0;
> + ctx->pos = pos = pos + 1;
> + }
> iter.tgid = pos - TGID_OFFSET;
> iter.task = NULL;
> for (iter = next_tgid(ns, iter);
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 06a80f78433d..9d13c24b80c8 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -232,8 +232,10 @@ extern const struct inode_operations proc_net_inode_operations;
>
> #ifdef CONFIG_NET
> extern int proc_net_init(void);
> +extern int proc_setup_net_symlink(struct super_block *s);
> #else
> static inline int proc_net_init(void) { return 0; }
> +static inline int proc_setup_net_symlink(struct super_block *s) { return 0; }
> #endif
>
> /*
> diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
> index 856839b8ae8b..99335e800c1c 100644
> --- a/fs/proc/proc_net.c
> +++ b/fs/proc/proc_net.c
> @@ -408,9 +408,75 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {
> .exit = proc_net_ns_exit,
> };
>
> +/*
> + * /proc/net:
> + */
> +static const char *proc_net_symlink_get_link(struct dentry *dentry,
> + struct inode *inode,
> + struct delayed_call *done)
> +{
> + struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
> + pid_t tid = task_pid_nr_ns(current, ns);
> + char *name;
> +
> + if (!tid)
> + return ERR_PTR(-ENOENT);
> + name = kmalloc(10 + 4 + 1, dentry ? GFP_KERNEL : GFP_ATOMIC);
> + if (unlikely(!name))
> + return dentry ? ERR_PTR(-ENOMEM) : ERR_PTR(-ECHILD);
> + sprintf(name, "%u/net", tid);
> + set_delayed_call(done, kfree_link, name);
> + return name;
> +}
> +
> +static const struct inode_operations proc_net_symlink_inode_operations = {
> + .get_link = proc_net_symlink_get_link,
> +};
> +
> +static unsigned net_symlink_inum __ro_after_init;
> +
> +int proc_setup_net_symlink(struct super_block *s)
> +{
> + struct inode *root_inode = d_inode(s->s_root);
> + struct proc_fs_info *fs_info = proc_sb_info(s);
> + struct dentry *net_symlink;
> + int ret = -ENOMEM;
> +
> + inode_lock(root_inode);
> + net_symlink = d_alloc_name(s->s_root, "net");
> + if (net_symlink) {
> + struct inode *inode = new_inode(s);
> + if (inode) {
> + inode->i_ino = net_symlink_inum;
> + inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
> + inode->i_mode = S_IFLNK | S_IRWXUGO;
> + inode->i_uid = GLOBAL_ROOT_UID;
> + inode->i_gid = GLOBAL_ROOT_GID;
> + inode->i_op = &proc_net_symlink_inode_operations;
> + d_add(net_symlink, inode);
> + ret = 0;
> + } else {
> + dput(net_symlink);
> + }
> + }
> + inode_unlock(root_inode);
> +
> + if (ret)
> + pr_err("proc_fill_super: can't allocate /proc/net\n");
> + else
> + fs_info->proc_net = net_symlink;
> +
> + return ret;
> +}
> +
> +void __init proc_net_symlink_init(void)
> +{
> + proc_alloc_inum(&net_symlink_inum);
> +}
> +
> int __init proc_net_init(void)
> {
> - proc_symlink("net", NULL, "self/net");
> + proc_net_symlink_init();
>
> return register_pernet_subsys(&proc_net_ns_ops);
> }
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 3c2ee3eb1138..6e57e9a4acf9 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -207,7 +207,11 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
> if (ret) {
> return ret;
> }
> - return proc_setup_thread_self(s);
> + ret = proc_setup_thread_self(s);
> + if (ret) {
> + return ret;
> + }
> + return proc_setup_net_symlink(s);
> }
>
> static int proc_reconfigure(struct fs_context *fc)
> @@ -268,6 +272,7 @@ static void proc_kill_sb(struct super_block *sb)
>
> dput(fs_info->proc_self);
> dput(fs_info->proc_thread_self);
> + dput(fs_info->proc_net);
>
> kill_anon_super(sb);
> put_pid_ns(fs_info->pid_ns);
> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> index 81d6e4ec2294..65f4ef15c8bf 100644
> --- a/include/linux/proc_fs.h
> +++ b/include/linux/proc_fs.h
> @@ -62,6 +62,7 @@ struct proc_fs_info {
> struct pid_namespace *pid_ns;
> struct dentry *proc_self; /* For /proc/self */
> struct dentry *proc_thread_self; /* For /proc/thread-self */
> + struct dentry *proc_net; /* For /proc/net */
> kgid_t pid_gid;
> enum proc_hidepid hide_pid;
> enum proc_pidonly pidonly;
> --
> 2.35.3

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-09-30 14:34:38

by Alexey Dobriyan

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

Al wrote:

> Just to troll adobriyan a bit:
>
> static const char *dynamic_get_link(struct delayed_call *done,
> bool is_rcu,
> const char *fmt, ...)
> {
> va_list args;
> char *body;
>
> va_start(args, fmt);
> body = kvasprintf(is_rcu ? GFP_ATOMIC : GFP_KERNEL, fmt, args);
> va_end(args);

Ouch... Double pass over data. Who wrote this?

>
> if (unlikely(!body))
> return is_rcu ? ERR_PTR(-ECHILD) : ERR_PTR(-ENOMEM);
> set_delayed_call(done, kfree_link, body);
> return body;
> }
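
The "double pass" refers to how kvasprintf() works: it formats once to measure the length, allocates, then formats again into the new buffer. A userspace sketch of the same pattern, using vsnprintf() and malloc() in place of the kernel helpers (the function name `format_alloc` is made up for illustration):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed formatted string, or NULL on failure. */
static char *format_alloc(const char *fmt, ...)
{
	va_list args, copy;
	char *buf;
	int len;

	va_start(args, fmt);
	va_copy(copy, args);
	len = vsnprintf(NULL, 0, fmt, copy);	/* pass 1: measure */
	va_end(copy);
	if (len < 0) {
		va_end(args);
		return NULL;
	}
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, args); /* pass 2: fill */
	va_end(args);
	return buf;
}
```

The original patch avoids the measuring pass by allocating a fixed worst-case buffer and calling sprintf() once, at the cost of hand-computed sizes.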

2022-09-30 16:29:06

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

David Laight <[email protected]> writes:

> From: Eric W. Biederman
>> Sent: 29 September 2022 23:48
>>
>> Since common apparmor policies don't allow access to /proc/tgid/task/tid/net,
>> point the code at /proc/tid/net instead.
>>
>> Link: https://lkml.kernel.org/r/[email protected]
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>> ---
>>
>> I have only compile tested this. All of the boiler plate is a copy of
>> /proc/self and /proc/thread-self, so it should work.
>>
>> Could David, or someone who cares and has access to the limited apparmor
>> configurations, test this to make certain it works?
>
> It works with a minor 'cut & paste' fixup.
> (Not nested inside a program that changes namespaces.)

Were there any apparmor problems? I just want to confirm that is what
you tested.

Assuming not, this patch looks like it reveals a solution to this
issue.

> Although if it is reasonable for /proc/net -> /proc/tid/net
> why not just make /proc/thread-self -> /proc/tid
> Then /proc/net can just be thread-self/net

There are minor differences between the process directories that
tend to report process wide information and task directories that
only report some of the same information per-task. So in general
thread-self makes much more sense pointing to a per-task directory.

The hidden /proc/tid/ directories use the per process code to generate
themselves. The difference is that they assume the tid is the leading
thread instead of the other process. Those directories are all a bit of
a scrambled mess. I was suspecting the other day we might be able to
fix gdb and make them go away entirely in a decade or so.

So I don't think it makes sense in general to point /proc/thread-self at
the hidden per /proc/tid/ directories.

> I have wondered if the namespace lookup could be done as a 'special'
> directory lookup for "net" rather than changing everything when the
> namespace is changed.
> I can imagine scenarios where a thread needs to keep changing
> between two namespaces, at the moment I suspect that is rather
> more expensive than a lookup and changing the reference counts.

You can always open the net directories once and then change namespaces,
as an open directory will not change between namespaces.

> Notwithstanding the apparmor issues, /proc/net could actually be
> a symlink to (say) /proc/net_namespaces/namespace_name with
> readlink returning the name based on the thread's actual namespace.

There really aren't good names for namespaces at the kernel level, as
one of their use cases is to make process migration possible between
machines. So any kernel level name would need to be migrated as well.
So those kernel level names would need a name in another namespace,
or an extra namespace would have to be created for those names.

> I've also had problems with accessing /sys/class/net for multiple
> namespaces within the same thread (think of a system monitor process).
> The simplest solution is to start the program with:
> ip netns exec namespace program 3</sys/class/net
> and then use openat(3, ...) to read items in the 'init' namespace.
>
> FWIW I'm pretty sure there is a sequence involving unshare() that
> can get you out of a chroot - but I've not found it yet.

Out of a chroot is essentially just:
chdir("/");
chroot("/somedir");
chdir("../../../../../../../../../../../../../../../..");
Out of most namespaces except the pid and user namespace is
just chns.

You can't get out of the pid namespace as you can't change your pid.

Not being able to escape a user namespace is what makes it impossible to
confuse a process and gain privileges through a privilege gaining exec.

Eric

2022-09-30 21:31:00

Subject: RE: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

From: Eric W. Biederman
> Sent: 30 September 2022 17:17
>
> David Laight <[email protected]> writes:
>
> > From: Eric W. Biederman
> >> Sent: 29 September 2022 23:48
> >>
> >> Since common apparmor policies don't allow access to /proc/tgid/task/tid/net,
> >> point the code at /proc/tid/net instead.
> >>
> >> Link: https://lkml.kernel.org/r/[email protected]
> >> Signed-off-by: "Eric W. Biederman" <[email protected]>
> >> ---
> >>
> >> I have only compile tested this. All of the boiler plate is a copy of
> >> /proc/self and /proc/thread-self, so it should work.
> >>
> >> Could David, or someone who cares and has access to the limited apparmor
> >> configurations, test this to make certain it works?
> >
> > It works with a minor 'cut & paste' fixup.
> > (Not nested inside a program that changes namespaces.)
>
> Were there any apparmor problems? I just want to confirm that is what
> you tested.

I know nothing about apparmor - I just tested that /proc/net
pointed to somewhere that looked right.

> Assuming not this patch looks like it reveals a solution to this
> issue.
>
> > Although if it is reasonable for /proc/net -> /proc/tid/net
> > why not just make /proc/thread-self -> /proc/tid
> > Then /proc/net can just be thread-self/net
>
> There are minor differences between the process directories that
> tend to report process wide information and task directories that
> only report some of the same information per-task. So in general
> thread-self makes much more sense pointing to a per-task directory.
>
> The hidden /proc/tid/ directories use the per process code to generate
> themselves. The difference is that they assume the tid is the leading
> thread instead of the other process. Those directories are all a bit of
> a scrambled mess. I was suspecting the other day we might be able to
> fix gdb and make them go away entirely in a decade or so.
>
> So I don't think it makes sense in general to point /proc/thread-self at
> the hidden per /proc/tid/ directories.

Ok - I hadn't actually looked in them.
But if you have a long-term plan to remove them, directing /proc/net
through them might not be such a good idea.

> > I have wondered if the namespace lookup could be done as a 'special'
> > directory lookup for "net" rather than changing everything when the
> > namespace is changed.
> > I can imagine scenarios where a thread needs to keep changing
> > between two namespaces, at the moment I suspect that is rather
> > more expensive than a lookup and changing the reference counts.
>
> You can always open the net directories once, and then change as
> an open directory will not change between namespaces.

Part of the problem is that changing the net namespace isn't
enough, you also have to remount /sys - which isn't entirely
trivial.
It might be possible to mount a network namespace version
of /sys on a different mountpoint - I've not tried very
hard to do that.

> > Notwithstanding the apparmor issues, /proc/net could actually be
> > a symlink to (say) /proc/net_namespaces/namespace_name with
> > readlink returning the name based on the thread's actual namespace.
>
> There really aren't good names for namespaces at the kernel level. As
> one of their use cases is to make process migration possible between
> machines. So any kernel level name would need to be migrated as well.
> So those kernel level names would need a name in another namespace,
> or an extra namespace would have to be created for those names.

Network namespaces do seem to have names.
Although I gave up working out how to change to a named network
namespace from within the kernel (especially in a non-GPL module).

...
> > FWIW I'm pretty sure there is a sequence involving unshare() that
> > can get you out of a chroot - but I've not found it yet.
>
> Out of a chroot is essentially just:
> chdir("/");
> chroot("/somedir");
> chdir("../../../../../../../../../../../../../../../..");

A chdir() inside a chroot anchors at the base of the chroot.
fchdir() will get you out if you have an open fd to a directory
outside the chroot.
The 'usual' way out requires a process outside the chroot to
just use mvdir().
But there isn't supposed to be a way to get out.

I can certainly get the /proc symlinks (for a copy of /proc
mounted inside a chroot) to report the full paths for files
that exist inside the chroot.
These should (and do normally) truncate at the chroot base.
(This all happened because a pivot_root() was failing.)

David

2022-10-02 00:38:57

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

On Fri, Sep 30, 2022 at 09:28:31PM +0000, David Laight wrote:
> > > FWIW I'm pretty sure there a sequence involving unshare() that
> > > can get you out of a chroot - but I've not found it yet.
> >
> > Out of a chroot is essentially just:
> > chdir("/");
> > chroot("/somedir");
> > chdir("../../../../../../../../../../../../../../../..");
>
> A chdir() inside a chroot anchors at the base of the chroot.
> fchdir() will get you out if you have an open fd to a directory
> outside the chroot.
> The 'usual' way out requires a process outside the chroot to
> just use mvdir().
> But there isn't supposed to be a way to get out.

In order of original claims:

* chdir inside a chroot does *NOT* "anchor at the base of the chroot".
What it does is (a) start at the base if the pathname is absolute and
(b) treats .. in the base as ., same as any other syscall.

* correct.

* WTF is "mvdir()"? Some Unices used to have mvdir(1), but it had never
been a function... And mv(1) (or rename(2)) is far from being the only
way for assistant outside of jail to let the chrooted process out.

* ability to chroot(2) had always been equivalent to ability to undo
chroot(2). If you want to prevent getting out of there, you need
(among other things) to prevent the processes to be confined from
further chroot(2).

2022-10-03 10:41:13

Subject: RE: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

...
> * ability to chroot(2) had always been equivalent to ability to undo
> chroot(2). If you want to prevent getting out of there, you need
> (among other things) to prevent the processes to be confined from
> further chroot(2).

Not always, certainly not historically.
chroot() inside a chroot() just constrained you further.
If fchdir() and openat() have broken that it is a serious
problem.

NetBSD certainly has checks to detect (log and fix)
programs that have escaped (or might escape) from chroots.

unshare() seems to create a 'shadow' inode structure
for the chroot's "/" so at least some of the tests
when following ".." fail to detect it.

I also thought containers relied on the same scheme?
(But I'm too old fashioned to have looked into them!)

David


2022-10-03 14:29:10

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

On Mon, Oct 03, 2022 at 09:36:46AM +0000, David Laight wrote:
> ...
> > * ability to chroot(2) had always been equivalent to ability to undo
> > chroot(2). If you want to prevent getting out of there, you need
> > (among other things) to prevent the processes to be confined from
> > further chroot(2).
>
> Not always, certainly not historically.

Factually incorrect.

> chroot() inside a chroot() just constrained you further.

What it did was change your root directory. Yes, deeper.
And leave your current directory where it had been.

Now, recall that chroot does *NOT* affect the
interpretation of .. other than in the current root.

Which means that an attacker doing
chdir("/");
chroot(some_existing_directory);
chdir("..");
will end up outside of the original chroot environment.

This is POSIX-mandated behaviour. Moreover, that is behaviour of
historical Unices. Any Unix programmer who tries to use chroot(2)
should be aware of that. Ability of making chroot(2) calls
means the ability to break out of any chroot you are currently in.
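
The rule above can be sketched as a toy model. This is not kernel code:
struct toy_dir, struct toy_proc and the toy_* helpers are hypothetical
names made up for illustration. It only assumes what is stated above:
chroot changes the root and leaves the current directory alone, and ".."
is treated as "." only when the lookup is standing exactly on the
current root.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: a directory only knows its parent. */
struct toy_dir {
    const char *name;
    struct toy_dir *parent;   /* NULL for the real root */
};

/* Per-process state: root and cwd are independent pointers. */
struct toy_proc {
    struct toy_dir *root;
    struct toy_dir *cwd;
};

/* chdir("/"): absolute lookups start at the process root. */
static void toy_chdir_root(struct toy_proc *p)
{
    p->cwd = p->root;
}

/* chroot(dir): changes the root only; cwd stays where it was. */
static void toy_chroot(struct toy_proc *p, struct toy_dir *dir)
{
    p->root = dir;
}

/* chdir(".."): ".." acts as "." only when standing exactly on the root. */
static void toy_chdir_dotdot(struct toy_proc *p)
{
    if (p->cwd == p->root || p->cwd->parent == NULL)
        return;
    p->cwd = p->cwd->parent;
}

/* Returns 1 if the three-call sequence leaves the jail, else 0. */
int toy_escape_demo(void)
{
    struct toy_dir real_root = { "/", NULL };
    struct toy_dir jail = { "jail", &real_root };
    struct toy_dir sub = { "somedir", &jail };
    struct toy_proc p = { &jail, &jail };   /* already chrooted into jail */

    toy_chdir_dotdot(&p);                   /* clamped: cwd is the root */
    assert(p.cwd == &jail);

    toy_chdir_root(&p);                     /* chdir("/") */
    toy_chroot(&p, &sub);                   /* chroot(some_existing_directory) */
    toy_chdir_dotdot(&p);                   /* chdir(".."): cwd != root now */

    return p.cwd == &real_root;
}
```

In this model the first ".." is clamped because the current directory is
the root; after the nested chroot the current directory is no longer the
root, so nothing stops ".." from walking to the parent.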

> If fchdir() and openat() have broken that it is a serious
> problem.

Have you even read the mail you'd been replying to? Where did anything
in the example given (OK, sketched out) to you upthread involve fchdir()
or openat()?

2022-10-03 17:27:13

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

David Laight <[email protected]> writes:

> From: Eric W. Biederman
>> Sent: 30 September 2022 17:17
>>
>> David Laight <[email protected]> writes:
>>
>> > From: Eric W. Biederman
>> >> Sent: 29 September 2022 23:48
>> >>
>> >> Since common apparmor policies don't allow access to
>> >> /proc/tgid/task/tid/net, point the code at /proc/tid/net instead.
>> >>
>> >> Link: https://lkml.kernel.org/r/[email protected]
>> >> Signed-off-by: "Eric W. Biederman" <[email protected]>
>> >> ---
>> >>
>> >> I have only compile tested this. All of the boiler plate is a copy of
>> >> /proc/self and /proc/thread-self, so it should work.
>> >>
>> >> Can David or someone who cares and has access to the limited apparmor
>> >> configurations test this to make certain this works?
>> >
>> > It works with a minor 'cut & paste' fixup.
>> > (Not nested inside a program that changes namespaces.)
>>
>> Were there any apparmor problems? I just want to confirm that is what
>> you tested.
>
> I know nothing about apparmor - I just tested that /proc/net
> pointed to somewhere that looked right.

Fair enough. We should attempt to verify with an apparmor configuration
before merging this just in case there is a detail someone overlooked.
It doesn't help much if there is a fix that has to be reverted right
away.

>> Assuming not, this patch looks like it reveals a solution to this
>> issue.
>>
>> > Although if it is reasonable for /proc/net -> /proc/tid/net
>> > why not just make /proc/thread-self -> /proc/tid
>> > Then /proc/net can just be thread-self/net
>>
>> There are minor differences between the process directories that
>> tend to report process wide information and task directories that
>> only report some of the same information per-task. So in general
>> thread-self makes much more sense pointing to a per-task directory.
>>
>> The hidden /proc/tid/ directories use the per process code to generate
>> themselves. The difference is that they assume the tid is the leading
>> thread instead of the other process. Those directories are all a bit of
>> a scrambled mess. I was suspecting the other day we might be able to
>> fix gdb and make them go away entirely in a decade or so.
>>
>> So I don't think it makes sense in general to point /proc/thread-self at
>> the hidden per /proc/tid/ directories.
>
> Ok - I hadn't actually looked in them.
> But if you have a long-term plan to remove them, directing /proc/net
> through them might not be such a good idea.

Nah. I just want to grouse about them and encourage people not to
use them in general. They are a weird special case. They aren't
painful enough to maintain to make me want to do something else.
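
For reference, the three spellings discussed in this thread can be
written out for the calling thread. A minimal sketch: enum net_path_kind
and format_net_path() are hypothetical names, and the path layouts are
the ones named earlier in the thread (/proc/tgid/net,
/proc/tgid/task/tid/net and the hidden /proc/tid/net).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Which /proc directory to name for the calling thread. */
enum net_path_kind {
    NET_VIA_TGID,       /* /proc/<tgid>/net, where /proc/self/net leads */
    NET_VIA_TASK,       /* /proc/<tgid>/task/<tid>/net, via thread-self */
    NET_VIA_HIDDEN_TID  /* /proc/<tid>/net, not listed by readdir */
};

/* Formats the chosen path into buf; returns snprintf()'s result. */
int format_net_path(enum net_path_kind kind, char *buf, size_t len)
{
    long tgid = (long)getpid();
    long tid = syscall(SYS_gettid);   /* raw syscall, no gettid() wrapper needed */

    switch (kind) {
    case NET_VIA_TGID:
        return snprintf(buf, len, "/proc/%ld/net", tgid);
    case NET_VIA_TASK:
        return snprintf(buf, len, "/proc/%ld/task/%ld/net", tgid, tid);
    case NET_VIA_HIDDEN_TID:
        return snprintf(buf, len, "/proc/%ld/net", tid);
    }
    return -1;
}
```

In the leading thread tid equals tgid, so the first and third forms come
out identical; they only differ in the other threads of a process.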

It would actually be less work to fix the apparmor security policies,
and then to verify over the course of several years that the broken
security policies are no longer shipped.

>> > I have wondered if the namespace lookup could be done as a 'special'
>> > directory lookup for "net" rather than changing everything when the
>> > namespace is changed.
>> > I can imagine scenarios where a thread needs to keep changing
>> > between two namespaces, at the moment I suspect that is rather
>> > more expensive than a lookup and changing the reference counts.
>>
>> You can always open the net directories once, and then change, as
>> an open directory will not change between namespaces.
>
> Part of the problem is that changing the net namespace isn't
> enough, you also have to remount /sys - which isn't entirely
> trivial.

Yes. That is actually a much more maintainable model. But it is still
imperfect. I was thinking about the proc/net directories when
I made my comment. Unlike proc, where we have task ids, there is nothing
in /sys that can do anything similar.

> It might be possible to mount a network namespace version
> of /sys on a different mountpoint - I've not tried very
> hard to do that.

It is a bug if that doesn't work.

>> > Notwithstanding the apparmor issues, /proc/net could actually be
>> > a symlink to (say) /proc/net_namespaces/namespace_name with
>> > readlink returning the name based on the thread's actual namespace.
>>
>> There really aren't good names for namespaces at the kernel level, as
>> one of their use cases is to make process migration possible between
>> machines. So any kernel level name would need to be migrated as well.
>> So those kernel level names would need a name in another namespace,
>> or an extra namespace would have to be created for those names.
>
> Network namespaces do seem to have names.
> Although I gave up working out how to change to a named network
> namespace from within the kernel (especially in a non-GPL module).

Network namespaces have mount points. The mount points have names.

It is just a matter of finding the right filesystem and calling
sys_rename().

There are some network-namespace-local names for other network
namespaces. For those I don't see how it would make any sense
to change the name. If you need to you can always create a
new network namespace and ensure you get the name you want there.
Which is good enough for process migration. I don't know why else
anyone would want to change names.
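
Since the name is only a path to a mount point, two different paths can
refer to the same namespace. A minimal sketch of checking that from
userspace by comparing device and inode numbers, as described in
namespaces(7); same_namespace() is a hypothetical helper name.

```c
#include <sys/stat.h>

/* Returns 1 if both paths refer to the same namespace object, 0 if
 * they refer to different ones, -1 if either path cannot be stat()ed. */
int same_namespace(const char *path_a, const char *path_b)
{
    struct stat a, b;

    /* stat() follows the /proc/<pid>/ns/net symlink to the nsfs inode */
    if (stat(path_a, &a) < 0 || stat(path_b, &b) < 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

For example, same_namespace("/proc/self/ns/net", "/var/run/netns/blue")
would tell whether the caller is in the namespace that "ip netns" calls
"blue". Both the /var/run/netns location (an iproute2 convention) and
the name "blue" are assumptions for illustration.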

> ...
>> > FWIW I'm pretty sure there is a sequence involving unshare() that
>> > can get you out of a chroot - but I've not found it yet.
>>
>> Out of a chroot is essentially just:
>> chdir("/");
>> chroot("/somedir");
>> chdir("../../../../../../../../../../../../../../../..");
>
> A chdir() inside a chroot anchors at the base of the chroot.

But the check is very simple.
If (working_directory == root_directory) make chdir("..") a noop.

Once the working directory is outside the root directory (as
chroot("/somedir") achieves), the chroot checks are no longer usable.

> fchdir() will get you out if you have an open fd to a directory
> outside the chroot.
> The 'usual' way out requires a process outside the chroot to
> just use mvdir().
> But there isn't supposed to be a way to get out.

As I recall the history, chroot was a quick hack to allow building
against a different version of the binaries than were currently
installed. It was not built as a security feature.

Eric

2022-10-03 19:03:59

Subject: Re: [CFT][PATCH] proc: Update /proc/net to point at the accessing threads network namespace

On Mon, Oct 03, 2022 at 12:07:27PM -0500, Eric W. Biederman wrote:

> > fchdir() will get you out if you have an open fd to a directory
> > outside the chroot.
> > The 'usual' way out requires a process outside the chroot to
> > just use mvdir().
> > But there isn't supposed to be a way to get out.
>
> As I recall the history, chroot was a quick hack to allow building
> against a different version of the binaries than were currently
> installed. It was not built as a security feature.

A last-moment prerelease hack in v7, by the look of it; at that point it
hadn't even tried to modify ".." behaviour in the directory you'd been
chrooted into - just modified the starting point for resolving absolute pathnames.

Not even token attempts at confinement until a 1982 commit by Bill Joy,
during one of the namei rewrites. No idea how or when non-BSD branches
had picked that up.

At no point did chroot(2) switch the current directory. fchdir(2) doesn't
add anything to the situation when
chdir("/");
chroot("some_directory");
chdir("../../../../../../../..");
chroot(".");
will break you out of it nicely.

Again, chroot(2) had never been intended to be root-resistant; there's
a reason why "drop elevated privileges right after chrooting" is
in all kinds of UNIX FAQs (very likely in Stevens et al. as well -
I don't have the relevant volume in front of me, but it's certainly
something covered in textbooks).

chroot(2) can be useful in confining processes, but you need to be
really careful about the ways you use it.

2022-10-04 09:21:14