While user namespaces do not make the kernel more vulnerable, they are,
however, used to initiate exploits. Some distributions provide a way to block
namespace creation for the entire system, but some users do not want that.
Instead, we needed a way to block some applications while allowing others,
which is not possible with those tools. Managing hierarchies also did not fit
our case, because we determine which tasks are allowed based on their
attributes.
While exploring a solution, we first leveraged the LSM cred_prepare hook
because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:

    cred = prepare_creds()
        security_prepare_creds()
            call_int_hook(cred_prepare, ...
    if (cred)
        create_user_ns(cred)

We noticed that error codes were not propagated from this hook and
introduced a patch [1] to propagate those errors.
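A minimal userspace model of the error-propagation problem, assuming the caller maps a NULL cred to a fixed -ENOMEM, following the call chain sketched above. All names (model_prepare_creds() and friends) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace model only. Every name here is hypothetical and merely mirrors
 * the shape of the call chain above; none of it is kernel code.
 */
struct model_cred {
	int placeholder;
};

/* Stand-in for an LSM cred_prepare hook that wants to deny the operation. */
static int model_cred_prepare_hook(void)
{
	return -EPERM;
}

/* Like prepare_creds(): the only way to report a hook failure is NULL. */
static struct model_cred *model_prepare_creds(void)
{
	struct model_cred *cred = calloc(1, sizeof(*cred));

	if (!cred)
		return NULL;
	if (model_cred_prepare_hook() != 0) {
		free(cred);
		return NULL;	/* the hook's -EPERM is discarded here */
	}
	return cred;
}

/*
 * Like the unshare(CLONE_NEWUSER) path: err starts out as -ENOMEM, so a
 * NULL cred is reported as -ENOMEM regardless of why it was NULL.
 */
static int model_unshare_userns(void)
{
	int err = -ENOMEM;
	struct model_cred *cred = model_prepare_creds();

	if (cred) {
		err = 0;	/* create_user_ns(cred) would run here */
		free(cred);
	}
	return err;
}

/* The hook said -EPERM, but the caller only ever sees -ENOMEM. */
static void model_demo(void)
{
	assert(model_unshare_userns() == -ENOMEM);
}
```

In this model the caller cannot distinguish an LSM denial from an allocation failure, which is the gap the propagation patch tried to close.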
The discussion notes that security_prepare_creds() is not appropriate for
MAC policies, and instead the hook is meant for LSM authors to prepare
credentials for mutation. [2]
Additionally, the cred_prepare hook is not without problems. Handling the
clone3 case is trickier because a user space pointer is passed to it, which
makes checking the syscall subject to a possible TOCTTOU attack.
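A generic double-fetch illustration of the TOCTTOU concern, written as a deterministic userspace model. The struct, flag value and racer callback are hypothetical; the point is only that a check which re-reads user memory can disagree with the value acted on later, while a check on a single copied snapshot cannot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag and struct; not the kernel's clone_args definitions. */
#define MODEL_CLONE_NEWUSER 0x10000000ULL

struct model_clone_args {
	uint64_t flags;
};

/* Simulates a second user thread rewriting the shared struct. It is called
 * at a fixed point so the race is deterministic in this sketch. */
typedef void (*racer_fn)(struct model_clone_args *uargs);

static void racer_set_newuser(struct model_clone_args *uargs)
{
	uargs->flags |= MODEL_CLONE_NEWUSER;
}

/*
 * Return codes: -1 denied, 0 proceeds without a new user namespace,
 * 1 proceeds and creates one.
 */

/* Double fetch: the policy check and the action each read user memory. */
static int check_double_fetch(struct model_clone_args *uargs, racer_fn racer)
{
	if (uargs->flags & MODEL_CLONE_NEWUSER)
		return -1;		/* policy: deny new user namespaces */
	racer(uargs);			/* window between check and use */
	return (uargs->flags & MODEL_CLONE_NEWUSER) ? 1 : 0;
}

/* Single copy: snapshot once, then check and act on the same snapshot. */
static int check_single_copy(struct model_clone_args *uargs, racer_fn racer)
{
	struct model_clone_args kargs = *uargs;	/* like copy_from_user() */

	racer(uargs);			/* later writes no longer matter */
	if (kargs.flags & MODEL_CLONE_NEWUSER)
		return -1;
	return 0;			/* snapshot has no CLONE_NEWUSER */
}

static void model_demo(void)
{
	struct model_clone_args a = { .flags = 0 };
	struct model_clone_args b = { .flags = 0 };

	assert(check_double_fetch(&a, racer_set_newuser) == 1);	/* bypassed */
	assert(check_single_copy(&b, racer_set_newuser) == 0);
}
```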
Ultimately, we concluded that a better course of action is to introduce
a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function
and userns_create LSM hook, then marks the hook as sleepable in BPF. The
following patches include a BPF test and an SELinux implementation.
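A userspace model of how an int-returning LSM hook such as userns_create is dispatched, assuming the call_int_hook() semantics of the time: hooks run in registration order, the first non-zero return is propagated, and an empty hook list yields 0. Per the v1 changelog, security_create_user_ns() is called from create_user_ns() after the ID mapping check. The struct and function names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a credential; only the bit this sketch needs. */
struct model_cred {
	int has_cap_sys_admin;
};

typedef int (*userns_create_fn)(const struct model_cred *cred);

/* An LSM that allows everything (returns 0). */
static int permissive_lsm(const struct model_cred *cred)
{
	(void)cred;
	return 0;
}

/* An LSM that denies callers lacking CAP_SYS_ADMIN, like the BPF selftest. */
static int capability_lsm(const struct model_cred *cred)
{
	return cred->has_cap_sys_admin ? 0 : -EPERM;
}

/* Call each registered hook in order; the first non-zero return wins.
 * With no hooks registered the result is 0, so the control is opt-in. */
static int model_security_create_user_ns(userns_create_fn *hooks, size_t n,
					 const struct model_cred *cred)
{
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](cred);

		if (rc != 0)
			return rc;
	}
	return 0;
}

static void model_demo(void)
{
	userns_create_fn hooks[] = { permissive_lsm, capability_lsm };
	struct model_cred root = { .has_cap_sys_admin = 1 };
	struct model_cred user = { .has_cap_sys_admin = 0 };

	assert(model_security_create_user_ns(NULL, 0, &user) == 0);
	assert(model_security_create_user_ns(hooks, 2, &root) == 0);
	assert(model_security_create_user_ns(hooks, 2, &user) == -EPERM);
}
```

The empty-list case is what "opt-in" means here: with no LSM implementing the hook, user namespace creation behaves as before.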
We want to encourage use of user namespaces, and also cater to the needs
of users/administrators to observe and/or control access. There is no
expectation of an impact on user space applications because access control
is opt-in, and users wishing to observe within an LSM context
Links:
1. https://lore.kernel.org/all/[email protected]/
2. https://lore.kernel.org/all/[email protected]/
3. https://lore.kernel.org/all/[email protected]/
Past discussions:
V4: https://lore.kernel.org/all/[email protected]/
V3: https://lore.kernel.org/all/[email protected]/
V2: https://lore.kernel.org/all/[email protected]/
V1: https://lore.kernel.org/all/[email protected]/
Changes since v4:
- Update commit description
- Update cover letter
Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test namespace is created given
permission
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take
struct cred
- Move security_create_user_ns() call after id mapping check in
create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4):
security, lsm: Introduce security_create_user_ns()
bpf-lsm: Make bpf_lsm_userns_create() sleepable
selftests/bpf: Add tests verifying bpf lsm userns_create hook
selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 6 ++
kernel/bpf/bpf_lsm.c | 1 +
kernel/user_namespace.c | 5 +
security/security.c | 5 +
security/selinux/hooks.c | 9 ++
security/selinux/include/classmap.h | 2 +
.../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
.../selftests/bpf/progs/test_deny_namespace.c | 33 ++++++
10 files changed, 168 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
--
2.30.2
Unprivileged user namespace creation is an intended feature to enable
sandboxing; however, this feature is often used as an initial step to
perform a privilege escalation attack.
This patch implements a new user_namespace { create } access control
permission to restrict which domains allow or deny user namespace
creation. This is necessary for system administrators to quickly protect
their systems while waiting for vulnerability patches to be applied.
This permission can be used in the following way:
allow domA_t domA_t : user_namespace { create };
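A greatly simplified model of the type-enforcement check behind this rule. selinux_userns_create() passes the current task's SID as both subject and target, which is why the rule names domA_t twice. The string-based rule table and model_* names are hypothetical; real SELinux uses SIDs, class and permission indices, and the AVC, and a denial in enforcing mode returns -EACCES.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical permission bit for user_namespace { create }. */
#define PERM_USER_NAMESPACE_CREATE 0x1u

struct te_rule {
	const char *source;
	const char *target;
	const char *tclass;
	unsigned int perms;
};

static const struct te_rule policy[] = {
	/* allow domA_t domA_t : user_namespace { create }; */
	{ "domA_t", "domA_t", "user_namespace", PERM_USER_NAMESPACE_CREATE },
};

/* Union the permissions of all matching rules, then test the request. */
static int model_has_perm(const char *source, const char *target,
			  const char *tclass, unsigned int requested)
{
	unsigned int allowed = 0;

	for (size_t i = 0; i < sizeof(policy) / sizeof(policy[0]); i++) {
		if (strcmp(policy[i].source, source) == 0 &&
		    strcmp(policy[i].target, target) == 0 &&
		    strcmp(policy[i].tclass, tclass) == 0)
			allowed |= policy[i].perms;
	}
	return (allowed & requested) == requested ? 0 : -EACCES;
}

/* Mirrors selinux_userns_create(): the task is both subject and target. */
static int model_userns_create(const char *task_domain)
{
	return model_has_perm(task_domain, task_domain, "user_namespace",
			      PERM_USER_NAMESPACE_CREATE);
}

static void model_demo(void)
{
	assert(model_userns_create("domA_t") == 0);
	assert(model_userns_create("domB_t") == -EACCES);
}
```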
Signed-off-by: Frederick Lawler <[email protected]>
---
Changes since v4:
- None
Changes since v3:
- None
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Introduce this patch
---
security/selinux/hooks.c | 9 +++++++++
security/selinux/include/classmap.h | 2 ++
2 files changed, 11 insertions(+)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 79573504783b..b9f1078450b3 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4221,6 +4221,14 @@ static void selinux_task_to_inode(struct task_struct *p,
spin_unlock(&isec->lock);
}
+static int selinux_userns_create(const struct cred *cred)
+{
+ u32 sid = current_sid();
+
+ return avc_has_perm(&selinux_state, sid, sid, SECCLASS_USER_NAMESPACE,
+ USER_NAMESPACE__CREATE, NULL);
+}
+
/* Returns error only if unable to parse addresses */
static int selinux_parse_skb_ipv4(struct sk_buff *skb,
struct common_audit_data *ad, u8 *proto)
@@ -7111,6 +7119,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(task_movememory, selinux_task_movememory),
LSM_HOOK_INIT(task_kill, selinux_task_kill),
LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
+ LSM_HOOK_INIT(userns_create, selinux_userns_create),
LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index ff757ae5f253..0bff55bb9cde 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -254,6 +254,8 @@ const struct security_class_mapping secclass_map[] = {
{ COMMON_FILE_PERMS, NULL } },
{ "io_uring",
{ "override_creds", "sqpoll", NULL } },
+ { "user_namespace",
+ { "create", NULL } },
{ NULL }
};
--
2.30.2
The LSM hook userns_create was introduced to provide LSMs an
opportunity to block or allow unprivileged user namespace creation. This
test serves two purposes: it provides a test eBPF implementation, and
tests the hook successfully blocks or allows user namespace creation.
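The eBPF program's decision comes down to one bit test on cred->cap_effective. The sketch below performs the same CAP_TO_INDEX()/CAP_TO_MASK() test from userspace on the calling thread's effective set, read with capget(2) via syscall(2). The helper names are hypothetical, and the two-word layout assumes _LINUX_CAPABILITY_VERSION_3.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/capability.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same test the BPF program performs: pick the 32-bit word with
 * CAP_TO_INDEX() and the bit within it with CAP_TO_MASK(). */
static int cap_words_have(const uint32_t words[_LINUX_CAPABILITY_U32S_3],
			  int cap)
{
	assert(cap >= 0 && cap <= CAP_LAST_CAP);
	return (words[CAP_TO_INDEX(cap)] & CAP_TO_MASK(cap)) != 0;
}

/* Read the calling thread's effective set; 1/0, or -1 if capget fails. */
static int self_has_effective_cap(int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	uint32_t eff[_LINUX_CAPABILITY_U32S_3];

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;
	for (int i = 0; i < _LINUX_CAPABILITY_U32S_3; i++)
		eff[i] = data[i].effective;
	return cap_words_have(eff, cap);
}
```

For example, self_has_effective_cap(CAP_SYS_ADMIN) returning 0 corresponds to the case where the attached test program returns -EPERM.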
This tests 3 cases:
1. Unattached bpf program does not block unpriv user namespace
creation.
2. Attached bpf program allows user namespace creation given
CAP_SYS_ADMIN privileges.
3. Attached bpf program denies user namespace creation for a
user without CAP_SYS_ADMIN.
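One subtlety in the test's create_user_ns() helper: the child exits with EXIT_FAILURE when unshare() fails, and the parent compares the exit code against EPERM. That works because both are 1 on Linux with glibc, not because the errno is propagated. A hedged variant that forwards errno explicitly is sketched here; run_in_child() and unshare_user_ns() are hypothetical names, and it assumes the operation sets errno on failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for pid; return its exit code, or -1 on error or abnormal exit. */
static int wait_for_pid(pid_t pid)
{
	int status;

	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Run op() in a forked child. The child exits with 0 on success and with
 * the errno op() left behind on failure, so the parent compares against
 * EPERM on purpose. Assumes op() sets errno (< 256) when it fails.
 */
static int run_in_child(int (*op)(void))
{
	pid_t pid;

	assert(op != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(op() ? errno : 0);
	return wait_for_pid(pid);
}

/* The operation under test: create a new user namespace. */
static int unshare_user_ns(void)
{
	return unshare(CLONE_NEWUSER);
}
```

A test could then compare run_in_child(unshare_user_ns) against EPERM directly, and an unexpected error such as ENOSPC would show up as itself rather than as a generic failure.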
Acked-by: KP Singh <[email protected]>
Signed-off-by: Frederick Lawler <[email protected]>
---
The generic deny_namespace file name is used for future namespace
expansion. I didn't want to limit these files to just the create_user_ns
hook.
Changes since v4:
- None
Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test namespace is created given
permission
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Introduce this patch
---
.../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
.../selftests/bpf/progs/test_deny_namespace.c | 33 ++++++
2 files changed, 135 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c
new file mode 100644
index 000000000000..1bc6241b755b
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include "test_deny_namespace.skel.h"
+#include <sched.h>
+#include "cap_helpers.h"
+#include <stdio.h>
+
+static int wait_for_pid(pid_t pid)
+{
+ int status, ret;
+
+again:
+ ret = waitpid(pid, &status, 0);
+ if (ret == -1) {
+ if (errno == EINTR)
+ goto again;
+
+ return -1;
+ }
+
+ if (!WIFEXITED(status))
+ return -1;
+
+ return WEXITSTATUS(status);
+}
+
+/* negative return value -> some internal error
+ * positive return value -> userns creation failed
+ * 0 -> userns creation succeeded
+ */
+static int create_user_ns(void)
+{
+ pid_t pid;
+
+ pid = fork();
+ if (pid < 0)
+ return -1;
+
+ if (pid == 0) {
+ if (unshare(CLONE_NEWUSER))
+ _exit(EXIT_FAILURE);
+ _exit(EXIT_SUCCESS);
+ }
+
+ return wait_for_pid(pid);
+}
+
+static void test_userns_create_bpf(void)
+{
+ __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
+ __u64 old_caps = 0;
+
+ cap_enable_effective(cap_mask, &old_caps);
+
+ ASSERT_OK(create_user_ns(), "priv new user ns");
+
+ cap_disable_effective(cap_mask, &old_caps);
+
+ ASSERT_EQ(create_user_ns(), EPERM, "unpriv new user ns");
+
+ if (cap_mask & old_caps)
+ cap_enable_effective(cap_mask, NULL);
+}
+
+static void test_unpriv_userns_create_no_bpf(void)
+{
+ __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
+ __u64 old_caps = 0;
+
+ cap_disable_effective(cap_mask, &old_caps);
+
+ ASSERT_OK(create_user_ns(), "no-bpf unpriv new user ns");
+
+ if (cap_mask & old_caps)
+ cap_enable_effective(cap_mask, NULL);
+}
+
+void test_deny_namespace(void)
+{
+ struct test_deny_namespace *skel = NULL;
+ int err;
+
+ if (test__start_subtest("unpriv_userns_create_no_bpf"))
+ test_unpriv_userns_create_no_bpf();
+
+ skel = test_deny_namespace__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel load"))
+ goto close_prog;
+
+ err = test_deny_namespace__attach(skel);
+ if (!ASSERT_OK(err, "attach"))
+ goto close_prog;
+
+ if (test__start_subtest("userns_create_bpf"))
+ test_userns_create_bpf();
+
+ test_deny_namespace__detach(skel);
+
+close_prog:
+ test_deny_namespace__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c
new file mode 100644
index 000000000000..09ad5a4ebd1f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+#include <linux/capability.h>
+
+struct kernel_cap_struct {
+ __u32 cap[_LINUX_CAPABILITY_U32S_3];
+} __attribute__((preserve_access_index));
+
+struct cred {
+ struct kernel_cap_struct cap_effective;
+} __attribute__((preserve_access_index));
+
+char _license[] SEC("license") = "GPL";
+
+SEC("lsm.s/userns_create")
+int BPF_PROG(test_userns_create, const struct cred *cred, int ret)
+{
+ struct kernel_cap_struct caps = cred->cap_effective;
+ int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN);
+ __u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN);
+
+ if (ret)
+ return 0;
+
+ ret = -EPERM;
+ if (caps.cap[cap_index] & cap_mask)
+ return 0;
+
+ return -EPERM;
+}
--
2.30.2
On Mon, Aug 15, 2022 at 12:20 PM Frederick Lawler <[email protected]> wrote:
I just merged this into the lsm/next tree, thanks for seeing this
through Frederick, and thank you to everyone who took the time to
review the patches and add their tags.
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git next
--
paul-moore.com
>
> I just merged this into the lsm/next tree, thanks for seeing this
> through Frederick, and thank you to everyone who took the time to
> review the patches and add their tags.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git next
Paul, Frederick
I repeat my NACK, in part because I am being ignored and in part
because the hook does not make technical sense.
Linus, I want you to know that this has been put in the lsm tree against
my explicit and clear objections.
My request to talk about the actual problems that are being addressed has
been completely ignored.
I have been a bit slow in dealing with this conversation because I am
very much sick and not on top of my game, but that is no excuse to steamroll
over me instead of addressing my concerns.
This is an irresponsible way of adding an access control to user
namespace creation. This is a linux-api and manpages level kind of
change, as this is a semantic change visible to userspace. Instead, that
concern has been brushed off as a different return code to userspace.
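For userspace, the visible effect of a denial is the errno from unshare(CLONE_NEWUSER) or clone(). With the SELinux implementation a denial surfaces as EACCES (what avc_has_perm() returns in enforcing mode), while the BPF selftest program returns EPERM; ENOSPC reports that a user namespace limit was reached. A hypothetical sketch of how a sandboxing program might classify these follows; the categories are this sketch's own, not a kernel or libc API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical categories for a failed user namespace creation. */
enum userns_failure {
	USERNS_DENIED,	/* EPERM/EACCES: privilege, sysctl, or LSM policy */
	USERNS_LIMIT,	/* ENOSPC: a user namespace limit was reached */
	USERNS_OTHER,	/* anything else, e.g. EINVAL */
};

/* Map the errno of a failed unshare(CLONE_NEWUSER) to a category. */
static enum userns_failure classify_userns_errno(int err)
{
	assert(err != 0);
	switch (err) {
	case EPERM:
	case EACCES:
		return USERNS_DENIED;
	case ENOSPC:
		return USERNS_LIMIT;
	default:
		return USERNS_OTHER;
	}
}
```

A program that only checks for EPERM would treat an SELinux denial (EACCES) as an unexpected error, which is one concrete form of the userspace-visible change discussed here.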
For observability this is a terrible LSM interface because there is no
pair with user namespace destruction, nor is there any ability for the
LSM to allocate any state to track the user namespace. As there is no
patch actually calling audit or anything else, observability does not appear
to be a driving factor of this new interface.
The common scenarios I am aware of for using the user namespace are:
- Creating a container.
- Using the user namespace to sandbox your application like chrome does.
- Running an exploit.
Returning an error code in the first 2 scenarios will create a userspace
regression as either userspace will run less securely or it won't work
at all.
Returning an error code in the third scenario when someone is trying to
exploit your machine is equally foolish as you are giving the exploit
the chance to continue running. The application should be killed
instead.
Furthermore, adding a random failure mode to user namespace creation, if it
is used at all, will just encourage userspace to use a setuid application to
perform the namespace creation instead, creating a less secure system
overall.
If the concern is to reduce the attack surface, everything this
proposed hook can do is already possible with the security_capable
security hook.
So Paul, Frederick please drop this. I can't see what this new hook is
good for except creating regressions in existing userspace code. I am
not willing to support such a hook in code that I maintain.
Eric
On Wed, Aug 17, 2022 at 11:08 AM Eric W. Biederman
<[email protected]> wrote:
> > I just merged this into the lsm/next tree, thanks for seeing this
> > through Frederick, and thank you to everyone who took the time to
> > review the patches and add their tags.
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git next
>
> Paul, Frederick
>
> I repeat my NACK, in part because I am being ignored and in part
> because the hook does not make technical sense.
>
> Linus I want you to know that this has been put in the lsm tree against
> my explicit and clear objections.
Eric, we are disagreeing with you, not ignoring you; that's an
important distinction. This is the fifth iteration of the patchset,
or the sixth (?) if you count Frederick's earlier attempts using the
credential hooks, and with each revision multiple people have tried to
work with you to find a mutually agreeable solution to the use cases
presented by Frederick and others. At the end of the v4 discussion it
was my opinion that you kept moving the goalposts in an effort to
prevent any additional hooks/controls/etc. to the user namespace code
which is why I made the decision to merge the code into the lsm/next
branch against your wishes. Multiple people have come out in support
of this functionality, and you remain the only one opposed to the
change; normally a maintainer's objection would be enough to block the
change, but it is my opinion that Eric is acting in bad faith.
At the end of the v4 patchset I suggested merging this into lsm/next
so it could get a full -rc cycle in linux-next, assuming no issues
were uncovered during testing I was planning to send it to Linus
during the next merge window with commentary on the contentiousness of
the patchset, including Eric's NACK. I'm personally very disappointed
that it has come to this, but I'm at a loss of how to work with you
(Eric) to find a solution; this is the only path forward that I can
see at this point. Others have expressed their agreement with this
approach, both on-list and privately.
If anyone other than Eric or myself has a different view of the
situation, *please* add your comments now. I believe I've done a fair
job of summarizing things, but everyone has a bias and I'm definitely
no exception.
Finally, I'm going to refrain from rehashing the same arguments over
again in this revision of the patchset, instead I'll just provide
links to the previous drafts in case anyone wants to spend an hour or
two:
Revision v1
https://lore.kernel.org/linux-security-module/[email protected]/
Revision v2
https://lore.kernel.org/linux-security-module/[email protected]/
Revision v3
https://lore.kernel.org/linux-security-module/[email protected]/
Revision v4
https://lore.kernel.org/linux-security-module/[email protected]/
--
paul-moore.com
Paul Moore <[email protected]> writes:
> At the end of the v4 patchset I suggested merging this into lsm/next
> so it could get a full -rc cycle in linux-next, assuming no issues
> were uncovered during testing
What in the world can be uncovered in linux-next for code that has no
in-tree users?
That is one of my largest problems. I want to talk about the users and
the use cases and I don't get dialog. Nor do I get a "hey, look back there,
you missed it."
Since you don't want to rehash this, I will just repeat my conclusion
that the patchset appears to introduce an ineffective defense that will
achieve nothing in the defense of the kernel, so all it will achieve is
a code maintenance burden and occasional breakage of legitimate users of
the user namespace.
Further, the process is broken. You are changing the semantics of an
operation with the introduction of a security hook. That needs a
man page and discussion on linux-abi, and in general the scrutiny we
give to new and changed system calls, as this change fundamentally
changes the semantics of creating a user namespace.
Skipping that part of the process is not simply disagreement; it is
irresponsible.
Eric
On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <[email protected]> wrote:
> Paul Moore <[email protected]> writes:
>
> > At the end of the v4 patchset I suggested merging this into lsm/next
> > so it could get a full -rc cycle in linux-next, assuming no issues
> > were uncovered during testing
>
> What in the world can be uncovered in linux-next for code that has no in
> tree users.
The patchset provides both BPF LSM and SELinux implementations of the
hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
If no one beats me to it, I plan to work on adding a test to the
selinux-testsuite as soon as I'm done dealing with other urgent
LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
run these tests multiple times a week (multiple times a day sometimes)
against the -rcX kernels with the lsm/next, selinux/next, and
audit/next branches applied on top. I know others do similar things.
--
paul-moore.com
On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <[email protected]> wrote:
> A layer of hooks that leaves all of the logic to userspace is not an
> in-tree user for purposes of understanding the logic of the code.
The BPF LSM selftests which are part of this patchset live in-tree.
The SELinux hook implementation is completely in-tree with the
subject/verb/object relationship clearly described by the code itself.
After all, the selinux_userns_create() function consists of only two
lines, one of which is an assignment. Yes, it is true that the
SELinux policy lives outside the kernel, but that is because there is
no singular SELinux policy for everyone. From a practical
perspective, the SELinux policy is really just a configuration file
used to set up the kernel at runtime; it is not significantly different
than an iptables script, /etc/sysctl.conf, or any of the other myriad
of configuration files used to configure the kernel during boot.
--
paul-moore.com
Paul Moore <[email protected]> writes:
> On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <[email protected]> wrote:
>> Paul Moore <[email protected]> writes:
>>
>> > At the end of the v4 patchset I suggested merging this into lsm/next
>> > so it could get a full -rc cycle in linux-next, assuming no issues
>> > were uncovered during testing
>>
>> What in the world can be uncovered in linux-next for code that has no in
>> tree users.
>
> The patchset provides both BPF LSM and SELinux implementations of the
> hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> If no one beats me to it, I plan to work on adding a test to the
> selinux-testsuite as soon as I'm done dealing with other urgent
> LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> run these tests multiple times a week (multiple times a day sometimes)
> against the -rcX kernels with the lsm/next, selinux/next, and
> audit/next branches applied on top. I know others do similar things.
A layer of hooks that leaves all of the logic to userspace is not an
in-tree user for purposes of understanding the logic of the code.
The reason why I implemented user namespaces is so that all of Linux's
neat features could be exposed to non-root userspace processes, in
a way that doesn't break suid root processes.
The access control you are adding to user namespaces looks to take that
away. It looks to remove the whole point of user namespaces.
So, without any mention of how people intend to use this feature, without
any code that uses this hook to implement semantics, and without any talk
about how this semantic change is reasonable, I strenuously object.
Eric
Paul Moore <[email protected]> writes:
> On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <[email protected]> wrote:
>> A layer of hooks that leaves all of the logic to userspace is not an
>> in-tree user for purposes of understanding the logic of the code.
>
> The BPF LSM selftests which are part of this patchset live in-tree.
> The SELinux hook implementation is completely in-tree with the
> subject/verb/object relationship clearly described by the code itself.
> After all, the selinux_userns_create() function consists of only two
> lines, one of which is an assignment. Yes, it is true that the
> SELinux policy lives outside the kernel, but that is because there is
> no singular SELinux policy for everyone. From a practical
> perspective, the SELinux policy is really just a configuration file
> used to setup the kernel at runtime; it is not significantly different
> than an iptables script, /etc/sysctl.conf, or any of the other myriad
> of configuration files used to configure the kernel during boot.
I object to adding the new system configuration knob.
Especially when I don't see people explaining why such a knob is a good
idea. What is userspace going to do with this new feature that makes it
worth maintaining in the kernel?
That is always the conversation we have when adding new features, and
that is exactly the conversation that has not happened here.
Adding a layer of indirection should not exempt a new feature from
needing to justify itself.
Eric
On Wed, Aug 17, 2022 at 5:24 PM Eric W. Biederman <[email protected]> wrote:
> I object to adding the new system configuration knob.
>
> Especially when I don't see people explaining why such a knob is a good
> idea. What is userspace going to do with this new feature that makes it
> worth maintaining in the kernel?
From https://lore.kernel.org/all/CAEiveUdPhEPAk7Y0ZXjPsD=Vb5hn453CHzS9aG-tkyRa8bf_eg@mail.gmail.com/
"We have valid use cases not specifically related to the
attack surface, but go into the middle from bpf observability
to enforcement. As we want to track namespace creation, changes,
nesting and per task creds context depending on the nature of
the workload."
-Djalal Harouni
From https://lore.kernel.org/linux-security-module/CALrw=nGT0kcHh4wyBwUF-Q8+v8DgnyEJM55vfmABwfU67EQn=g@mail.gmail.com/
"[W]e do want to embrace user namespaces in our code and some of
our workloads already depend on it. Hence we didn't agree to
Debian's approach of just having a global sysctl. But there is
"our code" and there is "third party" code, which might not even
be open source due to various reasons. And while the path exists
for that code to do something bad - we want to block it."
-Ignat Korchagin
From https://lore.kernel.org/linux-security-module/CAHC9VhSKmqn5wxF3BZ67Z+-CV7sZzdnO+JODq48rZJ4WAe8ULA@mail.gmail.com/
"I've heard you talk about bugs being the only reason why people
would want to ever block user namespaces, but I think we've all
seen use cases now where it goes beyond that. However, even if
it didn't, the need to build high confidence/assurance systems
where big chunks of functionality can be disabled based on a
security policy is a very real use case, and this patchset would
help enable that."
-Paul Moore (with apologies for self-quoting)
From https://lore.kernel.org/linux-security-module/CAHC9VhRSCXCM51xpOT95G_WVi=UQ44gNV=uvvG23p8wn16uYSA@mail.gmail.com/
"One of the selling points of the BPF LSM is that it allows for
various different ways of reporting and logging beyond audit.
However, even if it was limited to just audit I believe that
provides some useful justification as auditing fork()/clone()
isn't quite the same and could be difficult to do at scale in
some configurations."
-Paul Moore (my apologies again)
From https://lore.kernel.org/linux-security-module/20220722082159.jgvw7jgds3qwfyqk@wittgenstein/
"Nice and straightforward."
-Christian Brauner
--
paul-moore.com
Hi,
Please remove me from this list and stop harassing me.
Jonathan Moore
-----Original Message-----
From: Paul Moore <[email protected]>
Sent: Wednesday, August 17, 2022 5:51 PM
To: Eric W. Biederman <[email protected]>
Cc: Linus Torvalds <[email protected]>; Frederick Lawler <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH v5 0/4] Introduce security_create_user_ns()
On Wed, Aug 17, 2022 at 5:24 PM Eric W. Biederman <[email protected]> wrote:
> I object to adding the new system configuration knob.
>
> Especially when I don't see people explaining why such a knob is a good
> idea. What is userspace going to do with this new feature that makes it
> worth maintaining in the kernel?
From https://lore.kernel.org/all/CAEiveUdPhEPAk7Y0ZXjPsD=Vb5hn453CHzS9aG-tkyRa8bf_eg@mail.gmail.com/
"We have valid use cases not specifically related to the
attack surface, but go into the middle from bpf observability
to enforcement. As we want to track namespace creation, changes,
nesting and per task creds context depending on the nature of
the workload."
-Djalal Harouni
From https://lore.kernel.org/linux-security-module/CALrw=nGT0kcHh4wyBwUF-Q8+v8DgnyEJM55vfmABwfU67EQn=g@mail.gmail.com/
"[W]e do want to embrace user namespaces in our code and some of
our workloads already depend on it. Hence we didn't agree to
Debian's approach of just having a global sysctl. But there is
"our code" and there is "third party" code, which might not even
be open source due to various reasons. And while the path exists
for that code to do something bad - we want to block it."
-Ignat Korchagin
On Wed, Aug 17, 2022 at 04:24:28PM -0500, Eric W. Biederman wrote:
> Paul Moore <[email protected]> writes:
>
> > On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <[email protected]> wrote:
> >> Paul Moore <[email protected]> writes:
> >> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <[email protected]> wrote:
> >> >> Paul Moore <[email protected]> writes:
> >> >>
> >> >> > At the end of the v4 patchset I suggested merging this into lsm/next
> >> >> > so it could get a full -rc cycle in linux-next, assuming no issues
> >> >> > were uncovered during testing
> >> >>
> >> >> What in the world can be uncovered in linux-next for code that has no in
> >> >> tree users.
> >> >
> >> > The patchset provides both BPF LSM and SELinux implementations of the
> >> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> >> > If no one beats me to it, I plan to work on adding a test to the
> >> > selinux-testsuite as soon as I'm done dealing with other urgent
> >> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> >> > run these tests multiple times a week (multiple times a day sometimes)
> >> > against the -rcX kernels with the lsm/next, selinux/next, and
> >> > audit/next branches applied on top. I know others do similar things.
> >>
> >> A layer of hooks that leaves all of the logic to userspace is not an
> >> in-tree user for purposes of understanding the logic of the code.
> >
> > The BPF LSM selftests which are part of this patchset live in-tree.
> > The SELinux hook implementation is completely in-tree with the
> > subject/verb/object relationship clearly described by the code itself.
> > After all, the selinux_userns_create() function consists of only two
> > lines, one of which is an assignment. Yes, it is true that the
> > SELinux policy lives outside the kernel, but that is because there is
> > no singular SELinux policy for everyone. From a practical
> > perspective, the SELinux policy is really just a configuration file
> > used to setup the kernel at runtime; it is not significantly different
> > than an iptables script, /etc/sysctl.conf, or any of the other myriad
> > of configuration files used to configure the kernel during boot.
>
> I object to adding the new system configuration knob.
I do strongly sympathize with Eric's points. It will be very easy, once
user namespace creation has been further restricted in some distros, to
say "well see this stuff is silly" and go back to simply requiring root
to create all containers and namespaces, which is generally quite a bit
easier anyway. And then, of course, give everyone root so they can
start containers.
As Eric said,
| Further adding a random failure mode to user namespace creation if it is
| used at all will just encourage userspace to use a setuid application to
| perform the namespace creation instead. Creating a less secure system
| overall.
However, I'm also looking at e.g. CVE-2022-2588 and CVE-2022-2586, and
yes there are two issues which do require discussion (three if you
count reportability, which is mainly a tool in guarding against the others).
The first is, indeed, configuration knobs. There are tools, including
Chrome, which use user namespaces to make things better. The hope is
that more and more tools will do so.
The second is damage control. When a 0-day has been announced, things
change. You can say "well the bug was there all along", but it is
different when every lazy ne'er-do-well can pick an exploit off a mailing
list and use it against a product for which spinning a new version with
a new kernel and getting customers to update is probably a months-long
endeavor. Some of these products do in fact require namespaces (user
and otherwise) as part of their function. And - to my chagrin - I suspect
most of them create user namespaces as the root user, before possibly processing
untrusted user input, so unprivileged_userns_clone isn't a good fit.
SELinux (and LSMs in general) do in fact seem like a useful place to
add some configuration, because they tend to assign different domains
to tasks with different purposes and trust levels. But another such
place is the init system / service manager. And in most cases these
days, this will use cgroups to collect tasks of certain types. So I
wonder (this is ALMOST ENTIRELY thinking out loud, not thought through
sufficiently) whether we should be setting a cgroup.nslock or
somesuch.
Of course, kernel livepatch is another potentially useful mitigation.
Currently that's not possible for everyone.
Maybe there is a more fundamental way we can approach this. Part of me
still likes the idea of splitting the id mapping and capability-in-userns
parts, but that's not sufficient. Maybe looking over all the relevant
CVEs would give a better hint.
Eric, you said
| If the concern is to reduce the attack surface everything this
| proposed hook can do is already possible with the security_capable
| security hook.
I suppose I could envision an LSM which gets activated when we find
out there was a net-ns-exacerbated 0-day, which refuses CAP_NET_ADMIN
for a task not in init_user_ns? Ideally it would be more flexible
than that.
> Especially when I don't see people explaining why such a knob is a good
> idea. What is userspace going to do with this new feature that makes it
> worth maintaining in the kernel?
>
> That is always the conversation we have when adding new features, and
> that is exactly the conversation that has not happened here.
Eric and Paul, I wonder, will you - or some people you'd like to represent
you - be at plumbers in September? Should there be a BOF session there? (I
won't be there, but could join over video) I think a brainstorming session
for solutions to the above problems would be good.
> Adding a layer of indirection should not exempt a new feature from
> needing to justify itself.
>
> Eric
On Thu, Aug 18, 2022 at 10:05 AM Serge E. Hallyn <[email protected]> wrote:
> On Wed, Aug 17, 2022 at 04:24:28PM -0500, Eric W. Biederman wrote:
> > Paul Moore <[email protected]> writes:
> > > On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <[email protected]> wrote:
> > >> Paul Moore <[email protected]> writes:
> > >> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <[email protected]> wrote:
> > >> >> Paul Moore <[email protected]> writes:
> > >> >>
> > >> >> > At the end of the v4 patchset I suggested merging this into lsm/next
> > >> >> > so it could get a full -rc cycle in linux-next, assuming no issues
> > >> >> > were uncovered during testing
> > >> >>
> > >> >> What in the world can be uncovered in linux-next for code that has no in
> > >> >> tree users.
> > >> >
> > >> > The patchset provides both BPF LSM and SELinux implementations of the
> > >> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> > >> > If no one beats me to it, I plan to work on adding a test to the
> > >> > selinux-testsuite as soon as I'm done dealing with other urgent
> > >> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> > >> > run these tests multiple times a week (multiple times a day sometimes)
> > >> > against the -rcX kernels with the lsm/next, selinux/next, and
> > >> > audit/next branches applied on top. I know others do similar things.
> > >>
> > >> A layer of hooks that leaves all of the logic to userspace is not an
> > >> in-tree user for purposes of understanding the logic of the code.
> > >
> > > The BPF LSM selftests which are part of this patchset live in-tree.
> > > The SELinux hook implementation is completely in-tree with the
> > > subject/verb/object relationship clearly described by the code itself.
> > > After all, the selinux_userns_create() function consists of only two
> > > lines, one of which is an assignment. Yes, it is true that the
> > > SELinux policy lives outside the kernel, but that is because there is
> > > no singular SELinux policy for everyone. From a practical
> > > perspective, the SELinux policy is really just a configuration file
> > > used to set up the kernel at runtime; it is not significantly different
> > > than an iptables script, /etc/sysctl.conf, or any of the other myriad
> > > of configuration files used to configure the kernel during boot.
> >
> > I object to adding the new system configuration knob.
>
> I do strongly sympathize with Eric's points. It will be very easy, once
> user namespace creation has been further restricted in some distros, to
> say "well see this stuff is silly" and go back to simply requiring root
> to create all containers and namespaces, which is generally quite a bit
> easier anyway. And then, of course, give everyone root so they can
> start containers.
That's assuming a lot. Many years have passed since namespaces were
first introduced, and awareness of good security practices has
improved, perhaps not as much as any of us would like, but to say that
distros, system builders, and even users are the same as they were so
many years ago is a bit of a stretch in my opinion.
However, even ignoring that for a moment, do we really want to go to a
place where we dictate how users compose and secure their systems?
Linux "took over the world" because it offered a level of flexibility
that wasn't really possible before, and it has flourished because it
has kept that mentality. The Linux Kernel can be shoehorned onto most
hardware that you can get your hands on these days, with driver
support for most anything you can think to plug into the system. Do
you want a single-user environment with no per-user separation? We
can do that. Do you want a traditional DAC based system that leans
heavy on ACLs and capabilities? We can do that. Do you want a
container host that allows you to carve up the system with a high
degree of granularity thanks to the different namespaces? We can do
that. How about a system that leverages the LSM to enforce a least
privilege ideal, even on the most privileged root user? We can do
that too. This patchset is about giving distros, system builders, and
users another choice in how they build their system. We've seen both
in this patchset and in previously failed attempts that there is a
definite want from a user perspective for functionality such as this,
and I think it's time we deliver it in the upstream kernel so they
don't have to keep patching their own systems with out-of-tree
patches.
> Eric and Paul, I wonder, will you - or some people you'd like to represent
> you - be at plumbers in September? Should there be a BOF session there? (I
> won't be there, but could join over video) I think a brainstorming session
> for solutions to the above problems would be good.
Regardless of whether Eric or I will be at LPC, it is doubtful that all of
the people who have participated in this discussion will be able to
attend, and I think it's important that the users who are asking for
this patchset have a chance to be heard in each forum where this is
discussed. While conferences are definitely nice - I definitely
missed them over the past couple of years - we can't use them as a
crutch to help us reach a conclusion on this issue; we've debated much
more difficult things over the mailing lists, I see no reason why this
would be any different.
--
paul-moore.com
On Thu, Aug 18, 2022 at 11:11:06AM -0400, Paul Moore wrote:
> On Thu, Aug 18, 2022 at 10:05 AM Serge E. Hallyn <[email protected]> wrote:
> > On Wed, Aug 17, 2022 at 04:24:28PM -0500, Eric W. Biederman wrote:
> > > Paul Moore <[email protected]> writes:
> > > > On Wed, Aug 17, 2022 at 4:56 PM Eric W. Biederman <[email protected]> wrote:
> > > >> Paul Moore <[email protected]> writes:
> > > >> > On Wed, Aug 17, 2022 at 3:58 PM Eric W. Biederman <[email protected]> wrote:
> > > >> >> Paul Moore <[email protected]> writes:
> > > >> >>
> > > >> >> > At the end of the v4 patchset I suggested merging this into lsm/next
> > > >> >> > so it could get a full -rc cycle in linux-next, assuming no issues
> > > >> >> > were uncovered during testing
> > > >> >>
> > > >> >> What in the world can be uncovered in linux-next for code that has no in
> > > >> >> tree users.
> > > >> >
> > > >> > The patchset provides both BPF LSM and SELinux implementations of the
> > > >> > hooks along with a BPF LSM test under tools/testing/selftests/bpf/.
> > > >> > If no one beats me to it, I plan to work on adding a test to the
> > > >> > selinux-testsuite as soon as I'm done dealing with other urgent
> > > >> > LSM/SELinux issues (io_uring CMD passthrough, SCTP problems, etc.); I
> > > >> > run these tests multiple times a week (multiple times a day sometimes)
> > > >> > against the -rcX kernels with the lsm/next, selinux/next, and
> > > >> > audit/next branches applied on top. I know others do similar things.
> > > >>
> > > >> A layer of hooks that leaves all of the logic to userspace is not an
> > > >> in-tree user for purposes of understanding the logic of the code.
> > > >
> > > > The BPF LSM selftests which are part of this patchset live in-tree.
> > > > The SELinux hook implementation is completely in-tree with the
> > > > subject/verb/object relationship clearly described by the code itself.
> > > > After all, the selinux_userns_create() function consists of only two
> > > > lines, one of which is an assignment. Yes, it is true that the
> > > > SELinux policy lives outside the kernel, but that is because there is
> > > > no singular SELinux policy for everyone. From a practical
> > > > perspective, the SELinux policy is really just a configuration file
> > > > used to setup the kernel at runtime; it is not significantly different
> > > > than an iptables script, /etc/sysctl.conf, or any of the other myriad
> > > > of configuration files used to configure the kernel during boot.
> > >
> > > I object to adding the new system configuration knob.
> >
> > I do strongly sympathize with Eric's points. It will be very easy, once
> > user namespace creation has been further restricted in some distros, to
> > say "well see this stuff is silly" and go back to simply requiring root
> > to create all containers and namespaces, which is generally quite a bit
> > easier anyway. And then, of course, give everyone root so they can
> > start containers.
>
> That's assuming a lot. Many years have passed since namespaces were
> first introduced, and awareness of good security practices has
> improved, perhaps not as much as any of us would like, but to say that
> distros, system builders, and even users are the same as they were so
> many years ago is a bit of a stretch in my opinion.
Maybe. But I do get a bit worried based on some of what I've been
reading in mailing lists lately. Kernel dev definitely moves like
fashion - remember when every api should have its own filesystem?
That was not a different group of people.
> However, even ignoring that for a moment, do we really want to go to a
> place where we dictate how users compose and secure their systems?
> Linux "took over the world" because it offered a level of flexibility
> that wasn't really possible before, and it has flourished because it
> has kept that mentality. The Linux Kernel can be shoehorned onto most
> hardware that you can get your hands on these days, with driver
> support for most anything you can think to plug into the system. Do
> you want a single-user environment with no per-user separation? We
> can do that. Do you want a traditional DAC based system that leans
> heavy on ACLs and capabilities? We can do that. Do you want a
> container host that allows you to carve up the system with a high
> degree of granularity thanks to the different namespaces? We can do
> that. How about a system that leverages the LSM to enforce a least
> privilege ideal, even on the most privileged root user? We can do
> that too. This patchset is about giving distros, system builders, and
> users another choice in how they build their system. We've seen both
Oh, you misunderstand. Whereas I do feel there are important concerns in
Eric's objections, and whereas I don't feel this set sufficiently
addresses the problems that I see and outlined above, I do see value in
this set, and was not aiming to deter it. We need better ways to
mitigate a certain class of 0-days without completely disallowing use of
user namespaces, and this may help.
> in this patchset and in previously failed attempts that there is a
> definite want from a user perspective for functionality such as this,
> and I think it's time we deliver it in the upstream kernel so they
> don't have to keep patching their own systems with out-of-tree
> patches.
>
> > Eric and Paul, I wonder, will you - or some people you'd like to represent
> > you - be at plumbers in September? Should there be a BOF session there? (I
> > won't be there, but could join over video) I think a brainstorming session
> > for solutions to the above problems would be good.
>
> Regardless of whether Eric or I will be at LPC, it is doubtful that all of
> the people who have participated in this discussion will be able to
> attend, and I think it's important that the users who are asking for
> this patchset have a chance to be heard in each forum where this is
> discussed. While conferences are definitely nice - I definitely
> missed them over the past couple of years - we can't use them as a
> crutch to help us reach a conclusion on this issue; we've debated much
No, I wasn't thinking we would use LPC to decide on this patchset. As far
as I can see, the patchset is merged. I am hoping we can come up with
"something better" to address people's needs, make everyone happy, and
bring forth world peace. Which would stack just fine with what's here
for defense in depth.
You may well not be interested in further work, and that's fine. I need
to set aside a few days to think on this.
> more difficult things over the mailing lists, I see no reason why this
> would be any different.
>
> --
> paul-moore.com
On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <[email protected]> wrote:
> On Thu, Aug 18, 2022 at 11:11:06AM -0400, Paul Moore wrote:
> > On Thu, Aug 18, 2022 at 10:05 AM Serge E. Hallyn <[email protected]> wrote:
...
> > > I do strongly sympathize with Eric's points. It will be very easy, once
> > > user namespace creation has been further restricted in some distros, to
> > > say "well see this stuff is silly" and go back to simply requiring root
> > > to create all containers and namespaces, which is generally quite a bit
> > > easier anyway. And then, of course, give everyone root so they can
> > > start containers.
> >
> > That's assuming a lot. Many years have passed since namespaces were
> > first introduced, and awareness of good security practices has
> > improved, perhaps not as much as any of us would like, but to say that
> > distros, system builders, and even users are the same as they were so
> > many years ago is a bit of a stretch in my opinion.
>
> Maybe. But I do get a bit worried based on some of what I've been
> reading in mailing lists lately. Kernel dev definitely moves like
> fashion - remember when every api should have its own filesystem?
> That was not a different group of people.
I'm not going to argue against the idea that kernel development is
subject to fads; I just don't agree that adding an LSM control point
for user namespace creation is going to be the end of user namespaces.
> > However, even ignoring that for a moment, do we really want to go to a
> > place where we dictate how users compose and secure their systems?
> > Linux "took over the world" because it offered a level of flexibility
> > that wasn't really possible before, and it has flourished because it
> > has kept that mentality. The Linux Kernel can be shoehorned onto most
> > hardware that you can get your hands on these days, with driver
> > support for most anything you can think to plug into the system. Do
> > you want a single-user environment with no per-user separation? We
> > can do that. Do you want a traditional DAC based system that leans
> > heavy on ACLs and capabilities? We can do that. Do you want a
> > container host that allows you to carve up the system with a high
> > degree of granularity thanks to the different namespaces? We can do
> > that. How about a system that leverages the LSM to enforce a least
> > privilege ideal, even on the most privileged root user? We can do
> > that too. This patchset is about giving distros, system builders, and
> > users another choice in how they build their system. We've seen both
>
> Oh, you misunderstand. Whereas I do feel there are important concerns in
> Eric's objections, and whereas I don't feel this set sufficiently
> addresses the problems that I see and outlined above, I do see value in
> this set, and was not aiming to deter it. We need better ways to
> mitigate a certain class of 0-days without completely disallowing use of
> user namespaces, and this may help.
Ah, thanks for the explanation, I missed that (obviously) in your
previous email. If I'm perfectly honest, I suppose the protracted
debate with Eric has also left me a little overly sensitive to any
perceived arguments against this patchset.
> > in this patchset and in previously failed attempts that there is a
> > definite want from a user perspective for functionality such as this,
> > and I think it's time we deliver it in the upstream kernel so they
> > don't have to keep patching their own systems with out-of-tree
> > patches.
> >
> > > Eric and Paul, I wonder, will you - or some people you'd like to represent
> > > you - be at plumbers in September? Should there be a BOF session there? (I
> > > won't be there, but could join over video) I think a brainstorming session
> > > for solutions to the above problems would be good.
> >
> > Regardless of whether Eric or I will be at LPC, it is doubtful that all of
> > the people who have participated in this discussion will be able to
> > attend, and I think it's important that the users who are asking for
> > this patchset have a chance to be heard in each forum where this is
> > discussed. While conferences are definitely nice - I definitely
> > missed them over the past couple of years - we can't use them as a
> > crutch to help us reach a conclusion on this issue; we've debated much
>
> No, I wasn't thinking we would use LPC to decide on this patchset. As far
> as I can see, the patchset is merged.
While I maintain that Frederick's patches are a good thing, I'm not
going to consider them "merged" until I see them in Linus' tree or
Linus decides to voice his support on the lists. These patches do
have Eric's NACK, and a maintainer's NACK isn't something to take
lightly. I certainly don't.
> I am hoping we can come up with
> "something better" to address people's needs, make everyone happy, and
> bring forth world peace. Which would stack just fine with what's here
> for defense in depth.
>
> You may well not be interested in further work, and that's fine. I need
> to set aside a few days to think on this.
I'm happy to continue the discussion as long as it's constructive; I
think we all are. My gut feeling is that Frederick's approach falls
closest to the sweet spot of "workable without being overly offensive"
(*cough*), but if you've got an additional approach in mind, or an
alternative approach that solves the same use case problems, I think
we'd all love to hear about it.
--
paul-moore.com
Paul Moore <[email protected]> writes:
> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <[email protected]> wrote:
>> I am hoping we can come up with
>> "something better" to address people's needs, make everyone happy, and
>> bring forth world peace. Which would stack just fine with what's here
>> for defense in depth.
>>
>> You may well not be interested in further work, and that's fine. I need
>> to set aside a few days to think on this.
>
> I'm happy to continue the discussion as long as it's constructive; I
> think we all are. My gut feeling is that Frederick's approach falls
> closest to the sweet spot of "workable without being overly offensive"
> (*cough*), but if you've got an additional approach in mind, or an
> alternative approach that solves the same use case problems, I think
> we'd all love to hear about it.
I would love to actually hear the problems people are trying to solve so
that we can have a sensible conversation about the trade-offs.
As best I can tell without more information people want to use
the creation of a user namespace as a signal that the code is
attempting an exploit.
As such, let me propose: instead of returning an error code, which will let
the exploit continue, have the security hook return a bool, with true
meaning the code can continue and false triggering SIGSYS to terminate
the program like seccomp does.
I am not super fond of that idea, but it means that userspace code is
not expected to deal with the situation, and the only conversation a
userspace application developer needs to enter into with a system
administrator or security policy developer is one to prove they are not
exploit code. Plus it makes much more sense to kill an exploit
immediately instead of letting it run.
In general when addressing code coverage concerns I think it makes more
sense to use the security hooks to implement some variety of the principle
of least privilege and only give applications access to the kernel
facilities they are known to use.
As far as I can tell creating a user namespace does not increase the
attack surface. It is the creation of the other namespaces from a user
namespace that begins to do that. So in general I would think
restrictions should be in places they matter.
Likewise, the bugs that have exploits involving the user namespace
are not user namespace bugs; they are bugs in other subsystems that
just happen to go through the user namespace as the easiest path to
the buggy code, not the only path to the buggy code.
Eric
On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <[email protected]> wrote:
> Paul Moore <[email protected]> writes:
> > On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <[email protected]> wrote:
> >> I am hoping we can come up with
> >> "something better" to address people's needs, make everyone happy, and
> >> bring forth world peace. Which would stack just fine with what's here
> >> for defense in depth.
> >>
> >> You may well not be interested in further work, and that's fine. I need
> >> to set aside a few days to think on this.
> >
> > I'm happy to continue the discussion as long as it's constructive; I
> > think we all are. My gut feeling is that Frederick's approach falls
> > closest to the sweet spot of "workable without being overly offensive"
> > (*cough*), but if you've got an additional approach in mind, or an
> > alternative approach that solves the same use case problems, I think
> > we'd all love to hear about it.
>
> I would love to actually hear the problems people are trying to solve so
> that we can have a sensible conversation about the trade-offs.
Here are several taken from the previous threads, it's surely not a
complete list, but it should give you a good idea:
https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> As best I can tell without more information people want to use
> the creation of a user namespace as a signal that the code is
> attempting an exploit.
Some use cases are like that, there are several other use cases that
go beyond this; see all of our previous discussions on this
topic/patchset. As has been mentioned before, there are use cases
that require improved observability, access control, or both.
> As such, let me propose: instead of returning an error code, which will let
> the exploit continue, have the security hook return a bool, with true
> meaning the code can continue and false triggering SIGSYS to terminate
> the program like seccomp does.
Having the kernel forcibly exit the process isn't something that most
LSMs would likely want. I suppose we could modify the hook/caller so
that *if* an LSM wanted to return SIGSYS the system would kill the
process, but I would want that to be something in addition to
returning an error code like LSMs normally do (e.g. EACCES).
--
paul-moore.com
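To make the two failure modes discussed above concrete, here is a minimal userspace sketch in C, assuming Linux and POSIX `fork()`/`waitpid()`. It contrasts what a supervising process would observe if the hook returns an error code (Paul mentions `EACCES`), which the application can handle itself, with what it would observe under Eric's proposed bool hook, which would terminate the task with SIGSYS the way seccomp's kill action does. No LSM is involved: the child only simulates each outcome, and `simulate_denial()`, `classify_child()` and `enum outcome` are hypothetical names for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcomes a supervising process (e.g. a service manager) can observe. */
enum outcome { OUT_ERROR_RETURNED, OUT_KILLED_SIGSYS, OUT_OTHER };

/* Hypothetical helper: fork a child that simulates one denial style.
 * kill_style == 0: the "hook" returned EACCES; the child handles it and
 *                  exits with that errno as its status.
 * kill_style != 0: the "hook" killed the task with SIGSYS (simulated by
 *                  raising the signal, since no real LSM is involved). */
static pid_t simulate_denial(int kill_style)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                       /* parent, or -1 if fork failed */

    if (kill_style) {
        struct rlimit no_core = { 0, 0 };
        sigset_t set;

        setrlimit(RLIMIT_CORE, &no_core); /* SIGSYS dumps core by default */
        signal(SIGSYS, SIG_DFL);          /* make sure it is not ignored */
        sigemptyset(&set);
        sigaddset(&set, SIGSYS);
        sigprocmask(SIG_UNBLOCK, &set, NULL); /* ...or blocked */
        raise(SIGSYS);                    /* terminates the child here */
    }
    _exit(EACCES);                        /* error-code style: app decides */
}

/* Hypothetical helper: reap the child and report how it ended. */
static enum outcome classify_child(pid_t pid)
{
    int status;

    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return OUT_OTHER;
    if (WIFEXITED(status) && WEXITSTATUS(status) == EACCES)
        return OUT_ERROR_RETURNED;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSYS)
        return OUT_KILLED_SIGSYS;
    return OUT_OTHER;
}
```

With the error-code style the child still chooses how to end (here it exits cleanly with `EACCES`); with the SIGSYS style that choice is taken away, which is the trade-off Paul raises when he asks for any kill behavior to be in addition to a normal error return.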
> On Aug 25, 2022, at 12:19 PM, Paul Moore <[email protected]> wrote:
>
> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <[email protected]> wrote:
>> Paul Moore <[email protected]> writes:
>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn <[email protected]> wrote:
>>>> I am hoping we can come up with
>>>> "something better" to address people's needs, make everyone happy, and
>>>> bring forth world peace. Which would stack just fine with what's here
>>>> for defense in depth.
>>>>
>>>> You may well not be interested in further work, and that's fine. I need
>>>> to set aside a few days to think on this.
>>>
>>> I'm happy to continue the discussion as long as it's constructive; I
>>> think we all are. My gut feeling is that Frederick's approach falls
>>> closest to the sweet spot of "workable without being overly offensive"
>>> (*cough*), but if you've got an additional approach in mind, or an
>>> alternative approach that solves the same use case problems, I think
>>> we'd all love to hear about it.
>>
>> I would love to actually hear the problems people are trying to solve so
>> that we can have a sensible conversation about the trade-offs.
>
> Here are several taken from the previous threads, it's surely not a
> complete list, but it should give you a good idea:
>
> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
>
>> As best I can tell without more information people want to use
>> the creation of a user namespace as a signal that the code is
>> attempting an exploit.
>
> Some use cases are like that, there are several other use cases that
> go beyond this; see all of our previous discussions on this
> topic/patchset. As has been mentioned before, there are use cases
> that require improved observability, access control, or both.
>
>> As such let me propose instead of returning an error code which will let
>> the exploit continue, have the security hook return a bool. With true
>> meaning the code can continue and on false it will trigger using SIGSYS
>> to terminate the program like seccomp does.
>
> Having the kernel forcibly exit the process isn't something that most
> LSMs would likely want. I suppose we could modify the hook/caller so
> that *if* an LSM wanted to return SIGSYS the system would kill the
> process, but I would want that to be something in addition to
> returning an error code like LSMs normally do (e.g. EACCES).
I am new to user_namespace and security work, so please pardon me if
anything below is very wrong.
IIUC, user_namespace is a tool that enables trusted userspace code to
control the behavior of untrusted (or less trusted) userspace code.
Failing create_user_ns() doesn't make the system more reliable.
Specifically, we call create_user_ns() via two paths: fork/clone and
unshare. For both paths, we need the userspace to use user_namespace,
and to honor failed create_user_ns().
On the other hand, I would echo that killing the process is not
practical in some use cases. Specifically, allowing the application to
run in a less secure environment for a short period of time might be
much better than killing it and taking down the whole service. Of
course, there are other cases where security is more important, and
taking down the whole service is the better choice.
I guess the ultimate solution is a way to enforce using user_namespace
in the kernel (if it ever makes sense...). But I don't know how that
would work. Until we have such a solution, maybe we only need a
void hook for observability (or just a tracepoint; I come from a BPF
background).
Thanks,
Song
On Thu, Aug 25, 2022 at 5:58 PM Song Liu <[email protected]> wrote:
> I am new to user_namespace and security work, so please pardon me if
> anything below is very wrong.
>
> IIUC, user_namespace is a tool that enables trusted userspace code to
> control the behavior of untrusted (or less trusted) userspace code.
> Failing create_user_ns() doesn't make the system more reliable.
> Specifically, we call create_user_ns() via two paths: fork/clone and
> unshare. For both paths, we need the userspace to use user_namespace,
> and to honor failed create_user_ns().
>
> On the other hand, I would echo that killing the process is not
> practical in some use cases. Specifically, allowing the application to
> run in a less secure environment for a short period of time might be
> much better than killing it and taking down the whole service. Of
> course, there are other cases that security is more important, and
> taking down the whole service is the better choice.
>
> I guess the ultimate solution is a way to enforce using user_namespace
> in the kernel (if it ever makes sense...).
The LSM framework, and the BPF and SELinux LSM implementations in this
patchset, provide a mechanism to do just that: kernel enforced access
controls using flexible security policies which can be tailored by the
distro, solution provider, or end user to meet the specific needs of
their use case.
--
paul-moore.com
> On Aug 25, 2022, at 3:10 PM, Paul Moore <[email protected]> wrote:
>
> On Thu, Aug 25, 2022 at 5:58 PM Song Liu <[email protected]> wrote:
>> I am new to user_namespace and security work, so please pardon me if
>> anything below is very wrong.
>>
>> IIUC, user_namespace is a tool that enables trusted userspace code to
>> control the behavior of untrusted (or less trusted) userspace code.
>> Failing create_user_ns() doesn't make the system more reliable.
>> Specifically, we call create_user_ns() via two paths: fork/clone and
>> unshare. For both paths, we need the userspace to use user_namespace,
>> and to honor failed create_user_ns().
>>
>> On the other hand, I would echo that killing the process is not
>> practical in some use cases. Specifically, allowing the application to
>> run in a less secure environment for a short period of time might be
>> much better than killing it and taking down the whole service. Of
>> course, there are other cases that security is more important, and
>> taking down the whole service is the better choice.
>>
>> I guess the ultimate solution is a way to enforce using user_namespace
>> in the kernel (if it ever makes sense...).
>
> The LSM framework, and the BPF and SELinux LSM implementations in this
> patchset, provide a mechanism to do just that: kernel enforced access
> controls using flexible security policies which can be tailored by the
> distro, solution provider, or end user to meet the specific needs of
> their use case.
In this case, I wouldn't say the kernel is enforcing access control
(I might be wrong). There are 3 components here: the kernel, the LSM, and
trusted userspace (whoever calls unshare). AFAICT, the kernel simply passes
the decision made by the LSM (BPF or SELinux) to the trusted userspace. It
is up to the trusted userspace to honor the return value of unshare().
If the userspace simply ignores unshare failures, or does not call
unshare(CLONE_NEWUSER), the kernel and the LSM cannot do much about it, right?
This might still be useful in some cases (I am far from an expert on
these); I just feel this is not the typical way to enforce
something.
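Here is a minimal sketch of the userspace side, assuming Linux and glibc. The hypothetical helper try_new_userns() forks a child, calls unshare(CLONE_NEWUSER) there and reports the errno back, so the caller's own namespaces stay untouched. Forking first also avoids unshare(CLONE_NEWUSER) failing with EINVAL in a multithreaded process. Whether the call succeeds depends on the system: chroot, UID/GID mappings, namespace limits, or an LSM policy.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a child could create a user namespace,
 * a positive errno if unshare() was refused, or -1 if the probe itself failed. */
static int try_new_userns(void)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: the kernel, including any LSM userns_create hook, decides. */
		if (unshare(CLONE_NEWUSER) == 0)
			_exit(0);
		_exit(errno & 0xff);  /* Linux errno values fit in an exit status */
	}

	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

One way to honor the failure is a caller that treats a positive return as fatal, for example by refusing to run untrusted work unsandboxed. Either way, a refused unshare() leaves the process without a new user namespace, whatever it does next.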
Thanks,
Song
PS: If I said something very stupid, I would not feel offended if someone
pointed it out loud. :)
On Thu, Aug 25, 2022 at 8:19 PM Paul Moore <[email protected]> wrote:
>
> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman <[email protected]> wrote:
> > As such let me propose instead of returning an error code which will let
> > the exploit continue, have the security hook return a bool. With true
> > meaning the code can continue and on false it will trigger using SIGSYS
> > to terminate the program like seccomp does.
>
> Having the kernel forcibly exit the process isn't something that most
> LSMs would likely want. I suppose we could modify the hook/caller so
> that *if* an LSM wanted to return SIGSYS the system would kill the
> process, but I would want that to be something in addition to
> returning an error code like LSMs normally do (e.g. EACCES).
I would also add here that seccomp allows more flexibility than just
delivering SIGSYS to a violating application. We can program seccomp
bpf to:
* deliver a signal
* return a CUSTOM error code (and BTW somehow this does not trigger
any requirements to change userapi or document in manpages: in my toy
example in [1] I'm delivering ENETDOWN from a uname(2) system call,
which is not documented in the man pages, but totally valid from a
seccomp usage perspective)
* do-nothing, but log the action
So I would say the seccomp reference supports the current approach
more than the alternative of delivering SIGSYS: technically, an LSM
implementation of the hook (at least an in-kernel one) can choose to
deliver a signal to a task via the kernel API, but BPF-LSM (and others)
can deliver custom error codes and log the actions as well.
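Here is a minimal sketch of the seccomp behaviour described above. It assumes Linux on x86_64 or aarch64 with seccomp filter support. A classic BPF filter makes uname(2) fail with ENETDOWN and allows everything else. The sketch was written for this note rather than copied from the linked post, and the helper names are hypothetical. The filter is installed in a forked child, so the caller stays unfiltered. Replacing the SECCOMP_RET_ERRNO action with SECCOMP_RET_LOG or SECCOMP_RET_KILL_PROCESS gives the log-only and kill variants.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Install a filter: uname(2) -> ENETDOWN, everything else allowed. */
static int install_uname_filter(void)
{
	struct sock_filter filter[] = {
		/* Kill the process if the syscall ABI is not the expected one. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		/* Load the syscall number and compare it with uname. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_uname, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENETDOWN & SECCOMP_RET_DATA)),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Required so an unprivileged process may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns the errno uname() fails with under the filter (0 if it succeeded),
 * or -1 if the child could not run the probe. */
static int uname_errno_under_filter(void)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct utsname u;
		if (install_uname_filter() != 0)
			_exit(255);
		if (uname(&u) == 0)
			_exit(0);
		_exit(errno & 0xff);
	}
	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) == 255)
		return -1;
	return WEXITSTATUS(status);
}
```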
Ignat
[1]: https://blog.cloudflare.com/sandboxing-in-linux-with-zero-lines-of-code/
On Fri, Aug 26, 2022 at 5:11 AM Ignat Korchagin <[email protected]> wrote:
> I would also add here that seccomp allows more flexibility than just
> delivering SIGSYS to a violating application. We can program seccomp
> bpf to:
> * deliver a signal
> * return a CUSTOM error code (and BTW somehow this does not trigger
> any requirements to change userapi or document in manpages: in my toy
> example in [1] I'm delivering ENETDOWN from a uname(2) system call,
> which is not documented in the man pages, but totally valid from a
> seccomp usage perspective)
> * do-nothing, but log the action
>
> So I would say the seccomp reference supports the current approach
> more than the alternative approach of delivering SIGSYS as technically
> an LSM implementation of the hook (at least in-kernel one) can chose
> to deliver a signal to a task via kernel-api, but BPF-LSM (and others)
> can deliver custom error codes and log the actions as well.
I agree that seccomp mode 2 allows for more flexibility than was
mentioned earlier, however seccomp filtering has some limitations in
this particular case which can be an issue for some. The first, and
perhaps most important, is that some of the information that a seccomp
filter might want to inspect is effectively hidden with the clone3(2)
syscall due to the clone_args struct; this would make it difficult for
a seccomp filter to identify namespace related operations. The second
issue is that a seccomp mode 2 based approach requires the
applications themselves to "Do The Right Thing" and ensure that the
proper seccomp filter is loaded into the kernel before the target
fork()/clone()/unshare() call is executed; an LSM which implements a
proper mandatory access control mechanism does not rely on the
application, it enforces the system's security policy regardless of
what actions userspace performs.
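A filter sketch makes the clone3(2) point concrete. It assumes Linux on x86_64 or aarch64. For unshare(2) and clone(2), the namespace flags arrive in the first argument register, so a classic BPF filter can test for CLONE_NEWUSER. clone3(2), by contrast, passes only a pointer to struct clone_args, which seccomp cannot dereference. The usual workaround, which some container runtimes have used, is to fail clone3 with ENOSYS so that libc falls back to clone(). Helper names are hypothetical. The process itself still has to install the filter before calling unshare(), which is the second limitation above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

#ifndef __NR_clone3
#define __NR_clone3 435  /* same number on x86_64 and aarch64 */
#endif

/* args[] are 64-bit but classic BPF loads 32-bit words; CLONE_NEWUSER
 * (0x10000000) sits in the low word of args[0]. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define ARG0_LO offsetof(struct seccomp_data, args[0])
#else
#define ARG0_LO (offsetof(struct seccomp_data, args[0]) + 4)
#endif

static int install_userns_filter(void)
{
	struct sock_filter f[] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		/* clone3: the flags sit behind a user pointer, so force a fallback. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone3, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ENOSYS),
		/* unshare and clone: the flags are the first argument. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 1, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone, 0, 3),
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG0_LO),
		BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = { .len = sizeof(f) / sizeof(f[0]), .filter = f };

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* In a child, install the filter, then try unshare(CLONE_NEWUSER)
 * (use_clone3 == 0) or a bare clone3 call (use_clone3 != 0).
 * Returns the resulting errno, 0 on success, or -1 if the probe failed. */
static int probe_under_userns_filter(int use_clone3)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		long rc;
		if (install_userns_filter() != 0)
			_exit(255);
		rc = use_clone3 ? syscall(__NR_clone3, NULL, 0)
				: unshare(CLONE_NEWUSER);
		_exit(rc == 0 ? 0 : (errno & 0xff));
	}
	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) == 255)
		return -1;
	return WEXITSTATUS(status);
}
```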
--
paul-moore.com
On Thu, Aug 25, 2022 at 6:42 PM Song Liu <[email protected]> wrote:
> > On Aug 25, 2022, at 3:10 PM, Paul Moore <[email protected]> wrote:
> > On Thu, Aug 25, 2022 at 5:58 PM Song Liu <[email protected]> wrote:
...
> >> I am new to user_namespace and security work, so please pardon me if
> >> anything below is very wrong.
> >>
> >> IIUC, user_namespace is a tool that enables trusted userspace code to
> >> control the behavior of untrusted (or less trusted) userspace code.
> >> Failing create_user_ns() doesn't make the system more reliable.
> >> Specifically, we call create_user_ns() via two paths: fork/clone and
> >> unshare. For both paths, we need the userspace to use user_namespace,
> >> and to honor failed create_user_ns().
> >>
> >> On the other hand, I would echo that killing the process is not
> >> practical in some use cases. Specifically, allowing the application to
> >> run in a less secure environment for a short period of time might be
> >> much better than killing it and taking down the whole service. Of
> >> course, there are other cases that security is more important, and
> >> taking down the whole service is the better choice.
> >>
> >> I guess the ultimate solution is a way to enforce using user_namespace
> >> in the kernel (if it ever makes sense...).
> >
> > The LSM framework, and the BPF and SELinux LSM implementations in this
> > patchset, provide a mechanism to do just that: kernel enforced access
> > controls using flexible security policies which can be tailored by the
> > distro, solution provider, or end user to meet the specific needs of
> > their use case.
>
> In this case, I wouldn't call the kernel is enforcing access control.
> (I might be wrong). There are 3 components here: kernel, LSM, and
> trusted userspace (whoever calls unshare).
The LSM layer, and the LSMs themselves are part of the kernel; look at
the changes in this patchset to see the LSM, BPF LSM, and SELinux
kernel changes. Explaining how the different LSMs work is quite a bit
beyond the scope of this discussion, but there is plenty of
information available online that should be able to serve as an
introduction, not to mention the kernel source itself. However, in
very broad terms you can think of the individual LSMs as somewhat
analogous to filesystem drivers, e.g. ext4, and the LSM itself as the
VFS layer.
> AFAICT, kernel simply passes
> the decision made by LSM (BPF or SELinux) to the trusted userspace. It
> is up to the trusted userspace to honor the return value of unshare().
With an LSM enabled and enforcing a security policy on user namespace
creation, which appears to be the case of most concern, the kernel
would make a decision on the namespace creation based on various
factors (e.g. for SELinux this would be the calling process' security
domain and the domain's permission set as determined by the configured
security policy); if the operation was rejected, no namespace would be
created and an error code would be returned to userspace. It is exactly
what would happen if the calling process is chrooted or
doesn't have a proper UID/GID mapping. Don't forget that the
create_user_ns() function already enforces a security policy and
returns errors to userspace; this patchset doesn't add anything new in
that regard, it just allows for a richer and more flexible security
policy to be built on top of the existing constraints.
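The layering can be pictured with a simplified userspace model. The names are hypothetical and this is not the kernel source. The checks that create_user_ns() already had run first. They return -ENOSPC when the nesting depth or the per-user namespace count is exceeded. They return -EPERM for a chrooted caller or one whose uid/gid has no mapping in the parent namespace. Only after those checks does the LSM hook added by this patchset get to refuse the request. The struct fields and the simplified ordering are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified inputs to the model. */
struct userns_request {
	int parent_level;    /* nesting depth of the parent user namespace */
	bool over_ns_limit;  /* per-user user-namespace count exhausted */
	bool chrooted;       /* caller's root is not the mount namespace root */
	bool owner_mapped;   /* euid and egid both mapped in the parent ns */
};

typedef int (*userns_lsm_fn)(const struct userns_request *req);

/* Model of the check sequence; returns 0 or a negative errno. */
static int model_create_user_ns(const struct userns_request *req,
				userns_lsm_fn lsm)
{
	if (req->parent_level > 32 || req->over_ns_limit)
		return -ENOSPC;
	if (req->chrooted)
		return -EPERM;
	if (!req->owner_mapped)
		return -EPERM;
	/* The hook added by the patchset runs after the existing checks. */
	if (lsm) {
		int ret = lsm(req);
		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Example policy (hypothetical): refuse nested user namespaces. */
static int deny_nested(const struct userns_request *req)
{
	return req->parent_level > 0 ? -EACCES : 0;
}
```

With no LSM hook, or a permissive one, the outcome is exactly what the existing checks decide.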
> If the userspace simply ignores unshare failures, or does not call
> unshare(CLONE_NEWUSER), kernel and LSM cannot do much about it, right?
The process is still subject to any security policies that are active
and being enforced by the kernel. A malicious or misconfigured
application can still be constrained by the kernel using both the
kernel's legacy Discretionary Access Controls (DAC) as well as the
more comprehensive Mandatory Access Controls (MAC) provided by many of
the LSMs.
--
paul-moore.com
On Thu, Aug 25, 2022 at 01:15:46PM -0500, Eric W. Biederman wrote:
> I would love to actually hear the problems people are trying to solve so
> that we can have a sensible conversation about the trade offs.
>
> As best I can tell without more information people want to use
> the creation of a user namespace as a signal that the code is
> attempting an exploit.
I don't think that's it at all. I think the problem is that it seems
you can pretty reliably get a root shell at some point in the future
by creating a user namespace, leaving it open for a bit, and waiting
for a new announcement of the latest netfilter or whatever exploit
that requires root in a user namespace. Then go back to your userns
shell and run the exploit.
So I was hoping we could do something more targeted. Be it splitting
off the ability to run code under capable_ns code from uid mapping (to
an extent), or maybe some limited-livepatch type of thing where
certain parts of code become inaccessible to code in a non-init userns
after some sysctl has been toggled, or something cooler that I've
failed to think of.
-serge
On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
> I am new to user_namespace and security work, so please pardon me if
> anything below is very wrong.
>
> IIUC, user_namespace is a tool that enables trusted userspace code to
> control the behavior of untrusted (or less trusted) userspace code.
No. user namespaces are not a way for more trusted code to control the
behavior of less trusted code.
> On Aug 26, 2022, at 8:24 AM, Serge E. Hallyn <[email protected]> wrote:
>
> On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
>> IIUC, user_namespace is a tool that enables trusted userspace code to
>> control the behavior of untrusted (or less trusted) userspace code.
>
> No. user namespaces are not a way for more trusted code to control the
> behavior of less trusted code.
Hmm.. In this case, I think I really need to learn more.
Thanks for pointing out my misunderstanding.
Song
> On Aug 26, 2022, at 8:02 AM, Paul Moore <[email protected]> wrote:
>
> On Thu, Aug 25, 2022 at 6:42 PM Song Liu <[email protected]> wrote:
>>> On Aug 25, 2022, at 3:10 PM, Paul Moore <[email protected]> wrote:
>>> On Thu, Aug 25, 2022 at 5:58 PM Song Liu <[email protected]> wrote:
>
> ...
>
>> In this case, I wouldn't call the kernel is enforcing access control.
>> (I might be wrong). There are 3 components here: kernel, LSM, and
>> trusted userspace (whoever calls unshare).
>
> The LSM layer, and the LSMs themselves are part of the kernel; look at
> the changes in this patchset to see the LSM, BPF LSM, and SELinux
> kernel changes. Explaining how the different LSMs work is quite a bit
> beyond the scope of this discussion, but there is plenty of
> information available online that should be able to serve as an
> introduction, not to mention the kernel source itself. However, in
> very broad terms you can think of the individual LSMs as somewhat
> analogous to filesystem drivers, e.g. ext4, and the LSM itself as the
> VFS layer.
Thanks for the explanation. This matches my understanding of LSMs.
>
>> AFAICT, kernel simply passes
>> the decision made by LSM (BPF or SELinux) to the trusted userspace. It
>> is up to the trusted userspace to honor the return value of unshare().
>
> With a LSM enabled and enforcing a security policy on user namespace
> creation, which appears to be the case of most concern, the kernel
> would make a decision on the namespace creation based on various
> factors (e.g. for SELinux this would be the calling process' security
> domain and the domain's permission set as determined by the configured
> security policy) and if the operation was rejected an error code would
> be returned to userspace and the operation rejected. It is the exact
> same thing as what would happen if the calling process is chrooted or
> doesn't have a proper UID/GID mapping. Don't forget that the
> create_user_ns() function already enforces a security policy and
> returns errors to userspace; this patchset doesn't add anything new in
> that regard, it just allows for a richer and more flexible security
> policy to be built on top of the existing constraints.
I don't think I understand user namespaces well enough to agree or disagree
here. I guess I should read more.
Thanks,
Song
>
>> If the userspace simply ignores unshare failures, or does not call
>> unshare(CLONE_NEWUSER), kernel and LSM cannot do much about it, right?
>
> The process is still subject to any security policies that are active
> and being enforced by the kernel. A malicious or misconfigured
> application can still be constrained by the kernel using both the
> kernel's legacy Discretionary Access Controls (DAC) as well as the
> more comprehensive Mandatory Access Controls (MAC) provided by many of
> the LSMs.
>
> --
> paul-moore.com
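Paul's point that a rejected creation is reported to userspace exactly like the existing create_user_ns() checks (a chrooted caller, a missing UID/GID mapping) can be seen from the caller's side with a small probe. This is a minimal sketch under stated assumptions: the helper names and the outcome enum are hypothetical, the attempt runs in a throwaway child so the caller's own namespaces are untouched, and which errno an LSM actually reports (EACCES, EPERM, ...) is up to the LSM and its policy.

```c
#define _GNU_SOURCE /* for unshare() and CLONE_NEWUSER */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical coarse classification of an unshare(CLONE_NEWUSER) result. */
enum userns_outcome {
    USERNS_ALLOWED,
    USERNS_REFUSED,     /* EPERM/EACCES: a permission or policy check said no */
    USERNS_LIMIT,       /* ENOSPC (Linux 4.9+): count or nesting limit reached */
    USERNS_OTHER_ERROR,
};

/* Hypothetical helper: attempt unshare(CLONE_NEWUSER) in a forked child.
 * Returns 0 if the kernel allowed the new user namespace, the errno from
 * unshare() if it was refused, or -1 if the probe itself failed. */
static int probe_userns_creation(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: report the result through the exit status
         * (Linux errno values fit in the 8-bit exit code). */
        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : errno);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

static enum userns_outcome classify_userns_result(int err)
{
    assert(err >= 0); /* -1 from the probe means the probe itself failed */
    switch (err) {
    case 0:
        return USERNS_ALLOWED;
    case EPERM:
    case EACCES:
        return USERNS_REFUSED;
    case ENOSPC:
        return USERNS_LIMIT;
    default:
        return USERNS_OTHER_ERROR;
    }
}
```

From the application's point of view nothing distinguishes an LSM denial from the pre-existing checks: either way `probe_userns_creation()` returns a positive errno that can be passed to `strerror()`, and the application decides whether to carry on without the namespace.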
On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
>
>
> > On Aug 26, 2022, at 8:24 AM, Serge E. Hallyn wrote:
> >
> > On Thu, Aug 25, 2022 at 09:58:46PM +0000, Song Liu wrote:
> >>
> >>
> >>> On Aug 25, 2022, at 12:19 PM, Paul Moore wrote:
> >>>
> >>> On Thu, Aug 25, 2022 at 2:15 PM Eric W. Biederman wrote:
> >>>> Paul Moore writes:
> >>>>> On Fri, Aug 19, 2022 at 10:45 AM Serge E. Hallyn wrote:
> >>>>>> I am hoping we can come up with
> >>>>>> "something better" to address people's needs, make everyone happy, and
> >>>>>> bring forth world peace. Which would stack just fine with what's here
> >>>>>> for defense in depth.
> >>>>>>
> >>>>>> You may well not be interested in further work, and that's fine. I need
> >>>>>> to set aside a few days to think on this.
> >>>>>
> >>>>> I'm happy to continue the discussion as long as it's constructive; I
> >>>>> think we all are. My gut feeling is that Frederick's approach falls
> >>>>> closest to the sweet spot of "workable without being overly offensive"
> >>>>> (*cough*), but if you've got an additional approach in mind, or an
> >>>>> alternative approach that solves the same use case problems, I think
> >>>>> we'd all love to hear about it.
> >>>>
> >>>> I would love to actually hear the problems people are trying to solve so
> >>>> that we can have a sensible conversation about the trade offs.
> >>>
> >>> Here are several taken from the previous threads, it's surely not a
> >>> complete list, but it should give you a good idea:
> >>>
> >>> https://lore.kernel.org/linux-security-module/CAHC9VhQnPAsmjmKo-e84XDJ1wmaOFkTKPjjztsOa9Yrq+AeAQA@mail.gmail.com/
> >>>
> >>>> As best I can tell without more information people want to use
> >>>> the creation of a user namespace as a signal that the code is
> >>>> attempting an exploit.
> >>>
> >>> Some use cases are like that, there are several other use cases that
> >>> go beyond this; see all of our previous discussions on this
> >>> topic/patchset. As has been mentioned before, there are use cases
> >>> that require improved observability, access control, or both.
> >>>
> >>>> As such let me propose instead of returning an error code which will let
> >>>> the exploit continue, have the security hook return a bool. With true
> >>>> meaning the code can continue and on false it will trigger using SIGSYS
> >>>> to terminate the program like seccomp does.
> >>>
> >>> Having the kernel forcibly exit the process isn't something that most
> >>> LSMs would likely want. I suppose we could modify the hook/caller so
> >>> that *if* an LSM wanted to return SIGSYS the system would kill the
> >>> process, but I would want that to be something in addition to
> >>> returning an error code like LSMs normally do (e.g. EACCES).
> >>
> >> I am new to user_namespace and security work, so please pardon me if
> >> anything below is very wrong.
> >>
> >> IIUC, user_namespace is a tool that enables trusted userspace code to
> >> control the behavior of untrusted (or less trusted) userspace code.
> >
> > No. user namespaces are not a way for more trusted code to control the
> > behavior of less trusted code.
>
> Hmm.. In this case, I think I really need to learn more.
>
> Thanks for pointing out my misunderstanding.
(I thought maybe Eric would chime in with a better explanation, but I'll
fill it in for now :)
One of the main goals of user namespaces is to allow unprivileged users
to do things like chroot and mount, which are very useful development
tools, without needing admin privileges. So it's almost the opposite
of what you said: rather than to enable trusted userspace code to control
the behavior of less trusted code, it's to allow less privileged code to
do things which do not affect other users, without having to assume *more*
privilege.
To be precise, the goals were:
1. uid mapping - allow two users to both "use uid 500" without conflicting
2. provide (unprivileged) users privilege over their own resources
3. absolutely no extra privilege over other resources
4. be able to nest
While (3) was technically achieved, the problem we have is that
(2) provides unprivileged users the ability to exercise kernel code
which they previously could not.
-serge
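Goal (1) can be made concrete with the extent format user namespaces use in /proc/<pid>/uid_map, where each line reads "ID-inside-ns ID-outside-ns length". The sketch below is a simplified model, not kernel code: it handles a single extent (real maps may have several lines), the struct and function names are hypothetical, and the ranges starting at 100000 and 200000 are illustrative assumptions of the kind an administrator might allocate in /etc/subuid for the setuid newuidmap helper. An unprivileged process writing its own map is limited to mapping just its own UID.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* One uid_map line: IDs [inside, inside + count) map to [outside, ...). */
struct id_extent {
    uid_t inside;
    uid_t outside;
    unsigned int count;
};

/* Translate an ID as seen inside the namespace into the ID the parent
 * namespace sees. Returns (uid_t)-1 if the ID is not covered (unmapped). */
static uid_t map_to_outside(const struct id_extent *e, uid_t id)
{
    assert(e != NULL);
    if (id < e->inside || id - e->inside >= e->count)
        return (uid_t)-1;
    return e->outside + (id - e->inside);
}

/* Format the extent the way it would be written to uid_map.
 * Returns the snprintf() result (length of the full line). */
static int format_uid_map_line(const struct id_extent *e, char *buf, size_t len)
{
    return snprintf(buf, len, "%u %u %u\n",
                    (unsigned int)e->inside, (unsigned int)e->outside, e->count);
}
```

With two such extents, both users can run processes as uid 500 inside their own namespaces, while the rest of the system sees 100500 and 200500 respectively, so neither gains any privilege over the other's files (goal 3).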
> On Aug 26, 2022, at 2:00 PM, Serge E. Hallyn wrote:
>
> On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
>> ...
>>
>> Hmm.. In this case, I think I really need to learn more.
>>
>> Thanks for pointing out my misunderstanding.
>
> (I thought maybe Eric would chime in with a better explanation, but I'll
> fill it in for now :)
>
> One of the main goals of user namespaces is to allow unprivileged users
> to do things like chroot and mount, which are very useful development
> tools, without needing admin privileges. So it's almost the opposite
> of what you said: rather than to enable trusted userspace code to control
> the behavior of less trusted code, it's to allow less privileged code to
> do things which do not affect other users, without having to assume *more*
> privilege.
Thanks for the explanation!
>
> To be precise, the goals were:
>
> 1. uid mapping - allow two users to both "use uid 500" without conflicting
> 2. provide (unprivileged) users privilege over their own resources
> 3. absolutely no extra privilege over other resources
> 4. be able to nest
Now I have a better idea of the "what", but I am not quite sure about the "how".
I will do more homework, and probably come back with more questions. :)
>
> While (3) was technically achieved, the problem we have is that
> (2) provides unprivileged users the ability to exercise kernel code
> which they previously could not.
Do you mean this one?
"""
I think the problem is that it seems
you can pretty reliably get a root shell at some point in the future
by creating a user namespace, leaving it open for a bit, and waiting
for a new announcement of the latest netfilter or whatever exploit
that requires root in a user namespace. Then go back to your userns
shell and run the exploit.
"""
Please don't share how to do it yet. I want to use it as a test for my study. :)
Thanks again!
Song
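Eric's proposal earlier in the thread, terminating the task with SIGSYS "like seccomp does", and Paul's preference for an ordinary error code can be compared directly with a plain seccomp filter, because seccomp already lets a filter choose between returning an errno and killing the process. This is a userspace illustration only, not the LSM hook under discussion. It assumes a little-endian x86_64 or aarch64 system, uses hypothetical helper names, and inspects only unshare(2); a real filter would also have to consider clone(2) and clone3(2), whose flags sit behind a user pointer that seccomp cannot dereference.

```c
#define _GNU_SOURCE /* for unshare() and CLONE_NEWUSER */
#include <assert.h>
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Apply deny_action to unshare() calls whose flags include CLONE_NEWUSER;
 * allow every other system call. */
static int install_userns_filter(unsigned int deny_action)
{
    struct sock_filter filter[] = {
        /* Do not interpret syscall numbers from an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Only unshare() is inspected further. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 0, 3),
        /* Low 32 bits of the flags argument (little-endian layout assumed). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, deny_action),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Needed so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Run unshare(CLONE_NEWUSER) under the filter in a child and return the raw
 * wait status, so the caller can tell "exited with errno" from "killed". */
static int run_filtered_unshare(unsigned int deny_action)
{
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        if (install_userns_filter(deny_action) != 0)
            _exit(255);
        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : errno);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}
```

With `SECCOMP_RET_ERRNO | EPERM` the child sees an ordinary unshare() failure it can handle or ignore; with `SECCOMP_RET_KILL_PROCESS` it is terminated as though by SIGSYS before it can continue, which is the trade-off Eric and Paul are weighing.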
On Fri, Aug 26, 2022 at 04:00:39PM -0500, Serge Hallyn wrote:
> On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
> > ...
> >
> > Hmm.. In this case, I think I really need to learn more.
> >
> > Thanks for pointing out my misunderstanding.
>
> (I thought maybe Eric would chime in with a better explanation, but I'll
> fill it in for now :)
>
> One of the main goals of user namespaces is to allow unprivileged users
> to do things like chroot and mount, which are very useful development
> tools, without needing admin privileges. So it's almost the opposite
> of what you said: rather than to enable trusted userspace code to control
> the behavior of less trusted code, it's to allow less privileged code to
> do things which do not affect other users, without having to assume *more*
> privilege.
>
> To be precise, the goals were:
>
> 1. uid mapping - allow two users to both "use uid 500" without conflicting
> 2. provide (unprivileged) users privilege over their own resources
> 3. absolutely no extra privilege over other resources
> 4. be able to nest
>
> While (3) was technically achieved, the problem we have is that
> (2) provides unprivileged users the ability to exercise kernel code
> which they previously could not.
The consequence of the refusal to give users any way to control whether
or not user namespaces are available to unprivileged users is that a
not insignificant number of distros have been carrying the same patch for about
10 years now that adds an unprivileged_userns_clone sysctl to restrict
them to privileged users. That includes current Debian and Archlinux btw.
The LSM hook is a simple way to allow administrators to control this and
will allow user namespaces to be enabled in scenarios where they
would otherwise not be accepted precisely because they are available to
unprivileged users.
I fully understand the motivation and usefulness in unprivileged
scenarios but it's an unfounded fear that giving users the ability to
control user namespace creation via an LSM hook will cause proliferation
of setuid binaries (ignoring for a moment that any fully unprivileged
container with useful idmappings has to rely on the new{g,u}idmap setuid
binaries to set up useful mappings anyway) or decrease system safety, let
alone cause regressions (which I don't think is an applicable term here
at all). Distros that have unprivileged user namespaces turned on by
default are extremely unlikely to switch to an LSM profile that turns
them off and distros that already turn them off will continue to turn
them off whether or not that LSM hook is available.
It's much more likely that workloads that want to minimize their attack
surface while still getting the benefits of user namespaces for e.g.
service isolation will feel comfortable enabling them for the first time
since they can control them via an LSM profile.
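The out-of-tree unprivileged_userns_clone sysctl can be inspected from userspace alongside the upstream max_user_namespaces limit. This is a minimal sketch under stated assumptions: the Debian-style knob is assumed to be exposed as /proc/sys/kernel/unprivileged_userns_clone and simply does not exist on kernels without that patch, /proc/sys/user/max_user_namespaces is the upstream per-namespace limit, and the helper names are hypothetical. Both knobs are global switches that apply to every task, which is exactly what a per-task LSM policy avoids.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl file under /proc/sys.
 * Returns 0 and stores the value on success, or a negative errno value
 * (-ENOENT when the knob does not exist on this kernel). */
static int read_sysctl_long(const char *path, long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -errno;
    long value;
    int matched = fscanf(f, "%ld", &value);
    fclose(f);
    if (matched != 1)
        return -EINVAL;
    *out = value;
    return 0;
}

/* Returns 0 if either knob switches off unprivileged user namespace
 * creation, 1 otherwise. Neither knob reflects any LSM policy. */
static int userns_sysctls_allow_unprivileged(void)
{
    long value;
    /* Debian-style knob: 0 restricts creation to privileged users. */
    if (read_sysctl_long("/proc/sys/kernel/unprivileged_userns_clone", &value) == 0 &&
        value == 0)
        return 0;
    /* Upstream limit: 0 means no further user namespaces can be created. */
    if (read_sysctl_long("/proc/sys/user/max_user_namespaces", &value) == 0 &&
        value == 0)
        return 0;
    return 1;
}
```

Note that a result of 1 only means neither global switch is off; an LSM hook on user namespace creation can still refuse an individual task.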
On Mon, Aug 29, 2022 at 05:33:04PM +0200, Christian Brauner wrote:
> On Fri, Aug 26, 2022 at 04:00:39PM -0500, Serge Hallyn wrote:
> > On Fri, Aug 26, 2022 at 05:00:51PM +0000, Song Liu wrote:
> > > ...
> > >
> > > Hmm.. In this case, I think I really need to learn more.
> > >
> > > Thanks for pointing out my misunderstanding.
> >
> > (I thought maybe Eric would chime in with a better explanation, but I'll
> > fill it in for now :)
> >
> > One of the main goals of user namespaces is to allow unprivileged users
> > to do things like chroot and mount, which are very useful development
> > tools, without needing admin privileges. So it's almost the opposite
> > of what you said: rather than to enable trusted userspace code to control
> > the behavior of less trusted code, it's to allow less privileged code to
> > do things which do not affect other users, without having to assume *more*
> > privilege.
> >
> > To be precise, the goals were:
> >
> > 1. uid mapping - allow two users to both "use uid 500" without conflicting
> > 2. provide (unprivileged) users privilege over their own resources
> > 3. absolutely no extra privilege over other resources
> > 4. be able to nest
> >
> > While (3) was technically achieved, the problem we have is that
> > (2) provides unprivileged users the ability to exercise kernel code
> > which they previously could not.
>
> The consequence of the refusal to give users any way to control whether
> or not user namespaces are available to unprivileged users is that a
> not insignificant number of distros have been carrying the same patch for about
> 10 years now that adds an unprivileged_userns_clone sysctl to restrict
> them to privileged users. That includes current Debian and Archlinux btw.
Hi Christian,
I'm wondering about your placement of this argument in the thread, and whether
you interpreted what I said above as an argument against this patchset, or
whether you're just expanding on what I said.
> The LSM hook is a simple way to allow administrators to control this and
(I think the "control" here is suboptimal, but I've not seen - nor
conceived of - anything better as of yet)
> will allow user namespaces to be enabled in scenarios where they
> would otherwise not be accepted precisely because they are available to
> unprivileged users.
>
> I fully understand the motivation and usefulness in unprivileged
> scenarios but it's an unfounded fear that giving users the ability to
> control user namespace creation via an LSM hook will cause proliferation
> of setuid binaries (ignoring for a moment that any fully unprivileged
> container with useful idmappings has to rely on the new{g,u}idmap setuid
> binaries to set up useful mappings anyway) or decrease system safety, let
> alone cause regressions (which I don't think is an applicable term here
> at all). Distros that have unprivileged user namespaces turned on by
> default are extremely unlikely to switch to an LSM profile that turns
> them off and distros that already turn them off will continue to turn
> them off whether or not that LSM hook is available.
>
> It's much more likely that workloads that want to minimize their attack
> surface while still getting the benefits of user namespaces for e.g.
> service isolation will feel comfortable enabling them for the first time
> since they can control them via an LSM profile.
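The "control" discussed here goes through security_create_user_ns(), which runs each registered LSM's userns_create hook and, like other integer-returning LSM hooks, stops at and returns the first non-zero (error) value. The following is a userspace model of that dispatch, not kernel code. struct sketch_cred, the hook type, and both hook functions are hypothetical stand-ins, loosely for an observe-only BPF program and a domain-based MAC check such as SELinux's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cred. */
struct sketch_cred {
    unsigned int uid;
    int domain_may_create_userns; /* stand-in for a MAC policy decision */
};

typedef int (*userns_create_hook_fn)(const struct sketch_cred *cred);

/* Model of the dispatch: call each hook in turn and hand the first
 * non-zero return value back to the caller (0 means "allowed"). */
static int sketch_security_create_user_ns(const userns_create_hook_fn *hooks,
                                          size_t nhooks,
                                          const struct sketch_cred *cred)
{
    assert(hooks != NULL || nhooks == 0);
    for (size_t i = 0; i < nhooks; i++) {
        int rc = hooks[i](cred);
        if (rc != 0)
            return rc; /* first denial wins; later hooks are not called */
    }
    return 0;
}

/* Observe-only hook: records the attempt and never denies. */
static unsigned int observed_attempts;

static int observe_only_hook(const struct sketch_cred *cred)
{
    (void)cred;
    observed_attempts++;
    return 0;
}

/* Policy hook: denies with -EACCES when the caller's domain lacks permission. */
static int domain_policy_hook(const struct sketch_cred *cred)
{
    return cred->domain_may_create_userns ? 0 : -EACCES;
}
```

Because the loop stops at the first denial, an observe-only hook ordered after a denying LSM would not see that attempt in this model. The caller simply propagates the negative errno, which is how the refusal reaches unshare() in userspace.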