Idmapped mounts may have security implications [1] and have
no knobs to be disallowed at runtime or compile time.
This patch adds a sysctl and a config option to set its default value.
[1] https://lore.kernel.org/all/[email protected]/
Based on work from Alexey Gladkov <[email protected]>.
Signed-off-by: Anton V. Boyarshinov <[email protected]>
---
Documentation/admin-guide/sysctl/fs.rst | 12 ++++++++++++
fs/Kconfig | 8 ++++++++
fs/namespace.c | 21 ++++++++++++++++++++-
3 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 2a501c9ddc55..f758c4ae5f66 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -105,6 +105,18 @@ you have some awesome number of simultaneous system users,
you might want to raise the limit.
+idmap_mounts
+------------
+
+Idmapped mounts may have security implications.
+This knob controls whether creation of idmapped mounts is allowed.
+When set to "1", creation of idmapped mounts is allowed.
+When set to "0", creation of idmapped mounts is not allowed.
+
+The default value is
+* 0, if ``IDMAP_MOUNTS_DEFAULT_OFF`` is enabled in the kernel configuration;
+* 1, otherwise.
+
file-max & file-nr
------------------
diff --git a/fs/Kconfig b/fs/Kconfig
index 7a2b11c0b803..d2203ba0183d 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -385,4 +385,12 @@ source "fs/unicode/Kconfig"
config IO_WQ
bool
+config IDMAP_MOUNTS_DEFAULT_OFF
+ bool "Disallow idmappad mounts by default"
+ help
+ Idmapped mounts may have security implications.
+ Enable this to disallow idmapped mounts by setting
+ the default value of /proc/sys/fs/idmap_mounts to 0.
+
+
endmenu
diff --git a/fs/namespace.c b/fs/namespace.c
index 40b994a29e90..66501ad75537 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -39,6 +39,10 @@
/* Maximum number of mounts in a mount namespace */
static unsigned int sysctl_mount_max __read_mostly = 100000;
+/* Whether idmapped mounts are allowed. */
+static int sysctl_idmap_mounts __read_mostly =
+ IS_ENABLED(CONFIG_IDMAP_MOUNTS_DEFAULT_OFF) ? 0 : 1;
+
static unsigned int m_hash_mask __read_mostly;
static unsigned int m_hash_shift __read_mostly;
static unsigned int mp_hash_mask __read_mostly;
@@ -3965,7 +3969,13 @@ static int can_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
if (!is_anon_ns(mnt->mnt_ns))
return -EINVAL;
- return 0;
+ /* So far, there are concerns about the safety of idmaps. */
+ if (!sysctl_idmap_mounts) {
+ pr_warn_once("VFS: idmapped mounts are not allowed.\n");
+ return -EPERM;
+ } else {
+ return 0;
+ }
}
static struct mount *mount_setattr_prepare(struct mount_kattr *kattr,
@@ -4631,6 +4641,15 @@ static struct ctl_table fs_namespace_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ONE,
},
+ {
+ .procname = "idmap_mounts",
+ .data = &sysctl_idmap_mounts,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
{ }
};
--
2.33.0
On Sat, 2022-02-05 at 10:57 +0300, Anton V. Boyarshinov wrote:
> В Fri, 4 Feb 2022 16:10:32 +0100
> Christian Brauner <[email protected]> пишет:
>
>
> > > It turns off much more than idmapped mounts only. More fine
> > > grained control seems better for me.
> >
> > If you allow user namespaces and not idmapped mounts you haven't
> > reduced your attack surface.
>
> I have. And many other people have. People who have creating user
> namespaces by unpriviliged user disabled.
Which would defeat the purpose of user namespaces which is to allow the
creation of unprivileged containers by anyone and allow us to reduce
the container attack surface by reducing the actual privilege given to
some real world containers.
You've raised vague, unactionable security concerns about this, but
basically one of the jobs of user namespaces is to take some designated
features guarded by CAP_SYS_ADMIN and give the admin of the namespace
(the unprivileged user) access to them. There are always going to be
vague security concerns about doing this. If you had an actual,
actionable concern, we could fix it. What happens without this is that
containers that need the functionality now have to run with real root
inside, which is a massively bigger security problem.
Adding knobs to disable features for unactionable security concerns
gives a feel good in terms of security theatre, but it causes system
unpredictability in that any given application now has to check if a
feature is usable before it uses it and figure out what to do if it
isn't available. The more we do it, the bigger the combinatoric
explosion of possible missing features and every distro ends up having
a different default combination.
The bottom line is it's much better to find and fix actual security
bugs than create a runtime configuration nightmare.
> I find it sad that we have no tool in mainline kernel to limit users
> access to creating user namespaces except complete disabling them.
> But many distros have that tools. Different tools with different
> interfaces and semantics :(
Have you actually proposed something? A more granular approach to
globally disabling user namespaces might be acceptable provided it
doesn't lead to a feature configuration explosion.
James
В Fri, 4 Feb 2022 16:10:32 +0100
Christian Brauner <[email protected]> пишет:
> > It turns off much more than idmapped mounts only. More fine grained
> > control seems better for me.
>
> If you allow user namespaces and not idmapped mounts you haven't reduced
> your attack surface.
I have. And many other people have. People who have creating user
namespaces by unpriviliged user disabled. I find it sad that we have no
tool in mainline kernel to limit users access to creating user
namespaces except complete disabling them. But many distros have that
tools. Different tools with different interfaces and semantics :(
And at least one major GNU/Linux distro disabled idmapped mounts
unconditionally. If I were the author of this functionality, I would
prefer to have a knob then have it unavailible for for so many users. But as you wish.
> An unprivileged user can reach much more
> exploitable code simply via unshare -user --map-root -mount which we
> still allow upstream without a second thought even with all the past and
> present exploits (see
> https://www.openwall.com/lists/oss-security/2022/01/29/1 for a current
> one from this January).
>
> >
> > > They can neither
> > > be created as an unprivileged user nor can they be created inside user
> > > namespaces.
> >
> > But actions of fully privileged user can open non-obvious ways to
> > privilege escalation.
>
> A fully privileged user potentially being able to cause issues is really
> not an argument; especially not for a new sysctl.
> You need root to create idmapped mounts and you need root to turn off
> the new knob.
>
> It also trivially applies to a whole slew of even basic kernel tunables
> basically everything that can be reached by unprivileged users after a
> privileged user has turned it on or configured it.
>
> After 2 years we haven't seen any issue with this code and while I'm not
> promising that there won't ever be issues - nobody can do that - the
> pure suspicion that there could be some is not a justification for
> anything.