2022-08-31 23:36:36

by Seth Jenkins

[permalink] [raw]
Subject: fsconfig parsing bugs

The codebase-wide refactor efforts to using the latest fs mounting API
(with support for fsopen/fsconfig/fsmount etc.) have introduced some
bugs into mount configuration parsing in several parse_param handlers,
most notably shmem_parse_one() which can be accessed from a userns.
There are several cases where the following code pattern is used:

ctx->value = <expression>
if(ctx->value is invalid)
goto fail;
ctx->seen |= SHMEM_SEEN_X;
break;

However, this coding pattern does not work in the case where multiple
fsconfig calls are made. For example, if I were to call fsconfig with
the key "nr_blocks" twice, the first time with a valid value, and the
second time with an invalid value, the invalid value will be persisted
and used upon creation of the mount for the value of ctx->blocks, and
consequently for sbinfo->max_blocks.

This code pattern is used for Opt_nr_blocks, Opt_nr_inodes, Opt_uid,
Opt_gid and Opt_huge. Probably the proper thing to do is to check for
validity before assigning the value to the shmem_options struct in the
fs_context.

We also see this code pattern replicated throughout other filesystems
for uid/gid resolution, including hugetlbfs, FUSE, ntfs3 and ffs.

The other outstanding issue I noticed comes from the fact that
fsconfig syscalls may occur in a different userns than that which
called fsopen. That means that resolving the uid/gid via
current_user_ns() can save a kuid that isn't mapped in the associated
namespace when the filesystem is finally mounted. This means that it
is possible for an unprivileged user to create files owned by any
group in a tmpfs mount (since we can set the SUID bit on the tmpfs
directory), or a tmpfs that is owned by any user, including the root
group/user. This is probably outside the original intention of this
code.

The fix for this bug is not quite so simple as the others. The options
that I've assessed are:

- Resolve the kuid/kgid via the fs_context namespace - this does
however mean that any task outside the fsopen'ing userns that tries to
set the uid/gid of a tmpfs will have to know that the uid/gid will be
resolved by a different namespace than that which the current task is
in. It also subtly changes the behavior of this specific subsystem in
a userland visible way.
- Globally disallow fsconfig calls originating from outside the
fs_context userns - This is a more robust solution that would prevent
any similar bugs, but it may impinge on valid mount use-cases. It's
the best from a security standpoint and if it's determined that it was
not in the original intention to be juggling user/mount namespaces
this way, it's probably the ideal solution.
- Throw EINVAL if the kuid specified cannot be mapped in the mounting
userns (and/or potentially in the fs_context userns) - This is
probably the solution that remains most faithful to all potential
use-cases, but it doesn't reduce the potential for variants in the
future in other parts of the codebase and it also introduces some
slight derivative logic bug risk.
- Don't resolve the uid/gid specified in fsconfig at all, and resolve
it during mount-time when calling an associated fill_super. This is
precedented and used in other parts of the codebase, but specificity
is lost in the final error case since an end-user cannot easily
attribute a mount failure to an unmappable uid.

I've also attached a PoC for this bug that demonstrates that an
unprivileged user can create files/directories with root uid/gid's.
There is no deadline for this issue as we can't see any obvious way to
cross a privilege boundary with this.

Thanks in advance!
--

Seth Jenkins
Information Security Engineer
Google Project Zero
[email protected]


Attachments:
main.c (3.63 kB)

2022-09-01 15:19:48

by Christian Brauner

[permalink] [raw]
Subject: Re: fsconfig parsing bugs

On Wed, Aug 31, 2022 at 04:12:21PM -0700, Seth Jenkins wrote:
> The codebase-wide refactor efforts to using the latest fs mounting API
> (with support for fsopen/fsconfig/fsmount etc.) have introduced some
> bugs into mount configuration parsing in several parse_param handlers,
> most notably shmem_parse_one() which can be accessed from a userns.
> There are several cases where the following code pattern is used:
>
> ctx->value = <expression>
> if(ctx->value is invalid)
> goto fail;
> ctx->seen |= SHMEM_SEEN_X;
> break;
>
> However, this coding pattern does not work in the case where multiple
> fsconfig calls are made. For example, if I were to call fsconfig with
> the key "nr_blocks" twice, the first time with a valid value, and the
> second time with an invalid value, the invalid value will be persisted
> and used upon creation of the mount for the value of ctx->blocks, and
> consequently for sbinfo->max_blocks.
>
> This code pattern is used for Opt_nr_blocks, Opt_nr_inodes, Opt_uid,
> Opt_gid and Opt_huge. Probably the proper thing to do is to check for
> validity before assigning the value to the shmem_options struct in the
> fs_context.
>
> We also see this code pattern replicated throughout other filesystems
> for uid/gid resolution, including hugetlbfs, FUSE, ntfs3 and ffs.
>
> The other outstanding issue I noticed comes from the fact that
> fsconfig syscalls may occur in a different userns than that which
> called fsopen. That means that resolving the uid/gid via
> current_user_ns() can save a kuid that isn't mapped in the associated
> namespace when the filesystem is finally mounted. This means that it
> is possible for an unprivileged user to create files owned by any
> group in a tmpfs mount (since we can set the SUID bit on the tmpfs
> directory), or a tmpfs that is owned by any user, including the root
> group/user. This is probably outside the original intention of this
> code.
>
> The fix for this bug is not quite so simple as the others. The options
> that I've assessed are:
>
> - Resolve the kuid/kgid via the fs_context namespace - this does
> however mean that any task outside the fsopen'ing userns that tries to
> set the uid/gid of a tmpfs will have to know that the uid/gid will be
> resolved by a different namespace than that which the current task is
> in. It also subtly changes the behavior of this specific subsystem in
> a userland visible way.
> - Globally disallow fsconfig calls originating from outside the
> fs_context userns - This is a more robust solution that would prevent
> any similar bugs, but it may impinge on valid mount use-cases. It's
> the best from a security standpoint and if it's determined that it was
> not in the original intention to be juggling user/mount namespaces
> this way, it's probably the ideal solution.
> - Throw EINVAL if the kuid specified cannot be mapped in the mounting
> userns (and/or potentially in the fs_context userns) - This is
> probably the solution that remains most faithful to all potential
> use-cases, but it doesn't reduce the potential for variants in the
> future in other parts of the codebase and it also introduces some
> slight derivative logic bug risk.
> - Don't resolve the uid/gid specified in fsconfig at all, and resolve
> it during mount-time when calling an associated fill_super. This is
> precedented and used in other parts of the codebase, but specificity
> is lost in the final error case since an end-user cannot easily
> attribute a mount failure to an unmappable uid.
>
> I've also attached a PoC for this bug that demonstrates that an
> unprivileged user can create files/directories with root uid/gid's.
> There is no deadline for this issue as we can't see any obvious way to
> cross a privilege boundary with this.
>
> Thanks in advance!

I'm involved in 2 large projects that make use of the new mount api LXC
and CRIU. None of them call fsconfig() outside of the target user
namespace. util-linux mount(2) does not yet use the new mount api and so
can't be affected either but will in maybe even the next release.
Additionally, glibc 2.36 is the first glibc with support for the new
mount api which just released. So all users before that users would have
to write their own system call wrappers so I think we have some liberty
here.

I think this is too much of a restriction to require that fsopen() and
fsconfig() userns must match in order to set options. It is pretty handy
to be able to set mount options outside of fc->user_ns. And we'd
definitely want to make use of this in the future.

So ideally, we just switch all filesystems that are mountable in userns
over to use fc->user_ns. There's not really a big regression risk here
because it's not used in userns workloads widely today. Taking a close
look, the affected filesystems are devpts and tmpfs. Having them rely on
fc->user_ns aligns them with how fuse does it today.

2023-04-05 14:12:30

by Christian Brauner

[permalink] [raw]
Subject: Re: fsconfig parsing bugs

On Thu, Sep 01, 2022 at 04:50:28PM +0200, Christian Brauner wrote:
> On Wed, Aug 31, 2022 at 04:12:21PM -0700, Seth Jenkins wrote:
> > The codebase-wide refactor efforts to using the latest fs mounting API
> > (with support for fsopen/fsconfig/fsmount etc.) have introduced some
> > bugs into mount configuration parsing in several parse_param handlers,
> > most notably shmem_parse_one() which can be accessed from a userns.
> > There are several cases where the following code pattern is used:
> >
> > ctx->value = <expression>
> > if(ctx->value is invalid)
> > goto fail;
> > ctx->seen |= SHMEM_SEEN_X;
> > break;
> >
> > However, this coding pattern does not work in the case where multiple
> > fsconfig calls are made. For example, if I were to call fsconfig with
> > the key "nr_blocks" twice, the first time with a valid value, and the
> > second time with an invalid value, the invalid value will be persisted
> > and used upon creation of the mount for the value of ctx->blocks, and
> > consequently for sbinfo->max_blocks.
> >
> > This code pattern is used for Opt_nr_blocks, Opt_nr_inodes, Opt_uid,
> > Opt_gid and Opt_huge. Probably the proper thing to do is to check for
> > validity before assigning the value to the shmem_options struct in the
> > fs_context.
> >
> > We also see this code pattern replicated throughout other filesystems
> > for uid/gid resolution, including hugetlbfs, FUSE, ntfs3 and ffs.
> >
> > The other outstanding issue I noticed comes from the fact that
> > fsconfig syscalls may occur in a different userns than that which
> > called fsopen. That means that resolving the uid/gid via
> > current_user_ns() can save a kuid that isn't mapped in the associated
> > namespace when the filesystem is finally mounted. This means that it
> > is possible for an unprivileged user to create files owned by any
> > group in a tmpfs mount (since we can set the SUID bit on the tmpfs
> > directory), or a tmpfs that is owned by any user, including the root
> > group/user. This is probably outside the original intention of this
> > code.
> >
> > The fix for this bug is not quite so simple as the others. The options
> > that I've assessed are:
> >
> > - Resolve the kuid/kgid via the fs_context namespace - this does
> > however mean that any task outside the fsopen'ing userns that tries to
> > set the uid/gid of a tmpfs will have to know that the uid/gid will be
> > resolved by a different namespace than that which the current task is
> > in. It also subtly changes the behavior of this specific subsystem in
> > a userland visible way.
> > - Globally disallow fsconfig calls originating from outside the
> > fs_context userns - This is a more robust solution that would prevent
> > any similar bugs, but it may impinge on valid mount use-cases. It's
> > the best from a security standpoint and if it's determined that it was
> > not in the original intention to be juggling user/mount namespaces
> > this way, it's probably the ideal solution.
> > - Throw EINVAL if the kuid specified cannot be mapped in the mounting
> > userns (and/or potentially in the fs_context userns) - This is
> > probably the solution that remains most faithful to all potential
> > use-cases, but it doesn't reduce the potential for variants in the
> > future in other parts of the codebase and it also introduces some
> > slight derivative logic bug risk.
> > - Don't resolve the uid/gid specified in fsconfig at all, and resolve
> > it during mount-time when calling an associated fill_super. This is
> > precedented and used in other parts of the codebase, but specificity
> > is lost in the final error case since an end-user cannot easily
> > attribute a mount failure to an unmappable uid.
> >
> > I've also attached a PoC for this bug that demonstrates that an
> > unprivileged user can create files/directories with root uid/gid's.
> > There is no deadline for this issue as we can't see any obvious way to
> > cross a privilege boundary with this.
> >
> > Thanks in advance!
>
> I'm involved in 2 large projects that make use of the new mount api LXC
> and CRIU. None of them call fsconfig() outside of the target user
> namespace. util-linux mount(2) does not yet use the new mount api and so
> can't be affected either but will in maybe even the next release.
> Additionally, glibc 2.36 is the first glibc with support for the new
> mount api which just released. So all users before that users would have
> to write their own system call wrappers so I think we have some liberty
> here.
>
> I think this is too much of a restriction to require that fsopen() and
> fsconfig() userns must match in order to set options. It is pretty handy
> to be able to set mount options outside of fc->user_ns. And we'd
> definitely want to make use of this in the future.
>
> So ideally, we just switch all filesystems that are mountable in userns
> over to use fc->user_ns. There's not really a big regression risk here
> because it's not used in userns workloads widely today. Taking a close
> look, the affected filesystems are devpts and tmpfs. Having them rely on
> fc->user_ns aligns them with how fuse does it today.

Seth, did you plan on sending a patch for this? I'd like to get this
sorted now that more projects are starting to make use of the new mount
api.