2024-04-24 10:53:16

by stsp

[permalink] [raw]
Subject: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2()

This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall.
It is needed to perform an open operation with the creds that were in
effect when the dir_fd was opened. This allows the process to pre-open
some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged
user, while still retaining the possibility to open/create files within
the pre-opened directory set.

The sand-boxing is security-oriented: symlinks leading outside of a
sand-box are rejected. /proc magic links are rejected.
The more detailed description (including security considerations)
is available in the log messages of individual patches.

Changes in v4:
- add optimizations suggested by David Laight <[email protected]>
- move security checks to build_open_flags()
- force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <[email protected]>

Changes in v3:
- partially revert v2 changes to avoid overriding capabilities.
Only the bare minimum is overridden: fsuid, fsgid and group_info.
Document the fact the full cred override is unwanted, as it may
represent an unneeded security risk.

Changes in v2:
- capture full struct cred instead of just fsuid/fsgid.
Suggested by Stefan Metzmacher <[email protected]>

CC: Stefan Metzmacher <[email protected]>
CC: Eric Biederman <[email protected]>
CC: Alexander Viro <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Christian Brauner <[email protected]>
CC: Jan Kara <[email protected]>
CC: Jeff Layton <[email protected]>
CC: Chuck Lever <[email protected]>
CC: Alexander Aring <[email protected]>
CC: David Laight <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: Paolo Bonzini <[email protected]>
CC: Christian Göttsche <[email protected]>

--
2.44.0



2024-04-24 10:53:19

by stsp

[permalink] [raw]
Subject: [PATCH 1/2] fs: reorganize path_openat()

This patch moves the call to alloc_empty_file() below the call to
path_init(). That changes is needed for the next patch, which adds
a cred override for alloc_empty_file(). The needed cred info is only
available after the call to path_init().

No functional changes are intended by that patch.

Signed-off-by: Stas Sergeev <[email protected]>

CC: Eric Biederman <[email protected]>
CC: Alexander Viro <[email protected]>
CC: Christian Brauner <[email protected]>
CC: Jan Kara <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: David Laight <[email protected]>
CC: [email protected]
CC: [email protected]
---
fs/namei.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c5b2a25be7d0..413eef134234 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3781,23 +3781,30 @@ static struct file *path_openat(struct nameidata *nd,
{
struct file *file;
int error;
+ u64 open_flags = op->open_flag;

- file = alloc_empty_file(op->open_flag, current_cred());
- if (IS_ERR(file))
- return file;
-
- if (unlikely(file->f_flags & __O_TMPFILE)) {
- error = do_tmpfile(nd, flags, op, file);
- } else if (unlikely(file->f_flags & O_PATH)) {
- error = do_o_path(nd, flags, file);
+ if (unlikely(open_flags & (__O_TMPFILE | O_PATH))) {
+ file = alloc_empty_file(open_flags, current_cred());
+ if (IS_ERR(file))
+ return file;
+ if (open_flags & __O_TMPFILE)
+ error = do_tmpfile(nd, flags, op, file);
+ else
+ error = do_o_path(nd, flags, file);
} else {
const char *s = path_init(nd, flags);
- while (!(error = link_path_walk(s, nd)) &&
- (s = open_last_lookups(nd, file, op)) != NULL)
- ;
+ file = alloc_empty_file(open_flags, current_cred());
+ error = PTR_ERR_OR_ZERO(file);
+ if (!error) {
+ while (!(error = link_path_walk(s, nd)) &&
+ (s = open_last_lookups(nd, file, op)) != NULL)
+ ;
+ }
if (!error)
error = do_open(nd, file, op);
terminate_walk(nd);
+ if (IS_ERR(file))
+ return file;
}
if (likely(!error)) {
if (likely(file->f_mode & FMODE_OPENED))
--
2.44.0


2024-04-24 10:53:43

by stsp

[permalink] [raw]
Subject: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag

This flag performs the open operation with the fs credentials
(fsuid, fsgid, group_info) that were in effect when dir_fd was opened.
This allows the process to pre-open some directories and then
change eUID (and all other UIDs/GIDs) to a less-privileged user,
retaining the ability to open/create files within these directories.

Design goal:
The idea is to provide a very light-weight sandboxing, where the
process, without the use of any heavy-weight techniques like chroot
within namespaces, can restrict the access to the set of pre-opened
directories.
This patch is just a first step to such sandboxing. If things go
well, in the future the same extension can be added to more syscalls.
These should include at least unlinkat(), renameat2() and the
not-yet-upstreamed setxattrat().

Security considerations:
- Only the bare minimal set of credentials is overridden:
fsuid, fsgid and group_info. The rest, for example capabilities,
are not overridden to avoid unneeded security risks.
- To avoid sandboxing escape, this patch makes sure the restricted
lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT.
- To avoid leaking creds across exec, this patch requires O_CLOEXEC
flag on a directory.
- Magic /proc symlinks are discarded, as suggested by
Andy Lutomirski <[email protected]>

Use cases:
Virtual machines that deal with untrusted code, can use that
instead of a more heavy-weighted approaches.
Currently the approach is being tested on a dosemu2 VM.

Signed-off-by: Stas Sergeev <[email protected]>

CC: Stefan Metzmacher <[email protected]>
CC: Eric Biederman <[email protected]>
CC: Alexander Viro <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Christian Brauner <[email protected]>
CC: Jan Kara <[email protected]>
CC: Jeff Layton <[email protected]>
CC: Chuck Lever <[email protected]>
CC: Alexander Aring <[email protected]>
CC: [email protected]
CC: [email protected]
CC: Paolo Bonzini <[email protected]>
CC: Christian Göttsche <[email protected]>
---
fs/internal.h | 2 +-
fs/namei.c | 56 ++++++++++++++++++++++++++++++++++--
fs/open.c | 10 ++++++-
include/linux/fcntl.h | 2 ++
include/uapi/linux/openat2.h | 3 ++
5 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 7ca738904e34..692b53b19aad 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -169,7 +169,7 @@ static inline void sb_end_ro_state_change(struct super_block *sb)
* open.c
*/
struct open_flags {
- int open_flag;
+ u64 open_flag;
umode_t mode;
int acc_mode;
int intent;
diff --git a/fs/namei.c b/fs/namei.c
index 413eef134234..aeb9f504538e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -586,6 +586,9 @@ struct nameidata {
int dfd;
vfsuid_t dir_vfsuid;
umode_t dir_mode;
+ kuid_t dir_open_fsuid;
+ kgid_t dir_open_fsgid;
+ struct group_info *dir_open_groups;
} __randomize_layout;

#define ND_ROOT_PRESET 1
@@ -695,6 +698,8 @@ static void terminate_walk(struct nameidata *nd)
nd->depth = 0;
nd->path.mnt = NULL;
nd->path.dentry = NULL;
+ if (nd->dir_open_groups)
+ put_group_info(nd->dir_open_groups);
}

/* path_put is needed afterwards regardless of success or failure */
@@ -2414,6 +2419,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
get_fs_pwd(current->fs, &nd->path);
nd->inode = nd->path.dentry->d_inode;
}
+ nd->dir_open_fsuid = current_cred()->fsuid;
+ nd->dir_open_fsgid = current_cred()->fsgid;
+ nd->dir_open_groups = get_current_groups();
} else {
/* Caller must check execute permissions on the starting path component */
struct fd f = fdget_raw(nd->dfd);
@@ -2437,6 +2445,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
path_get(&nd->path);
nd->inode = nd->path.dentry->d_inode;
}
+ nd->dir_open_fsuid = f.file->f_cred->fsuid;
+ nd->dir_open_fsgid = f.file->f_cred->fsgid;
+ nd->dir_open_groups = get_group_info(
+ f.file->f_cred->group_info);
fdput(f);
}

@@ -3776,6 +3788,29 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
return error;
}

+static const struct cred *openat2_override_creds(struct nameidata *nd)
+{
+ const struct cred *old_cred;
+ struct cred *override_cred;
+
+ override_cred = prepare_creds();
+ if (!override_cred)
+ return NULL;
+
+ override_cred->fsuid = nd->dir_open_fsuid;
+ override_cred->fsgid = nd->dir_open_fsgid;
+ override_cred->group_info = nd->dir_open_groups;
+
+ override_cred->non_rcu = 1;
+
+ old_cred = override_creds(override_cred);
+
+ /* override_cred() gets its own ref */
+ put_cred(override_cred);
+
+ return old_cred;
+}
+
static struct file *path_openat(struct nameidata *nd,
const struct open_flags *op, unsigned flags)
{
@@ -3793,8 +3828,23 @@ static struct file *path_openat(struct nameidata *nd,
error = do_o_path(nd, flags, file);
} else {
const char *s = path_init(nd, flags);
- file = alloc_empty_file(open_flags, current_cred());
- error = PTR_ERR_OR_ZERO(file);
+ const struct cred *old_cred = NULL;
+
+ error = 0;
+ if (open_flags & OA2_INHERIT_CRED) {
+ /* Only work with O_CLOEXEC dirs. */
+ if (!get_close_on_exec(nd->dfd))
+ error = -EPERM;
+
+ if (!error)
+ old_cred = openat2_override_creds(nd);
+ }
+ if (!error) {
+ file = alloc_empty_file(open_flags, current_cred());
+ error = PTR_ERR_OR_ZERO(file);
+ } else {
+ file = ERR_PTR(error);
+ }
if (!error) {
while (!(error = link_path_walk(s, nd)) &&
(s = open_last_lookups(nd, file, op)) != NULL)
@@ -3802,6 +3852,8 @@ static struct file *path_openat(struct nameidata *nd,
}
if (!error)
error = do_open(nd, file, op);
+ if (old_cred)
+ revert_creds(old_cred);
terminate_walk(nd);
if (IS_ERR(file))
return file;
diff --git a/fs/open.c b/fs/open.c
index ee8460c83c77..c871ff8fc6e3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
* values before calling build_open_flags(), but openat2(2) checks all
* of its arguments.
*/
- if (flags & ~VALID_OPEN_FLAGS)
+ if (flags & ~VALID_OPENAT2_FLAGS)
return -EINVAL;
if (how->resolve & ~VALID_RESOLVE_FLAGS)
return -EINVAL;
@@ -1326,6 +1326,14 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
lookup_flags |= LOOKUP_CACHED;
}

+ if (flags & OA2_INHERIT_CRED) {
+ /* Inherit creds only with scoped look-up modes. */
+ if (!(lookup_flags & LOOKUP_IS_SCOPED))
+ return -EPERM;
+ /* Reject /proc "magic" links if inheriting creds. */
+ lookup_flags |= LOOKUP_NO_MAGICLINKS;
+ }
+
op->lookup_flags = lookup_flags;
return 0;
}
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index a332e79b3207..b71f8b162102 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -12,6 +12,8 @@
FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)

+#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED)
+
/* List of all valid flags for the how->resolve argument: */
#define VALID_RESOLVE_FLAGS \
(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index a5feb7604948..cdd676a10b62 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -40,4 +40,7 @@ struct open_how {
return -EAGAIN if that's not
possible. */

+/* openat2-specific flags go to upper 4 bytes. */
+#define OA2_INHERIT_CRED (1ULL << 32)
+
#endif /* _UAPI_LINUX_OPENAT2_H */
--
2.44.0


2024-04-24 16:39:36

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2()

On Wed, Apr 24, 2024 at 01:52:46PM +0300, Stas Sergeev wrote:
> This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall.
> It is needed to perform an open operation with the creds that were in
> effect when the dir_fd was opened. This allows the process to pre-open
> some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged
> user, while still retaining the possibility to open/create files within
> the pre-opened directory set.
>
> The sand-boxing is security-oriented: symlinks leading outside of a
> sand-box are rejected. /proc magic links are rejected.
> The more detailed description (including security considerations)
> is available in the log messages of individual patches.
>
> Changes in v4:
> - add optimizations suggested by David Laight <[email protected]>
> - move security checks to build_open_flags()
> - force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <[email protected]>
>
> Changes in v3:
> - partially revert v2 changes to avoid overriding capabilities.
> Only the bare minimum is overridden: fsuid, fsgid and group_info.
> Document the fact the full cred override is unwanted, as it may
> represent an unneeded security risk.
>
> Changes in v2:
> - capture full struct cred instead of just fsuid/fsgid.
> Suggested by Stefan Metzmacher <[email protected]>

This smells ripe enough to serve as an attack vector in non-obvious
ways. And in general this has the potential to confuse the hell out
unsuspecting userspace. They can now suddenly get sent such
special-sauce files such as this that they have no way of recognizing as
there's neither an FMODE_* flag nor is the OA2_* flag recorded so it's
not available in F_GETFL.

There's not even a way to restrict that new flag because no LSM ever
sees it. So that behavior might break LSM assumptions as well.

And it is effectively usable to steal credentials. If process A opens a
directory with uid/gid 0 then sends that directory fd via AF_UNIX or
something to process B then process B can inherit the uid/gid of process
A by specifying OA2_* with no way for process A to prevent this - not
even through an LSM.

The permission checking model that we have right now is already baroque.
I see zero reason to add more complexity for the sake of "lightweight
sandboxing". We have LSMs and namespaces for stuff like this.

NAK.

2024-04-24 17:50:53

by stsp

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2()

24.04.2024 19:09, Christian Brauner пишет:
> This smells ripe enough to serve as an attack vector in non-obvious
> ways. And in general this has the potential to confuse the hell out
> unsuspecting userspace.

Unsuspecting user-space will simply
not use this flag. What do you mean?


> They can now suddenly get sent such
> special-sauce files

There are no any special files.
This flag helps you to open a file on
which you currently have no perms
to open, but had those in the past.


> such as this that they have no way of recognizing as
> there's neither an FMODE_* flag nor is the OA2_* flag recorded so it's
> not available in F_GETFL.
>
> There's not even a way to restrict that new flag because no LSM ever
> sees it. So that behavior might break LSM assumptions as well.
>
> And it is effectively usable to steal credentials. If process A opens a
> directory with uid/gid 0 then sends that directory fd via AF_UNIX or
> something to process B then process B can inherit the uid/gid of process

No, it doesn't inherit anything.
The inheritance happens only for
a duration of an open() call, helping
open() to succeed. The creds are
reverted when open() completed.

The only theoretically possible attack
would be to open some file you'd never
intended to open. Also note that a
very minimal sed of creds is overridden:
fsuid, fsgid, groupinfo.

> A by specifying OA2_* with no way for process A to prevent this - not
> even through an LSM.

If process B doesn't use that flag, it
inherits nothing, no matter what process
A did or passed via a socket.
So an unaware process that doesn't
use that flag, is completely unaffected.

> The permission checking model that we have right now is already baroque.
> I see zero reason to add more complexity for the sake of "lightweight
> sandboxing". We have LSMs and namespaces for stuff like this.
>
> NAK.

I don't think it is fair to say NAK
without actually reading the patch
or asking its author for clarifications.
Even though you didn't ask, I provided
my clarifications above, as I find that
a polite action.


2024-04-25 02:31:46

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag

On Wed, Apr 24, 2024 at 01:52:48PM +0300, Stas Sergeev wrote:
> @@ -3793,8 +3828,23 @@ static struct file *path_openat(struct nameidata *nd,
> error = do_o_path(nd, flags, file);
> } else {
> const char *s = path_init(nd, flags);
> - file = alloc_empty_file(open_flags, current_cred());
> - error = PTR_ERR_OR_ZERO(file);
> + const struct cred *old_cred = NULL;
> +
> + error = 0;
> + if (open_flags & OA2_INHERIT_CRED) {
> + /* Only work with O_CLOEXEC dirs. */
> + if (!get_close_on_exec(nd->dfd))
> + error = -EPERM;
> +
> + if (!error)
> + old_cred = openat2_override_creds(nd);
> + }
> + if (!error) {
> + file = alloc_empty_file(open_flags, current_cred());

Consider the following, currently absolutely harmless situation:
* process is owned by luser:students.
* descriptor 69 refers to root-opened root directory (O_RDONLY)
What's the expected result of
fcntl(69, F_SEFTD, O_CLOEXEC);
opening "etc/shadow" with dirfd equal to 69 and your flag given
subsequent read() from the resulting descriptor?

At which point will the kernel say "go fuck yourself, I'm not letting you
read that file", provided that attacker passes that new flag of yours?

As a bonus question, how about opening it for _write_, seeing that this
is an obvious instant roothole?

Again, currently the setup that has a root-opened directory in descriptor
table of a non-root process is safe.

Incidentally, suppose you have the same process run with stdin opened
(r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with
dirfd being 0, pathname - "" and flags - O_RDWR.

AFAICS, without an explicit opt-in by the original opener it's
a non-starter, and TBH I doubt that even with such opt-in (FMODE_CRED,
whatever) it would be a good idea - it gives too much.

NAKed-by: Al Viro <[email protected]>

2024-04-25 07:24:59

by stsp

[permalink] [raw]
Subject: Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag

25.04.2024 05:31, Al Viro пишет:
> Consider the following, currently absolutely harmless situation:
> * process is owned by luser:students.
> * descriptor 69 refers to root-opened root directory (O_RDONLY)
> What's the expected result of
> fcntl(69, F_SEFTD, O_CLOEXEC);
> opening "etc/shadow" with dirfd equal to 69 and your flag given
> subsequent read() from the resulting descriptor?
>
> At which point will the kernel say "go fuck yourself, I'm not letting you
> read that file", provided that attacker passes that new flag of yours?
>
> As a bonus question, how about opening it for _write_, seeing that this
> is an obvious instant roothole?
>
> Again, currently the setup that has a root-opened directory in descriptor
> table of a non-root process is safe.
>
> Incidentally, suppose you have the same process run with stdin opened
> (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with
> dirfd being 0, pathname - "" and flags - O_RDWR.

Ok, F_SETFD, how simple. :(

> AFAICS, without an explicit opt-in by the original opener it's
> a non-starter, and TBH I doubt that even with such opt-in (FMODE_CRED,
> whatever) it would be a good idea - it gives too much.
Yes, which is why I am quite sceptical
to this FMODE_CRED idea.

Please note that my O_CLOEXEC check
actually meant to check that exactly this
process have opened the dir. It just didn't
happen that way, as you pointed.
Can I replace the O_CLOEXEC check with
some explicit check that makes sure the
fd was opened by exactly that process?

2024-04-25 08:17:34

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/2] fs: reorganize path_openat()



Hello,

kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h" on:

commit: 831d3c6cc6f05873e33f4aaebafbb9c27618ea0b ("[PATCH 1/2] fs: reorganize path_openat()")
url: https://github.com/intel-lab-lkp/linux/commits/Stas-Sergeev/fs-reorganize-path_openat/20240424-185527
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 9d1ddab261f3e2af7c384dc02238784ce0cf9f98
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH 1/2] fs: reorganize path_openat()

in testcase: boot

compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-------------------------------------------------------------------------------+------------+------------+
| | 9d1ddab261 | 831d3c6cc6 |
+-------------------------------------------------------------------------------+------------+------------+
| boot_successes | 6 | 0 |
| boot_failures | 0 | 6 |
| BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h | 0 | 6 |
+-------------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 0.591465][ T33] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
[ 0.592508][ T33] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 33, name: kworker/u8:2
[ 0.593515][ T33] preempt_count: 0, expected: 0
[ 0.594071][ T33] RCU nest depth: 1, expected: 0
[ 0.594633][ T33] CPU: 0 PID: 33 Comm: kworker/u8:2 Not tainted 6.9.0-rc5-00037-g831d3c6cc6f0 #1
[ 0.595637][ T33] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.596216][ T33] Workqueue: async async_run_entry_fn
[ 0.596216][ T33] Call Trace:
[ 0.596216][ T33] <TASK>
[ 0.596216][ T33] dump_stack_lvl (lib/dump_stack.c:116)
[ 0.596216][ T33] __might_resched (kernel/sched/core.c:10198)
[ 0.596216][ T33] kmem_cache_alloc (include/linux/kernel.h:73 include/linux/sched/mm.h:315 mm/slub.c:3746 mm/slub.c:3827 mm/slub.c:3852)
[ 0.596216][ T33] alloc_empty_file (fs/file_table.c:203)
[ 0.596216][ T33] path_openat (fs/namei.c:3796)
[ 0.596216][ T33] do_filp_open (fs/namei.c:3833)
[ 0.596216][ T33] file_open_name (fs/open.c:1352)
[ 0.596216][ T33] filp_open (fs/open.c:1371)
[ 0.596216][ T33] do_name (init/initramfs.c:373)
[ 0.596216][ T33] flush_buffer (init/initramfs.c:452 init/initramfs.c:464)
[ 0.596216][ T33] ? __pfx_flush_buffer (init/initramfs.c:458)
[ 0.596216][ T33] __gunzip (lib/decompress_inflate.c:161)
[ 0.596216][ T33] ? __pfx_nofill (lib/decompress_inflate.c:37)
[ 0.596216][ T33] unpack_to_rootfs (init/initramfs.c:520)
[ 0.596216][ T33] ? __pfx_error (init/initramfs.c:59)
[ 0.596216][ T33] do_populate_rootfs (init/initramfs.c:714)
[ 0.596216][ T33] async_run_entry_fn (kernel/async.c:136)
[ 0.596216][ T33] process_scheduled_works (kernel/workqueue.c:3259)
[ 0.596216][ T33] worker_thread (include/linux/list.h:373 kernel/workqueue.c:955 kernel/workqueue.c:3417)
[ 0.596216][ T33] ? __pfx_worker_thread (kernel/workqueue.c:3362)
[ 0.596216][ T33] kthread (kernel/kthread.c:390)
[ 0.596216][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 0.596216][ T33] ret_from_fork (arch/x86/kernel/process.c:153)
[ 0.596216][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 0.596216][ T33] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 0.596216][ T33] </TASK>
[ 1.603321][ T33] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
[ 1.604448][ T33] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 33, name: kworker/u8:2
[ 1.605466][ T33] preempt_count: 0, expected: 0
[ 1.606028][ T33] RCU nest depth: 1, expected: 0
[ 1.606599][ T33] CPU: 0 PID: 33 Comm: kworker/u8:2 Tainted: G W 6.9.0-rc5-00037-g831d3c6cc6f0 #1
[ 1.607761][ T33] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 1.608136][ T33] Workqueue: async async_run_entry_fn
[ 1.608136][ T33] Call Trace:
[ 1.608136][ T33] <TASK>
[ 1.608136][ T33] dump_stack_lvl (lib/dump_stack.c:116)
[ 1.608136][ T33] __might_resched (kernel/sched/core.c:10198)
[ 1.608136][ T33] kmem_cache_alloc (include/linux/kernel.h:73 include/linux/sched/mm.h:315 mm/slub.c:3746 mm/slub.c:3827 mm/slub.c:3852)
[ 1.608136][ T33] alloc_empty_file (fs/file_table.c:203)
[ 1.608136][ T33] path_openat (fs/namei.c:3796)
[ 1.608136][ T33] do_filp_open (fs/namei.c:3833)
[ 1.608136][ T33] file_open_name (fs/open.c:1352)
[ 1.608136][ T33] filp_open (fs/open.c:1371)
[ 1.608136][ T33] do_name (init/initramfs.c:373)
[ 1.608136][ T33] flush_buffer (init/initramfs.c:452 init/initramfs.c:464)
[ 1.608136][ T33] ? __pfx_flush_buffer (init/initramfs.c:458)
[ 1.608136][ T33] __gunzip (lib/decompress_inflate.c:161)
[ 1.608136][ T33] ? __pfx_nofill (lib/decompress_inflate.c:37)
[ 1.608136][ T33] unpack_to_rootfs (init/initramfs.c:520)
[ 1.608136][ T33] ? __pfx_error (init/initramfs.c:59)
[ 1.608136][ T33] do_populate_rootfs (init/initramfs.c:714)
[ 1.608136][ T33] async_run_entry_fn (kernel/async.c:136)
[ 1.608136][ T33] process_scheduled_works (kernel/workqueue.c:3259)
[ 1.608136][ T33] worker_thread (include/linux/list.h:373 kernel/workqueue.c:955 kernel/workqueue.c:3417)
[ 1.608136][ T33] ? __pfx_worker_thread (kernel/workqueue.c:3362)
[ 1.608136][ T33] kthread (kernel/kthread.c:390)
[ 1.608136][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 1.608136][ T33] ret_from_fork (arch/x86/kernel/process.c:153)
[ 1.608136][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 1.608136][ T33] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 1.608136][ T33] </TASK>
[ 2.602317][ T33] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
[ 2.603414][ T33] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 33, name: kworker/u8:2
[ 2.604433][ T33] preempt_count: 0, expected: 0
[ 2.604985][ T33] RCU nest depth: 1, expected: 0
[ 2.605547][ T33] CPU: 0 PID: 33 Comm: kworker/u8:2 Tainted: G W 6.9.0-rc5-00037-g831d3c6cc6f0 #1
[ 2.606689][ T33] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2.607825][ T33] Workqueue: async async_run_entry_fn
[ 2.608140][ T33] Call Trace:
[ 2.608140][ T33] <TASK>
[ 2.608140][ T33] dump_stack_lvl (lib/dump_stack.c:116)
[ 2.608140][ T33] __might_resched (kernel/sched/core.c:10198)
[ 2.608140][ T33] kmem_cache_alloc (include/linux/kernel.h:73 include/linux/sched/mm.h:315 mm/slub.c:3746 mm/slub.c:3827 mm/slub.c:3852)
[ 2.608140][ T33] alloc_empty_file (fs/file_table.c:203)
[ 2.608140][ T33] path_openat (fs/namei.c:3796)
[ 2.608140][ T33] do_filp_open (fs/namei.c:3833)
[ 2.608140][ T33] file_open_name (fs/open.c:1352)
[ 2.608140][ T33] filp_open (fs/open.c:1371)
[ 2.608140][ T33] do_name (init/initramfs.c:373)
[ 2.608140][ T33] flush_buffer (init/initramfs.c:452 init/initramfs.c:464)
[ 2.608140][ T33] ? __pfx_flush_buffer (init/initramfs.c:458)
[ 2.608140][ T33] __gunzip (lib/decompress_inflate.c:161)
[ 2.608140][ T33] ? __pfx_nofill (lib/decompress_inflate.c:37)
[ 2.608140][ T33] unpack_to_rootfs (init/initramfs.c:520)
[ 2.608140][ T33] ? __pfx_error (init/initramfs.c:59)
[ 2.608140][ T33] do_populate_rootfs (init/initramfs.c:714)
[ 2.608140][ T33] async_run_entry_fn (kernel/async.c:136)
[ 2.608140][ T33] process_scheduled_works (kernel/workqueue.c:3259)
[ 2.608140][ T33] worker_thread (include/linux/list.h:373 kernel/workqueue.c:955 kernel/workqueue.c:3417)
[ 2.608140][ T33] ? __pfx_worker_thread (kernel/workqueue.c:3362)
[ 2.608140][ T33] kthread (kernel/kthread.c:390)
[ 2.608140][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 2.608140][ T33] ret_from_fork (arch/x86/kernel/process.c:153)
[ 2.608140][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 2.608140][ T33] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 2.608140][ T33] </TASK>
[ 3.648001][ T33] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
[ 3.649103][ T33] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 33, name: kworker/u8:2
[ 3.650109][ T33] preempt_count: 0, expected: 0
[ 3.650660][ T33] RCU nest depth: 1, expected: 0
[ 3.651223][ T33] CPU: 0 PID: 33 Comm: kworker/u8:2 Tainted: G W 6.9.0-rc5-00037-g831d3c6cc6f0 #1
[ 3.651979][ T33] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 3.651979][ T33] Workqueue: async async_run_entry_fn
[ 3.651979][ T33] Call Trace:
[ 3.651979][ T33] <TASK>
[ 3.651979][ T33] dump_stack_lvl (lib/dump_stack.c:116)
[ 3.651979][ T33] __might_resched (kernel/sched/core.c:10198)
[ 3.651979][ T33] kmem_cache_alloc (include/linux/kernel.h:73 include/linux/sched/mm.h:315 mm/slub.c:3746 mm/slub.c:3827 mm/slub.c:3852)
[ 3.651979][ T33] alloc_empty_file (fs/file_table.c:203)
[ 3.651979][ T33] path_openat (fs/namei.c:3796)
[ 3.651979][ T33] do_filp_open (fs/namei.c:3833)
[ 3.651979][ T33] file_open_name (fs/open.c:1352)
[ 3.651979][ T33] filp_open (fs/open.c:1371)
[ 3.651979][ T33] do_name (init/initramfs.c:373)
[ 3.651979][ T33] flush_buffer (init/initramfs.c:452 init/initramfs.c:464)
[ 3.651979][ T33] ? __pfx_flush_buffer (init/initramfs.c:458)
[ 3.651979][ T33] __gunzip (lib/decompress_inflate.c:161)
[ 3.651979][ T33] ? __pfx_nofill (lib/decompress_inflate.c:37)
[ 3.651979][ T33] unpack_to_rootfs (init/initramfs.c:520)
[ 3.651979][ T33] ? __pfx_error (init/initramfs.c:59)
[ 3.651979][ T33] do_populate_rootfs (init/initramfs.c:714)
[ 3.651979][ T33] async_run_entry_fn (kernel/async.c:136)
[ 3.651979][ T33] process_scheduled_works (kernel/workqueue.c:3259)
[ 3.651979][ T33] worker_thread (include/linux/list.h:373 kernel/workqueue.c:955 kernel/workqueue.c:3417)
[ 3.651979][ T33] ? __pfx_worker_thread (kernel/workqueue.c:3362)
[ 3.651979][ T33] kthread (kernel/kthread.c:390)
[ 3.651979][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 3.651979][ T33] ret_from_fork (arch/x86/kernel/process.c:153)
[ 3.651979][ T33] ? __pfx_kthread (kernel/kthread.c:341)
[ 3.651979][ T33] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 3.651979][ T33] </TASK>
[ 3.705833][ T33] Freeing initrd memory: 185612K



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240425/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2024-04-25 09:27:48

by stsp

[permalink] [raw]
Subject: Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag

25.04.2024 05:31, Al Viro пишет:
> Incidentally, suppose you have the same process run with stdin opened
> (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with
> dirfd being 0, pathname - "" and flags - O_RDWR.
I actually checked this with the test-case.
It seems to return ENOENT:


Breakpoint 1, openat2 (dirfd=0, pathname=0x7fffffffdbee "",
    how=0x7fffffffd5e0, size=24) at tst.c:13
13        return syscall(SYS_openat2, dirfd, pathname, how, size);
(gdb) fin
Run till exit from #0  openat2 (dirfd=0, pathname=0x7fffffffdbee "",
    how=0x7fffffffd5e0, size=24) at tst.c:13
0x000000000040167b in main (argc=3, argv=0x7fffffffd7b8) at tst.c:140
140        fd = openat2(0, efile, &how1, sizeof(how1));
Value returned is $1 = -1
(gdb) list
135        err = fcntl(0, F_SETFD, O_CLOEXEC);
136        if (err) {
137            perror("fcntl(F_SETFD)");
138            return EXIT_FAILURE;
139        }
140        fd = openat2(0, efile, &how1, sizeof(how1));
141        if (fd == -1) {
142            perror("openat2(1)");
143    //        return EXIT_FAILURE;
144        } else {
(gdb) p errno
$2 = 2


So it seems the creds can't be stolen
from a non-dir fd, but I wonder why
ENOENT is returned instead of ENOTDIR.
Such ENOENT is not dicumented in a
man page of openat2(), so I guess there
is some problem here even w/o my patch. :)


2024-04-25 09:55:06

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2()

On Wed, Apr 24, 2024 at 08:50:30PM +0300, stsp wrote:
> 24.04.2024 19:09, Christian Brauner пишет:
> > This smells ripe enough to serve as an attack vector in non-obvious
> > ways. And in general this has the potential to confuse the hell out
> > unsuspecting userspace.
>
> Unsuspecting user-space will simply
> not use this flag. What do you mean?
>
>
> > They can now suddenly get sent such
> > special-sauce files
>
> There are no any special files.
> This flag helps you to open a file on
> which you currently have no perms
> to open, but had those in the past.
>
>
> > such as this that they have no way of recognizing as
> > there's neither an FMODE_* flag nor is the OA2_* flag recorded so it's
> > not available in F_GETFL.
> >
> > There's not even a way to restrict that new flag because no LSM ever
> > sees it. So that behavior might break LSM assumptions as well.
> >
> > And it is effectively usable to steal credentials. If process A opens a
> > directory with uid/gid 0 then sends that directory fd via AF_UNIX or
> > something to process B then process B can inherit the uid/gid of process
>
> No, it doesn't inherit anything.
> The inheritance happens only for
> a duration of an open() call, helping
> open() to succeed. The creds are
> reverted when open() completed.
>
> The only theoretically possible attack
> would be to open some file you'd never
> intended to open. Also note that a
> very minimal sed of creds is overridden:
> fsuid, fsgid, groupinfo.
>
> > A by specifying OA2_* with no way for process A to prevent this - not
> > even through an LSM.
>
> If process B doesn't use that flag, it
> inherits nothing, no matter what process
> A did or passed via a socket.
> So an unaware process that doesn't
> use that flag, is completely unaffected.

The point is that the original opener has no way to prevent his creds
being abused by a completely unrelated process later on. Something I've
clearly explained in my mail.

>
> > The permission checking model that we have right now is already baroque.
> > I see zero reason to add more complexity for the sake of "lightweight
> > sandboxing". We have LSMs and namespaces for stuff like this.
> >
> > NAK.
>
> I don't think it is fair to say NAK
> without actually reading the patch
> or asking its author for clarifications.
> Even though you didn't ask, I provided
> my clarifications above, as I find that
> a polite action.

I'm not sure what you don't understand or why you need further
clarification. Your patch allows any opener using your new flag to steal
the uid/gid/whatever from the original opener. It was even worse in the
first version where the whole struct cred of the original opener was
used. It's obviously a glaring security hole that's opened up by this.

Let alone that the justification "It's useful for some lightweight
sandboxing" is absolutely not sufficient to justify substantial shifts
in the permission model.

The NAK stands.

2024-04-25 10:12:43

by stsp

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2()

25.04.2024 12:54, Christian Brauner пишет:
> I'm not sure what you don't understand or why you need further
> clarification. Your patch allows any opener using your new flag to steal
> the uid/gid/whatever from the original opener.

No, absolutely impossible (see below).


> It was even worse in the
> first version where the whole struct cred of the original opener was
> used. It's obviously a glaring security hole that's opened up by this.

Well, it was the second version actually
(first one only had fsuid/fsgid), but no,
its the same thing either way.
The creds are overridden for a diration of
an openat2() syscall. It doesn't matter
what uid/gid are there, because they are
not used during openat2(), and are reverted
back at the end. The only reason I decided
to get back to fsuid/fsgid, were the capabilities,
which I don't want to be raised during openat2().

> Let alone that the justification "It's useful for some lightweight
> sandboxing" is absolutely not sufficient to justify substantial shifts
> in the permission model.
>
> The NAK stands.

But I am sure you still don't understand
what exactly the patch does, so why not
to ask instead of asserting?
You say uid/gid can be stolen, but no,
it can't: the creds are reverted. Only
fsuid/fsgid (and caps in v2 of the patch)
actually affect openat2(), but nothing is
"leaked" after openat2() finished.

That said, Viro already pointed to the actual
problem, and the patch-testing bot did the
same. So I have a valid complains already,
and you don't have to add the invalid ones
to them. :)


2024-04-25 12:09:07

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2()

> But I am sure you still don't understand
> what exactly the patch does, so why not
> to ask instead of asserting?
> You say uid/gid can be stolen, but no,
> it can't: the creds are reverted. Only
> fsuid/fsgid (and caps in v2 of the patch)
> actually affect openat2(), but nothing is
> "leaked" after openat2() finished.

I say "stolen" because the original opener has no say in this. You're
taking their fsuid/fsgid and groups and overriding creds for the
duration of the lookup and open. Something the original opener never
consented to. But let's call it "borrowed" if you're hung up on
terminology here.

But ultimately it's the same complaint: the original opener has no way
of controlling this behavior. Once ignored in my first reply, and now
again conveniently cut off again. Let alone all the other objections.

And fwiw, the same way you somehow feel like I haven't read your patch
it seems you to consistently underestimate the implications of this
change. Which is really strange because it's pretty obvious. It's
effectively temporary setuid/setgid with some light limitations.

Leaking directory descriptors across security boundaries becomes a lot
more interesting with this patch applied. Something which has happened
multiple times already and heavily figures in container escapes. And the
RESOLVE_BENEATH/IN_ROOT restrictions only stave off symlink (and magic
link) attacks. If a directory fd is leaked a container can take the
fsuid/fsgid/groups from the original opener of that directory and write
to disk with whatever it resolves to in that namespace's idmapping. It's
basically a nice way to puzzle together arbitrary fsuid/fsgid and
idmapping pairs in whatever namespace the opener happens to be in.

And again it also messes with LSMs because you're changing credentials
behind their back.

And to the "unsuspecting userspace" point you dismissed earlier.
Providing a dirfd to a lesser privileged process isn't horrendously
dangerous right now. But with this change it sure is. For stuff like
libpathrs or systemd's fdstore this change has immediate security
implications.

2024-04-25 12:40:20

by stsp

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2()

25.04.2024 15:08, Christian Brauner пишет:
>> But I am sure you still don't understand
>> what exactly the patch does, so why not
>> to ask instead of asserting?
>> You say uid/gid can be stolen, but no,
>> it can't: the creds are reverted. Only
>> fsuid/fsgid (and caps in v2 of the patch)
>> actually affect openat2(), but nothing is
>> "leaked" after openat2() finished.
> I say "stolen" because the original opener has no say in this.

The initial idea was to keep this all
within a single-process boundary. It
wasn't coded that way though. :(


> You're
> taking their fsuid/fsgid and groups and overriding creds for the
> duration of the lookup and open. Something the original opener never
> consented to. But let's call it "borrowed" if you're hung up on
> terminology here.

Not a terminology: you were explicitly
talking about uid/gid, blaming a v2 of
my patch. But v2 was not any more
harmful than others and uid/gid cannot
be leaked even there.
But I don't mind: if now you realize v2
is not a leak for uid/gid, then we are on
the same track.

> But ultimately it's the same complaint: the original opener has no way
> of controlling this behavior.

It wasn't clear if the opener should
control that behaviour, or maybe instead
that all should be limited within a single
process?
Andy Lutomirski's explanation made it
clear that even if the openers are the
same, the control is still needed. So now
this argument is undeniable.


> Once ignored in my first reply, and now
> again conveniently cut off again. Let alone all the other objections.

Sorry but your complains were about
stealing uid/gid in v2, which were clearly
wrong. But this is a matter of the past.

> And fwiw, the same way you somehow feel like I haven't read your patch
> it seems you to consistently underestimate the implications of this
> change. Which is really strange because it's pretty obvious. It's
> effectively temporary setuid/setgid with some light limitations.

Temporary cred override is quite common
within the current code AFAICS when I grep
it for override_creds() call.

> Leaking directory descriptors across security boundaries becomes a lot
> more interesting with this patch applied. Something which has happened
> multiple times already and heavily figures in container escapes. And the
> RESOLVE_BENEATH/IN_ROOT restrictions only stave off symlink (and magic
> link) attacks. If a directory fd is leaked a container can take the
> fsuid/fsgid/groups from the original opener of that directory and write
> to disk with whatever it resolves to in that namespace's idmapping. It's
> basically a nice way to puzzle together arbitrary fsuid/fsgid and
> idmapping pairs in whatever namespace the opener happens to be in.

Yes, so the opt-in flag is undeniably needed.

> And to the "unsuspecting userspace" point you dismissed earlier.
> Providing a dirfd to a lesser privileged process isn't horrendously
> dangerous right now. But with this change it sure is. For stuff like
> libpathrs or systemd's fdstore this change has immediate security
> implications.

So am I getting your point correctly:
- process A uses the opt-in flag (eg O_CAPTURE_CREDS)
  and passes the fd to "unsuspecting userspace" B.
- process B is not going to use O_INHERIT_CREDS,
  but it now still can't drop his privs fully.

Is this what you mean?


2024-04-25 13:50:44

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag



Hello,

kernel test robot noticed "BUG:KASAN:wild-memory-access_in_terminate_walk" on:

commit: 97bb54b42b1d6150e9ae11a7bf7833ed9f8c471d ("[PATCH 2/2] openat2: add OA2_INHERIT_CRED flag")
url: https://github.com/intel-lab-lkp/linux/commits/Stas-Sergeev/fs-reorganize-path_openat/20240424-185527
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 9d1ddab261f3e2af7c384dc02238784ce0cf9f98
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag

in testcase: boot

compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+---------------------------------------------------------------------------------------+------------+------------+
| | 831d3c6cc6 | 97bb54b42b |
+---------------------------------------------------------------------------------------+------------+------------+
| BUG:KASAN:wild-memory-access_in_terminate_walk | 0 | 12 |
| canonical_address#:#[##] | 0 | 12 |
| RIP:terminate_walk | 0 | 12 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 12 |
+---------------------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 2.555857][ T16] BUG: KASAN: wild-memory-access in terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702)
[ 2.556181][ T16] Write of size 4 at addr aaaaaaaaaaaaaaaa by task kdevtmpfs/16
[ 2.556181][ T16]
[ 2.556181][ T16] CPU: 0 PID: 16 Comm: kdevtmpfs Tainted: G T 6.9.0-rc5-00038-g97bb54b42b1d #1 c90cc2d91176f38ca16e85ead0a72934082854cd
[ 2.556181][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2.556181][ T16] Call Trace:
[ 2.556181][ T16] <TASK>
[ 2.556181][ T16] dump_stack_lvl (lib/dump_stack.c:116)
[ 2.556181][ T16] print_report (mm/kasan/report.c:?)
[ 2.556181][ T16] ? kasan_report (mm/kasan/report.c:214 mm/kasan/report.c:590)
[ 2.556181][ T16] ? terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702)
[ 2.556181][ T16] kasan_report (mm/kasan/report.c:603)
[ 2.556181][ T16] ? terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702)
[ 2.556181][ T16] kasan_check_range (mm/kasan/generic.c:?)
[ 2.556181][ T16] terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702)
[ 2.556181][ T16] path_lookupat (fs/namei.c:2515)
[ 2.556181][ T16] filename_lookup (fs/namei.c:2526)
[ 2.556181][ T16] kern_path (fs/namei.c:2634)
[ 2.556181][ T16] init_mount (fs/init.c:22)
[ 2.556181][ T16] devtmpfs_setup (drivers/base/devtmpfs.c:419)
[ 2.556181][ T16] devtmpfsd (drivers/base/devtmpfs.c:436)
[ 2.556181][ T16] kthread (kernel/kthread.c:390)
[ 2.556181][ T16] ? vclkdev_alloc (drivers/base/devtmpfs.c:435)
[ 2.556181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341)
[ 2.556181][ T16] ret_from_fork (arch/x86/kernel/process.c:153)
[ 2.556181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341)
[ 2.556181][ T16] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 2.556181][ T16] </TASK>
[ 2.556181][ T16] ==================================================================
[ 2.556184][ T16] Disabling lock debugging due to kernel taint
[ 2.556901][ T16] general protection fault, probably for non-canonical address 0xaaaaaaaaaaaaaaaa: 0000 [#1] KASAN PTI
[ 2.558131][ T16] CPU: 0 PID: 16 Comm: kdevtmpfs Tainted: G B T 6.9.0-rc5-00038-g97bb54b42b1d #1 c90cc2d91176f38ca16e85ead0a72934082854cd
[ 2.559653][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2.560181][ T16] RIP: 0010:terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702)
[ 2.560181][ T16] Code: 03 43 80 3c 2e 00 74 08 4c 89 ff e8 01 61 f4 ff 49 8b 1f 48 85 db 74 41 48 89 df be 04 00 00 00 e8 dc 61 f4 ff b8 ff ff ff ff <0f> c1 03 83 f8 01 75 25 43 80 3c 2e 00 74 08 4c 89 ff e8 d0 60 f4
All code
========
0: 03 43 80 add -0x80(%rbx),%eax
3: 3c 2e cmp $0x2e,%al
5: 00 74 08 4c add %dh,0x4c(%rax,%rcx,1)
9: 89 ff mov %edi,%edi
b: e8 01 61 f4 ff call 0xfffffffffff46111
10: 49 8b 1f mov (%r15),%rbx
13: 48 85 db test %rbx,%rbx
16: 74 41 je 0x59
18: 48 89 df mov %rbx,%rdi
1b: be 04 00 00 00 mov $0x4,%esi
20: e8 dc 61 f4 ff call 0xfffffffffff46201
25: b8 ff ff ff ff mov $0xffffffff,%eax
2a:* 0f c1 03 xadd %eax,(%rbx) <-- trapping instruction
2d: 83 f8 01 cmp $0x1,%eax
30: 75 25 jne 0x57
32: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1)
37: 74 08 je 0x41
39: 4c 89 ff mov %r15,%rdi
3c: e8 .byte 0xe8
3d: d0 60 f4 shlb -0xc(%rax)

Code starting with the faulting instruction
===========================================
0: 0f c1 03 xadd %eax,(%rbx)
3: 83 f8 01 cmp $0x1,%eax
6: 75 25 jne 0x2d
8: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1)
d: 74 08 je 0x17
f: 4c 89 ff mov %r15,%rdi
12: e8 .byte 0xe8
13: d0 60 f4 shlb -0xc(%rax)
[ 2.560181][ T16] RSP: 0000:ffffc9000010fc40 EFLAGS: 00010246
[ 2.560181][ T16] RAX: 00000000ffffffff RBX: aaaaaaaaaaaaaaaa RCX: ffffffff811e4a0f
[ 2.560181][ T16] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff8792adc0
[ 2.560181][ T16] RBP: 0000000000000011 R08: ffffffff8792adc7 R09: 1ffffffff0f255b8
[ 2.560181][ T16] R10: dffffc0000000000 R11: fffffbfff0f255b9 R12: 1ffff92000021fc4
[ 2.560181][ T16] R13: dffffc0000000000 R14: 1ffff92000021fc1 R15: ffffc9000010fe08
[ 2.560181][ T16] FS: 0000000000000000(0000) GS:ffffffff878dc000(0000) knlGS:0000000000000000
[ 2.560181][ T16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.560181][ T16] CR2: ffff88843ffff000 CR3: 000000000789c000 CR4: 00000000000406f0
[ 2.560181][ T16] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.560181][ T16] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.560181][ T16] Call Trace:
[ 2.560181][ T16] <TASK>
[ 2.560181][ T16] ? __die_body (arch/x86/kernel/dumpstack.c:421)
[ 2.560181][ T16] ? die_addr (arch/x86/kernel/dumpstack.c:?)
[ 2.560181][ T16] ? exc_general_protection (arch/x86/kernel/traps.c:?)
[ 2.560181][ T16] ? end_report (arch/x86/include/asm/current.h:49 mm/kasan/report.c:240)
[ 2.560181][ T16] ? asm_exc_general_protection (arch/x86/include/asm/idtentry.h:617)
[ 2.560181][ T16] ? add_taint (arch/x86/include/asm/bitops.h:60 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/panic.c:555)
[ 2.560181][ T16] ? terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702)
[ 2.560181][ T16] path_lookupat (fs/namei.c:2515)
[ 2.560181][ T16] filename_lookup (fs/namei.c:2526)
[ 2.560181][ T16] kern_path (fs/namei.c:2634)
[ 2.560181][ T16] init_mount (fs/init.c:22)
[ 2.560181][ T16] devtmpfs_setup (drivers/base/devtmpfs.c:419)
[ 2.560181][ T16] devtmpfsd (drivers/base/devtmpfs.c:436)
[ 2.560181][ T16] kthread (kernel/kthread.c:390)
[ 2.560181][ T16] ? vclkdev_alloc (drivers/base/devtmpfs.c:435)
[ 2.560181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341)
[ 2.560181][ T16] ret_from_fork (arch/x86/kernel/process.c:153)
[ 2.560181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341)
[ 2.560181][ T16] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 2.560181][ T16] </TASK>
[ 2.560181][ T16] Modules linked in:
[ 2.560183][ T16] ---[ end trace 0000000000000000 ]---
[ 2.560820][ T16] RIP: 0010:terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702)
[ 2.561462][ T16] Code: 03 43 80 3c 2e 00 74 08 4c 89 ff e8 01 61 f4 ff 49 8b 1f 48 85 db 74 41 48 89 df be 04 00 00 00 e8 dc 61 f4 ff b8 ff ff ff ff <0f> c1 03 83 f8 01 75 25 43 80 3c 2e 00 74 08 4c 89 ff e8 d0 60 f4
All code
========
0: 03 43 80 add -0x80(%rbx),%eax
3: 3c 2e cmp $0x2e,%al
5: 00 74 08 4c add %dh,0x4c(%rax,%rcx,1)
9: 89 ff mov %edi,%edi
b: e8 01 61 f4 ff call 0xfffffffffff46111
10: 49 8b 1f mov (%r15),%rbx
13: 48 85 db test %rbx,%rbx
16: 74 41 je 0x59
18: 48 89 df mov %rbx,%rdi
1b: be 04 00 00 00 mov $0x4,%esi
20: e8 dc 61 f4 ff call 0xfffffffffff46201
25: b8 ff ff ff ff mov $0xffffffff,%eax
2a:* 0f c1 03 xadd %eax,(%rbx) <-- trapping instruction
2d: 83 f8 01 cmp $0x1,%eax
30: 75 25 jne 0x57
32: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1)
37: 74 08 je 0x41
39: 4c 89 ff mov %r15,%rdi
3c: e8 .byte 0xe8
3d: d0 60 f4 shlb -0xc(%rax)

Code starting with the faulting instruction
===========================================
0: 0f c1 03 xadd %eax,(%rbx)
3: 83 f8 01 cmp $0x1,%eax
6: 75 25 jne 0x2d
8: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1)
d: 74 08 je 0x17
f: 4c 89 ff mov %r15,%rdi
12: e8 .byte 0xe8
13: d0 60 f4 shlb -0xc(%rax)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240425/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2024-04-25 14:03:13

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag

> struct open_flags {
> - int open_flag;
> + u64 open_flag;

Btw, this change taken together with

> +#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED)

is also ripe to causes subtle bugs and security issues. This new
VALID_OPENAT2_FLAGS define bypasses

BUILD_BUG_ON_MSG(upper_32_bits(VALID_OPEN_FLAGS),
"struct open_flags doesn't yet handle flags > 32 bits");

in build_open_flags(). And right now lookup_open(), open_last_lookups(),
and do_open() just do:

int open_flag = op->open_flag;

Because op->open_flag was 32bit that was fine. But now ->open_flag is
64bit which means we truncate the upper 32bit including OA2_INHERIT_CRED
or any other new flag in the upper 32bits in those functions.

So as soon as there's an additional check in e.g., do_open() for
OA2_INHERIT_CRED or in any of the other helpers that's security relevant
we're fscked because that flag is never seen and no bot will help us
here. And it's super easy to miss during review...

2024-04-26 13:43:25

by stsp

[permalink] [raw]
Subject: Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag

25.04.2024 17:02, Christian Brauner пишет:
>> struct open_flags {
>> - int open_flag;
>> + u64 open_flag;
> Btw, this change taken together with
All fixed in v5.
I dropped u64 use.
Other comments are addressed as well.
Please let me know if I missed some.

Thank you.