Hi!

bpf_d_path() can be invoked on a "struct path" that results from
following a pointer chain involving pointers that can concurrently
change; this can lead to stuff like use-after-free in d_path().

For example, the BPF verifier permits stuff like
bpf_d_path(&current->mm->exe_file->f_path, ...), which is not actually
safe in many contexts:

current->mm->exe_file can concurrently change; so by the time
bpf_d_path() is called, the file's refcount might have gone to zero,
and __fput() may have already mostly torn down the file. "struct file"
currently has some limited RCU lifetime, but that is supposed to be an
implementation detail that BPF shouldn't be relying on, and "struct
file" will soon have even less RCU lifetime than before (see
<https://lore.kernel.org/all/20230930-glitzer-errungenschaft-b86880c177c4@brauner/>).

When __fput() tears down a file, it drops the references held by
file->f_path.mnt and file->f_path.dentry. "struct vfsmount" has some
kind of RCU lifetime, but "struct dentry" will be freed directly in
dentry_free() if it has DCACHE_NORCU set, which is the case if it was
allocated via d_alloc_pseudo(), which is how memfd files are
allocated.
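
The free-path distinction described above can be modeled in a few lines
of userspace C. This is only a sketch, not kernel code: struct
model_dentry, MODEL_DCACHE_NORCU and the model_* functions are made-up
names, the flag value is arbitrary, and "freed" is tracked with a
boolean instead of really releasing memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for DCACHE_NORCU; the value is arbitrary. */
#define MODEL_DCACHE_NORCU 0x1u

struct model_dentry {
	unsigned int flags;
	bool freed;       /* memory already handed back to the allocator */
	bool rcu_pending; /* free deferred until after a grace period */
};

/* Models the choice dentry_free() makes, as described in the text. */
static void model_dentry_free(struct model_dentry *d)
{
	assert(!d->freed && !d->rcu_pending);
	if (d->flags & MODEL_DCACHE_NORCU)
		d->freed = true;       /* freed directly, no grace period */
	else
		d->rcu_pending = true; /* still readable inside an RCU section */
}

/* May a reader that is inside an RCU read-side section still touch d? */
static bool model_safe_under_rcu(const struct model_dentry *d)
{
	return !d->freed;
}

/* Free a dentry with the given flags and report whether RCU readers are safe. */
static bool model_free_then_check(unsigned int flags)
{
	struct model_dentry d = { .flags = flags };

	model_dentry_free(&d);
	return model_safe_under_rcu(&d);
}
```

In this model, only the dentry without the flag remains safe for a
reader that is still inside its RCU section, which is why holding the
RCU read lock does not help in the memfd case.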
So the following race is possible, if we start in a situation where
current->mm->exe_file points to a memfd:

thread A                thread B
========                ========
begin RCU section
begin BPF program
compute path = &current->mm->exe_file->f_path

                        prctl(PR_SET_MM, PR_SET_MM_MAP, ...)
                          updates current->mm->exe_file
                          calls fput() on old ->exe_file
                        __fput() runs
                          dput(dentry);
                          mntput(mnt)

invoke helper bpf_d_path(path, ...)
  d_path()
    reads path->dentry->d_op  *** UAF read ***
    reads path->dentry->d_op->d_dname  *** read through wild pointer ***
    path->dentry->d_op->d_dname(...)  *** wild pointer call ***

So if an attacker managed to reallocate the old "struct dentry" with
attacker-controlled data, they could probably get the kernel to call
an attacker-provided function pointer, eventually letting an attacker
gain kernel privileges.
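
The interleaving above can be replayed deterministically in a small
userspace C model. This is a sketch under loud assumptions: all
model_* names are invented, the two "threads" are just statements
executed in the racy order, and a freed dentry is only marked with a
flag so the model itself never touches invalid memory. The
take_reference variant is there purely for contrast; how a reference
could be obtained safely in the kernel is outside this sketch.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dentry { bool freed; };
struct model_path   { struct model_dentry *dentry; };
struct model_file   { int refcount; struct model_path f_path; };
struct model_mm     { struct model_file *exe_file; };

/* Drop one reference; the last put stands in for __fput() -> dput() -> free. */
static void model_fput(struct model_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->f_path.dentry->freed = true;
}

/* "Thread B": install a new exe_file and drop the mm's reference on the old one. */
static void model_set_exe_file(struct model_mm *mm, struct model_file *new_file)
{
	struct model_file *old = mm->exe_file;

	new_file->refcount++;
	mm->exe_file = new_file;
	model_fput(old);
}

/* Returns true if the helper would have dereferenced a freed dentry. */
static bool model_race(bool take_reference)
{
	struct model_dentry old_d = { false }, new_d = { false };
	struct model_file old_f = { 1, { &old_d } }; /* only reference: the mm */
	struct model_file new_f = { 0, { &new_d } };
	struct model_mm mm = { &old_f };

	/* thread A: compute path = &mm->exe_file->f_path */
	struct model_file *f = mm.exe_file;
	struct model_path *path = &f->f_path;
	if (take_reference)
		f->refcount++;

	/* thread B: runs in the window before the helper call */
	model_set_exe_file(&mm, &new_f);

	/* thread A: the helper now looks at path->dentry */
	bool uaf = path->dentry->freed;

	if (take_reference)
		model_fput(f);
	return uaf;
}
```

With a borrowed pointer (take_reference == false) the helper ends up
looking at a dentry that is already gone; with an extra reference held
across the call, the old file survives the swap.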

Obviously this is not a bug an unprivileged attacker can just hit
directly on a system where no legitimate BPF programs are already
running, because loading tracing BPF programs requires privileges; but
if a privileged process loads a tracing BPF program that does
something unsafe like "bpf_d_path(&current->mm->exe_file->f_path,
...)", an attacker might be able to leverage that.

If BPF wants to be able to promise that buggy BPF code can't crash the
kernel (or, worse, introduce privilege escalation vulnerabilities in
the kernel), then I think BPF programs must not be allowed to follow
any pointer chain and pass the object at the end of it into BPF
helpers.

On Fri, Oct 27, 2023 at 10:13 AM Jann Horn wrote:
>
> Hi!
>
> bpf_d_path() can be invoked on a "struct path" that results from
> following a pointer chain involving pointers that can concurrently
> change; this can lead to stuff like use-after-free in d_path().
>
> For example, the BPF verifier permits stuff like
> bpf_d_path(&current->mm->exe_file->f_path, ...), which is not actually
> safe in many contexts:
>
> current->mm->exe_file can concurrently change; so by the time
> bpf_d_path() is called, the file's refcount might have gone to zero,
> and __fput() may have already mostly torn down the file. "struct file"
> currently has some limited RCU lifetime, but that is supposed to be an
> implementation detail that BPF shouldn't be relying on, and "struct
> file" will soon have even less RCU lifetime than before (see
> <https://lore.kernel.org/all/20230930-glitzer-errungenschaft-b86880c177c4@brauner/>).
>
> When __fput() tears down a file, it drops the references held by
> file->f_path.mnt and file->f_path.dentry. "struct vfsmount" has some
> kind of RCU lifetime, but "struct dentry" will be freed directly in
> dentry_free() if it has DCACHE_NORCU set, which is the case if it was
> allocated via d_alloc_pseudo(), which is how memfd files are
> allocated.
>
> So the following race is possible, if we start in a situation where
> current->mm->exe_file points to a memfd:
>
> thread A                thread B
> ========                ========
> begin RCU section
> begin BPF program
> compute path = &current->mm->exe_file->f_path
>
>                         prctl(PR_SET_MM, PR_SET_MM_MAP, ...)
>                           updates current->mm->exe_file
>                           calls fput() on old ->exe_file
>                         __fput() runs
>                           dput(dentry);
>                           mntput(mnt)
>
> invoke helper bpf_d_path(path, ...)
>   d_path()
>     reads path->dentry->d_op  *** UAF read ***
>     reads path->dentry->d_op->d_dname  *** read through wild pointer ***
>     path->dentry->d_op->d_dname(...)  *** wild pointer call ***
>
> So if an attacker managed to reallocate the old "struct dentry" with
> attacker-controlled data, they could probably get the kernel to call
> an attacker-provided function pointer, eventually letting an attacker
> gain kernel privileges.
>
> Obviously this is not a bug an unprivileged attacker can just hit
> directly on a system where no legitimate BPF programs are already
> running, because loading tracing BPF programs requires privileges; but
> if a privileged process loads a tracing BPF program that does
> something unsafe like "bpf_d_path(&current->mm->exe_file->f_path,
> ...)", an attacker might be able to leverage that.

Thanks for the report. That's a verifier bug indeed.
Curious, did you actually see such a broken bpf program, or is this a
theoretical issue in case somebody writes such a thing?

>
> If BPF wants to be able to promise that buggy BPF code can't crash the
> kernel (or, worse, introduce privilege escalation vulnerabilities in
> the kernel),

Only the former. The verifier cannot possibly guarantee that the bpf-lsm
program or tracing bpf prog is not leaking addresses or acting maliciously.
Same in networking. XDP prog might be doing firewalling incorrectly,
dropping wrong packets, disabling ssh when it shouldn't, etc.
We cannot validate semantics. The verifier tries to guarantee non-crash only.
Hence loading bpf prog is a privileged operation.

But back to the verifier bug... I suspect it will be very hard to
craft a test that does prctl(PR_SET_MM) and goes all the way through
the delayed fput logic on one cpu while bpf prog under rcu_read_lock
calls bpf_d_path on the other cpu. I can see this happening in theory
and we need to close this verification gap, but we need to be realistic
in assessing the severity of it.

To fix it we need to make bpf_d_path KF_TRUSTED_ARGS. All new kfuncs
are done this way already. They don't allow unrestricted pointer walks.
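
As a rough illustration of what "no unrestricted pointer walks" means,
here is a toy provenance check in userspace C. It is not the verifier's
actual logic (the real rules are more detailed); the model_* names and
the three origin classes are assumptions made up for this sketch. The
idea is only that a pointer obtained by loading a field through another
pointer is not accepted as a helper argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical origin classes for a pointer held in a register. */
enum model_ptr_origin {
	MODEL_PTR_PROG_ARG, /* handed to the program as a hook argument */
	MODEL_PTR_ACQUIRED, /* returned by a reference-acquiring call */
	MODEL_PTR_WALKED,   /* loaded from a field of another object */
};

struct model_reg {
	enum model_ptr_origin origin;
};

/* In this simplified model, any pointer load yields a "walked" pointer. */
static struct model_reg model_load_field(struct model_reg base)
{
	(void)base; /* the base's origin is ignored in this toy model */
	return (struct model_reg){ MODEL_PTR_WALKED };
}

/* The trusted-argument rule of the model: walked pointers are refused. */
static bool model_trusted_arg_ok(struct model_reg r)
{
	return r.origin != MODEL_PTR_WALKED;
}

/* Stand-in for a helper call site: 0 if accepted, -1 if rejected. */
static int model_call_d_path(struct model_reg path)
{
	return model_trusted_arg_ok(path) ? 0 : -1;
}

/* A chain like current->mm->exe_file takes two loads, so it is refused. */
static void model_example(void)
{
	struct model_reg file_arg = { MODEL_PTR_PROG_ARG };
	struct model_reg cur = { MODEL_PTR_PROG_ARG };
	struct model_reg exe_file = model_load_field(model_load_field(cur));

	assert(model_call_d_path(file_arg) == 0);
	assert(model_call_d_path(exe_file) == -1);
}
```

Under this rule the call from the report would be rejected at load
time, while a file pointer that the hook itself passes in would still
be accepted.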