release_task, where the seccomp's filter refcounter is released, is not
called for the case when the fork/clone is terminated midway by a
signal. This leaves an extra reference that prevents filter from being
destroyed even after all processes using it exit leading to a BPF JIT
memory leak. Dereference the refcounter in the failure path of the
copy_process function.
Fixes: 3a15fb6ed92c ("seccomp: release filter after task is fully dead")
Cc: Christian Brauner <[email protected]>
Cc: [email protected]
Signed-off-by: Oleksandr Tymoshenko <[email protected]>
---
kernel/fork.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index 90c85b17bf69..20f7a3e91354 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1763,6 +1763,21 @@ static void copy_seccomp(struct task_struct *p)
#endif
}
+static void release_seccomp(struct task_struct *p)
+{
+#ifdef CONFIG_SECCOMP
+ /*
+ * Must be called with sighand->lock held, which is common to
+ * all threads in the group. Holding cred_guard_mutex is not
+ * needed because this new task is not yet running and cannot
+ * be racing exec.
+ */
+ assert_spin_locked(¤t->sighand->siglock);
+
+ seccomp_filter_release(p);
+#endif
+}
+
SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
{
current->clear_child_tid = tidptr;
@@ -2495,6 +2510,7 @@ static __latent_entropy struct task_struct *copy_process(
return p;
bad_fork_cancel_cgroup:
+ release_seccomp(p);
sched_core_free(p);
spin_unlock(¤t->sighand->siglock);
write_unlock_irq(&tasklist_lock);
--
2.37.2.789.g6183377224-goog
On Fri, Sep 02, 2022 at 03:41:35AM +0000, Oleksandr Tymoshenko wrote:
> release_task, where the seccomp's filter refcounter is released, is not
> called for the case when the fork/clone is terminated midway by a
> signal. This leaves an extra reference that prevents filter from being
> destroyed even after all processes using it exit leading to a BPF JIT
> memory leak. Dereference the refcounter in the failure path of the
> copy_process function.
>
> Fixes: 3a15fb6ed92c ("seccomp: release filter after task is fully dead")
> Cc: Christian Brauner <[email protected]>
> Cc: [email protected]
> Signed-off-by: Oleksandr Tymoshenko <[email protected]>
> ---
Hey Oleksandr,
Thanks for the patch! I'm really puzzled as to why we never noticed this
and I'm trying to re-architect how this happend. But in any case,
there's a patch in the seccomp tree that fixes this:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?id=6d17452707ca
which is slighly different from your approach in that it moves
copy_seccomp() after the point of no return. Let us know if you see any
issues with this!
Christian
Hi Christian,
The patch in the seccomp tree, adapted to 5.10 branch, fixed the
memory leak in my reproducer.
Thanks for working on this, I should have checked the seccomp tree first :)
Please disregard the patch in my submission.
Thank you
On Mon, Sep 5, 2022 at 1:39 AM Christian Brauner <[email protected]> wrote:
>
> On Fri, Sep 02, 2022 at 03:41:35AM +0000, Oleksandr Tymoshenko wrote:
> > release_task, where the seccomp's filter refcounter is released, is not
> > called for the case when the fork/clone is terminated midway by a
> > signal. This leaves an extra reference that prevents filter from being
> > destroyed even after all processes using it exit leading to a BPF JIT
> > memory leak. Dereference the refcounter in the failure path of the
> > copy_process function.
> >
> > Fixes: 3a15fb6ed92c ("seccomp: release filter after task is fully dead")
> > Cc: Christian Brauner <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Oleksandr Tymoshenko <[email protected]>
> > ---
>
> Hey Oleksandr,
>
> Thanks for the patch! I'm really puzzled as to why we never noticed this
> and I'm trying to re-architect how this happend. But in any case,
> there's a patch in the seccomp tree that fixes this:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?id=6d17452707ca
>
> which is slighly different from your approach in that it moves
> copy_seccomp() after the point of no return. Let us know if you see any
> issues with this!
>
> Christian
On Mon, Sep 05, 2022 at 03:53:54PM -0700, Oleksandr Tymoshenko wrote:
> Hi Christian,
>
> The patch in the seccomp tree, adapted to 5.10 branch, fixed the
> memory leak in my reproducer.
> Thanks for working on this, I should have checked the seccomp tree first :)
> Please disregard the patch in my submission.
Oh, it wasn't my patch it was someone else's. :) I just acked it! Thanks
for working on this we need things like these spotted and fixed!