Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list
From: Jann Horn
Date: Mon, 31 Jan 2022 18:13:42 +0100
To: Matthew Wilcox
Cc: "Eric W. Biederman", linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Alexander Viro, Denys Vlasenko,
    Kees Cook, Vlastimil Babka, "Liam R. Howlett"
References: <20220131153740.2396974-1-willy@infradead.org>
    <871r0nriy4.fsf@email.froward.int.ebiederm.org>
    <877dafq3bw.fsf@email.froward.int.ebiederm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 31, 2022 at 5:35 PM Matthew Wilcox wrote:
>
> On Mon, Jan 31, 2022 at 10:26:11AM -0600, Eric W. Biederman wrote:
> > Matthew Wilcox writes:
> >
> > > On Mon, Jan 31, 2022 at 10:03:31AM -0600, Eric W. Biederman wrote:
> > >> "Matthew Wilcox (Oracle)" writes:
> > >>
> > >> > I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
> > >> > is very careful to take the mmap_lock in write mode. We only need to
> > >> > take it in read mode here as we do not care if the size of the stack
> > >> > VMA changes underneath us.
> > >> >
> > >> > If it can be changed underneath us, this is a potential use-after-free
> > >> > for a multithreaded process which is dumping core.
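(For reference, the band-aid being discussed boils down to something like the following pseudocode sketch. This is not the literal patch; it assumes the pre-maple-tree linked-list walk and the kernel's mmap-lock helper names, and `examine_vma` stands in for whatever per-VMA work the dumper does.)

```c
/* Sketch only: hold mmap_lock shared while walking the VMA list, so
 * concurrent split/merge (which needs the lock exclusive) cannot free
 * a VMA under the walker. Read mode suffices because the dumper does
 * not care if the stack VMA merely changes size. */
mmap_read_lock(mm);
for (vma = mm->mmap; vma; vma = vma->vm_next)
        examine_vma(vma);   /* hypothetical per-VMA work for the dump */
mmap_read_unlock(mm);
```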
> > >>
> > >> The problem is not multi-threaded process so much as processes that
> > >> share their mm.
> > >
> > > I don't understand the difference. I appreciate that another process can
> > > get read access to an mm through, eg, /proc, but how can another process
> > > (that isn't a thread of this process) modify the VMAs?
> >
> > There are a couple of ways.
> >
> > A classic way is a multi-threads process can call vfork, and the
> > mm_struct is shared with the child until exec is called.
>
> While true, I thought the semantics of vfork() were that the parent
> was suspended. Given that, it can't core dump until the child execs
> ... right?
>
> > A process can do this more deliberately by forking a child using
> > clone(CLONE_VM) and not including CLONE_THREAD. Supporting this case
> > is a hold over from before CLONE_THREAD was supported in the kernel and
> > such processes were used to simulate threads.

The syscall clone() is kind of the generalized version of fork() and
vfork(), and it lets you create fun combinations of these things (some
of which might be useful, others of which make little sense). For
example, vfork() is basically just clone() with CLONE_VM (for sharing
the address space) plus CLONE_VFORK (block until the child exits or
execs) plus SIGCHLD (the child should send SIGCHLD to the parent when
it terminates). (Some combinations are disallowed, but you can IIRC
make things like threads with separate FD tables, or processes that
share virtual memory and signal handler tables but aren't actual
threads.)

Note that until commit 0258b5fd7c71 ("coredump: Limit coredumps to a
single thread group", first in 5.16), coredumps would always tear down
every process that shares the MM before dumping, and the coredumping
code was trying to rely on that to protect against concurrency.
(Which at some point didn't actually work anymore, see below...)

> That is a multithreaded process then!
> Maybe not in the strict POSIX
> compliance sense, but the intent is to be a multithreaded process.
> ie multiple threads of execution, sharing an address space.

current_is_single_threaded() agrees with you. :P

> > It also happens that there are subsystems in the kernel that do things
> > like kthread_use_mm that can also be modifying the mm during a coredump.
>
> Yikes. That's terrifying. It's really legitimate for a kthread to
> attach to a process and start tearing down VMAs?

I don't know anything about kthreads doing this, but a fun way to
remotely mess with VMAs is userfaultfd. See
https://bugs.chromium.org/p/project-zero/issues/detail?id=1790
("Issue 1790: Linux: missing locking between ELF coredump code and
userfaultfd VMA modification") for the long version - but the gist is
that userfaultfd can set/clear flags on virtual address ranges
(implemented as flags on VMAs), and that may involve splitting VMAs
(when the boundaries of the specified range don't correspond to
existing VMA boundaries) or merging VMAs (when the flags of adjacent
VMAs become equal). And userfaultfd can by design be used remotely on
another process (so long as that process first created the userfaultfd
and then handed it over). That's why I added that locking in the
coredumping code.

> > > Uhh .. that seems like it needs a lot more understanding of binfmt_elf
> > > than I currently possess. I'd rather spend my time working on folios
> > > than learning much more about binfmt_elf. I was just trying to fix an
> > > assertion failure with the maple tree patches (we now assert that you're
> > > holding a lock when walking the list of VMAs).
> >
> > Fair enough. I will put it on my list of things to address.
>
> Thanks. Now that I've disclosed it's a UAF, I hope you're able to
> get to it soon. Otherwise we should put this band-aid in for now
> and you can address it properly in the fullness of time.