Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp914298ybz; Wed, 29 Apr 2020 11:37:53 -0700 (PDT) X-Google-Smtp-Source: APiQypJXtZxCdo6MY9BGq45VSj+/K+CdSH1TXRZUCWrb6+ZYzU+iql0K2GHZWi+DJHV13ygGxwli X-Received: by 2002:a17:907:2049:: with SMTP id pg9mr4008684ejb.248.1588185473741; Wed, 29 Apr 2020 11:37:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588185473; cv=none; d=google.com; s=arc-20160816; b=IiGDYp6Uws4zqGjGaTGti3SltrsZT8/sEA2M6IfclANSOyRtaybL189ov06ICGqCP3 HhNUyNZFyLDI1ybMGnZWqnRYNYEsNRfPg/vJZU3ebuI5DKnDC/JRSJHK5kOtOEiatgOY Peg07Uw/ObzzJvTkwNb5D5D6QsEXK/ORmkkMwAVr4jG35kUasmtD3ej6eAgwpXj6hOh+ uWslGUfUhrf2ojqo4mksR//+aoBCmzgluIxeBtUiN3XY1GULfWa3DU+g28iVWqqw6X4d 2N4K7MxO1XlH2CnDoMpP6pkOtSsoSFz7FxHUxgpXdw6fsVLnUhO7qDCz7YlK3+jdUdAm j20w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=Cir/lv0gr0jqM/9RAZTjqohFxMF8rxZT7xpH/V9tx08=; b=N4hlyCU+OEUWLlWVPQDPvAPMGyOPFrSDPAi8B+ewbJPgnY3BYk/3Iv7+Xpzud0Ekc5 cU2Kb2GCdf55/RVsihaLk1QMtaoXDT96W21z95jS28t55wlWQ/Z4bRytCfmLj8q36DB4 xBBJZvkI4fXZUhM3UfBRN6MdHNa5dxDKgtb08LMxZBEH/4l1uEq30251I6s7ve+nvHT7 F3uXxx4k1+i3SpusuR6QBXeTg6gqWAmqBnokw9XzpbrGBfDwjTwqiKiiqpIrqpenlluA PT88z3lyPiA/C7mDEbv7UZCZbl05f/5b1wIUpO89t0FtUg+m15bv/LJKkLbC+1YyDFzX 57OQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=isI3J18W; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q25si4677207eja.500.2020.04.29.11.37.29; Wed, 29 Apr 2020 11:37:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=isI3J18W; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726862AbgD2Sdq (ORCPT + 99 others); Wed, 29 Apr 2020 14:33:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726456AbgD2Sdp (ORCPT ); Wed, 29 Apr 2020 14:33:45 -0400 Received: from mail-lj1-x241.google.com (mail-lj1-x241.google.com [IPv6:2a00:1450:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49FA2C03C1AE for ; Wed, 29 Apr 2020 11:33:44 -0700 (PDT) Received: by mail-lj1-x241.google.com with SMTP id e25so3723453ljg.5 for ; Wed, 29 Apr 2020 11:33:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Cir/lv0gr0jqM/9RAZTjqohFxMF8rxZT7xpH/V9tx08=; b=isI3J18WjpIGZO4EXpJhkASI5rqAKW4KQeH7U6lgW2hDd71iIRmx579BvS8WKq5wit ebHIJbs3dginlnOGzEEUmHreOZITcmFxtsu81uL7od9yLYAZNfYj/a0f7Rp7X0goJ8OC hUa+fZtmO/xxg60NMMjDnUFhUglozu3dIHPLUTtt8JrLx1aSmwre3S5s88HWRqTZn8XI G8WEkD2YV3eHdmEZC6DKfCv09PuFykTpuWi2e1FttPMu/kIpZ3Vu+jL8h2E+0NCmik0r IlCztZC/HZk0MNEiBJRYIoylrFd8dfFJigXEoIBI8DEVBmkp/9FE+zo6aHbHvDvrYAhO U6Wg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Cir/lv0gr0jqM/9RAZTjqohFxMF8rxZT7xpH/V9tx08=; b=aMJZCVhpGTL3xBhdgDKa2+IQ+IpqJ530+5LAXko03PfRBjbAfnzdv8A8BCIoTzedkj Wm9MdpEKDsazqBQJdZV6FBs6EAJP/BigNT09rf96Af1/oXsN9dfGUEEusRWEFB5ls+Bo 0PWzSb+60IKkwrBVWpJRftxVPnIDILDolE2OdpxtGWMqe7RZAfzdD6CAspCqsWlOzfjo Wk3Le+EuRzgMy3I8ltrNeBTG0q/EjgloMLIhcZAh/JzplMvgvvq0OhEmvpRqvDowWyXV d4ppJev/t/8wvJQuRU447mcAK2+NGyACmfvglU1EGXxKSyQ/WlDi3HIMIgjH1yr+AD0p R2rQ== X-Gm-Message-State: AGi0Pua0VOSewX5Ed0EROlIft+LB3EJXwTEJdUY/hlslg2DkraxNjl+r qpwclwEVmDveBHClpcfOnOfSUNufBZP1iYi038uznA== X-Received: by 2002:a2e:8999:: with SMTP id c25mr22226153lji.73.1588185222389; Wed, 29 Apr 2020 11:33:42 -0700 (PDT) MIME-Version: 1.0 References: <87imi8nzlw.fsf@x220.int.ebiederm.org> <20200411182043.GA3136@redhat.com> <20200412195049.GA23824@redhat.com> <20200428190836.GC29960@redhat.com> In-Reply-To: From: Jann Horn Date: Wed, 29 Apr 2020 20:33:14 +0200 Message-ID: Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1 To: Linus Torvalds Cc: Oleg Nesterov , Bernd Edlinger , "Eric W. Biederman" , Waiman Long , Ingo Molnar , Will Deacon , Linux Kernel Mailing List , Alexey Gladkov Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 29, 2020 at 7:58 PM Linus Torvalds wrote: > On Tue, Apr 28, 2020 at 4:36 PM Jann Horn wrote: > > > > On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds > > wrote: > > > > > > - we move check_unsafe_exec() down. As far as I can tell, there's no > > > reason it's that early - the flags it sets aren't actually used until > > > when we actually do that final set_creds.. > > > > Right, we should be able to do that stuff quite a bit later than it happens now. > > Actually, looking at it, this looks painful for multiple reasons. > > The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which > when I traced it through, happened much earlier than I thought. Making > things worse, it's done by prepare_binprm(), which also potentially > gets called from random points by the low-level binfmt handlers too. Yeah, but all of that happens before we actually need to do anything with the accumulated credential information from the prepare_binprm() calls. We can probably move the unsafe calculation and a new LSM hook into flush_old_exec(), right before de_thread(). > And we also have that odd "fs->in_exec" flag, which is used by thread > cloning and io_uring, and I'm not sure what the exact semantics are. The idea is to ensure that once we're through check_unsafe_exec() and have computed our LSM_UNSAFE_* flags, another thread that's still running must not be able to fork() off a child with CLONE_FS, because having an fs_struct that's shared with anything other than sibling threads (which will be killed off) is supposed to only be possible if LSM_UNSAFE_SHARE is set. So: If check_unsafe_exec() can match each reference in the refcount ->fs->users with a reference from a sibling thread (iow the fs_struct is not currently shared with another task), it sets p->fs->in_exec. If another thread tries to clone(CLONE_FS) while we're in execve(), copy_fs() will throw -EAGAIN. And if io_uring tries to grab a reference to the fs_struct with the intent to use it on a kernel worker thread (which conceptually is kinda similar to the clone(CLONE_FS) case), that also aborts. And then at the end of execve(), we clear the ->fs->in_exec flag again. So this should work fine as long as we ensure that we can't have two threads from the same process going through execve concurrently. (Or if we actually want to support that, we could make ->in_exec a counter instead of a flag, but really, preventing concurrent execve()s from a multithreaded process seems saner...) > I'm _almost_ inclined to say that we should just abort the execve() > entirely if somebody tries to attach in the middle. > > IOW, get rid of the locking, and replace it all just with a sequence > count. Make execve() abort if the sequence count has changed between > loading the original creds, and having installed the new creds. > > You can ptrace _over_ an execve, and you can ptrace _after_ an > execve(), but trying to attach just as we execve() would just cause > the execve() to fail. > > We could maybe make it conditional on the credentials actually having > changed at all (set another flag in bprm_fill_uid()). So it would only > fail for the suid exec case. > > Because honestly, trying to ptrace in the middle of a suid execve() > sounds like an attack, not a useful thing. > > That sequence count approach would be a much simpler change. In that model, what should happen if someone tries to attach to a process that's in execve(), but after the point of no return in de_thread()? "Abort" after the point of no return normally means force_sigsegv(), right?