Received: by 2002:a05:6a10:9e8c:0:0:0:0 with SMTP id y12csp3331661pxx; Mon, 2 Nov 2020 06:17:32 -0800 (PST) X-Google-Smtp-Source: ABdhPJwNPwxxf1VUMpYDFE2dnMa94H7e2fArxItzwFs5EKsjOHG/a4LafdgtKPJp1WXRz7gPQ1WN X-Received: by 2002:a17:906:a00c:: with SMTP id p12mr7273626ejy.249.1604326652157; Mon, 02 Nov 2020 06:17:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1604326652; cv=none; d=google.com; s=arc-20160816; b=AbalbD+hKyG6y+mtI7+TPwqcprYJU0BtFlL5lUqhl5nfAs8W1t3C8EoFSNkvjjuHde gBdV8CctnaOGqjraVtHY6Fx3T2ZzrCzFK0qyNhk0b+TPn02QVn6/QPMWwNMAzsM+mcra E+YvMFfMbHTRwgCVlFXv7I3+rI4DA+uZzusrRV/QN1Q/+Ih6NCDIVLEBXoKAaJGPXh/a fRCUPB93eIQZzMR0BMR6HN9dUOCw2ASieuvq65WaiY6TgyfIDXm4x74ohks6rNIT/5wy pt2Am+bDWtS16ECUAjv+MxItLlt8IEt3KjJxbyPvIlW2YHtw7W9D4uPR8mBWWmMEGE44 ZRFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=UWSN6pUmUHVvmlI9O9ME30to/SMhrDIMong6ljJZfek=; b=mm64tKldjPzCmow7RtAre2/itA9x+X/x/LABE2HwQSn10/ZtHcpJuaM7RCrYp8xr/A q1O4o52afs/qmkf3+kK+yJx/ulLhPXpCRUOmhz0YfAxQUJnLlAzo17/YgQoxBrPsD68l pmTi+PfHOA7NI+cZTMuGwbm0BCuc0EEY/HCQRkKzfYHvdak03+7ExZMhSU5BXjoGARJA 1TOAFyWr+KtfU4N8/eiwSUev6JnFMkj6TCgYZ+ZK50hpOWNbXQLyaqDXxkDTqdBoUWoq CXsEG/HxvSO3sPcs7P2o4G7yP9b+aZEum/s7UBk6afYOmAIU87RbTwnj8KBfsxmE5mhG IDOQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=mH8M9jps; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g7si10499001ejd.530.2020.11.02.06.17.08; Mon, 02 Nov 2020 06:17:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=mH8M9jps; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726025AbgKBOOE (ORCPT + 99 others); Mon, 2 Nov 2020 09:14:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46266 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725768AbgKBOOC (ORCPT ); Mon, 2 Nov 2020 09:14:02 -0500 Received: from mail-lj1-x243.google.com (mail-lj1-x243.google.com [IPv6:2a00:1450:4864:20::243]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E822DC061A04 for ; Mon, 2 Nov 2020 06:14:00 -0800 (PST) Received: by mail-lj1-x243.google.com with SMTP id 2so15095775ljj.13 for ; Mon, 02 Nov 2020 06:14:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=UWSN6pUmUHVvmlI9O9ME30to/SMhrDIMong6ljJZfek=; b=mH8M9jpsGa3f9fcWJHPca/931KX2U+XkxbOX64TG8Zhi5C/ZcESSCqI25qknkzRBLO noKBzIlTe+kkzgJhAG7wFIWJHKb+7Biel9RAR2wvFwoQ8awGXak5DIOL/gSrErk8V9Le vlO7D9AMDRl3i5YY711AIk6T0Y/LyvuwzQGcNyl4w01onM/Gmvo0d0ZnirRD4P47OtY2 tzxcRqUU+MRRVpIAht7p86HHVpCbRPuFXM6Wp6qrioc/0YN/b7/4RkUf4BGA22vRsC28 ayKlOnSmhsPnPgbQkfW35DjULd5kYMYQ9XAPr8m5Wvyv0vFY8DY7k6OmPeCE5GBKo4Wr ts3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=UWSN6pUmUHVvmlI9O9ME30to/SMhrDIMong6ljJZfek=; b=AH0yjyh+3l3Mvlej4YFLUyd6/KQHi/t8x4zQOC7kjGAX3zw6t2ih/lDNF6K1WKn/Ex Iqc5GJ9ianiYrPuG3zn+XqRCIchRG9NvW5DmX7cOloAzgJ/d0eEuTDEx9m+QfmvEV3At 3Kf/uZahglaX9mb+m2bpmd4hkSeBOHJzpg7aizxmvUUI2NzoqxqESNSzElgx8e+7kOSW i8rysVnj934MtLU8qKyFgqiCTEL303Cy4WxKMMdT4e/+R21eIy/Adf0Ncb6FWamoxr1U QK996R0/IvOSxq+Z8M4EjJfCO3+2UUnYakS+Au4Ls8CLJ/RndzVIagUlHRTX9B6tawTL ADEw== X-Gm-Message-State: AOAM532xW6cdRSbWgaC5mb2xZ+d1weHLowGt4ec6SlKTjo5nZoUBhfnC IpQQauuxRS1v0IRlTULoVoGRV0upMU5ABTlWNb2wWw== X-Received: by 2002:a2e:b888:: with SMTP id r8mr6414174ljp.138.1604326439180; Mon, 02 Nov 2020 06:13:59 -0800 (PST) MIME-Version: 1.0 References: <63598b4f-6ce3-5a11-4552-cdfe308f68e4@gmail.com> <93cfdc79-4c48-bceb-3620-4c63e9f4822e@gmail.com> In-Reply-To: From: Jann Horn Date: Mon, 2 Nov 2020 15:13:31 +0100 Message-ID: Subject: Re: For review: seccomp_user_notif(2) manual page [v2] To: "Michael Kerrisk (man-pages)" Cc: Kees Cook , Tycho Andersen , Sargun Dhillon , Christian Brauner , Daniel Borkmann , Giuseppe Scrivano , Song Liu , Robert Sesek , Containers , linux-man , lkml , Aleksa Sarai , Alexei Starovoitov , Will Drewry , bpf , Andy Lutomirski Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Oct 31, 2020 at 9:51 AM Michael Kerrisk (man-pages) wrote: > On 10/30/20 8:20 PM, Jann Horn wrote: > > On Thu, Oct 29, 2020 at 8:14 PM Michael Kerrisk (man-pages) > > wrote: > >> On 10/29/20 2:42 AM, Jann Horn wrote: > >>> As discussed at > >>> , > >>> we need to re-check checkNotificationIdIsValid() after reading remote > >>> memory but before using the read value in any way. Otherwise, the > >>> syscall could in the meantime get interrupted by a signal handler, the > >>> signal handler could return, and then the function that performed the > >>> syscall could free() allocations or return (thereby freeing buffers on > >>> the stack). > >>> > >>> In essence, this pread() is (unavoidably) a potential use-after-free > >>> read; and to make that not have any security impact, we need to check > >>> whether UAF read occurred before using the read value. This should > >>> probably be called out elsewhere in the manpage, too... > >>> > >>> Now, of course, **reading** is the easy case. The difficult case is if > >>> we have to **write** to the remote process... because then we can't > >>> play games like that. If we write data to a freed pointer, we're > >>> screwed, that's it. (And for somewhat unrelated bonus fun, consider > >>> that /proc/$pid/mem is originally intended for process debugging, > >>> including installing breakpoints, and will therefore happily write > >>> over "readonly" private mappings, such as typical mappings of > >>> executable code.) > >>> > >>> So, uuuuh... I guess if anyone wants to actually write memory back to > >>> the target process, we'd better come up with some dedicated API for > >>> that, using an ioctl on the seccomp fd that magically freezes the > >>> target process inside the syscall while writing to its memory, or > >>> something like that? And until then, the manpage should have a big fat > >>> warning that writing to the target's memory is simply not possible > >>> (safely). > >> > >> Thank you for your very clear explanation! It turned out to be > >> trivially easy to demonstrate this issue with a slightly modified > >> version of my program. > >> > >> As well as the change to the code example that I already mentioned > >> my reply of a few hours ago, I've added the following text to the > >> page: > >> > >> Caveats regarding the use of /proc/[tid]/mem > >> The discussion above noted the need to use the > >> SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) when opening the > >> /proc/[tid]/mem file of the target to avoid the possibility of > >> accessing the memory of the wrong process in the event that the > >> target terminates and its ID is recycled by another (unrelated) > >> thread. However, the use of this ioctl(2) operation is also > >> necessary in other situations, as explained in the following > >> pargraphs. > > > > (nit: paragraphs) > > I spotted that one also already. But thanks for reading carefully! > > >> Consider the following scenario, where the supervisor tries to > >> read the pathname argument of a target's blocked mount(2) system > >> call: > > [...] > >> Seem okay? > > > > Yeah, sounds good. > > > >> By the way, is there any analogous kind of issue concerning > >> pidfd_getfd()? I'm thinking not, but I wonder if I've missed > >> something. > > > > When it is used by a seccomp supervisor, you mean? I think basically > > the same thing applies - when resource identifiers (such as memory > > addresses or file descriptors) are passed to a syscall, it generally > > has to be assumed that those identifiers may become invalid and be > > reused as soon as the syscall has returned. > > I probably needed to be more explicit. Would the following (i.e., a > single cookie check) not be sufficient to handle the above scenario. > Here, the target is making a syscall a system call that employs the > file descriptor 'tfd': > > T: makes syscall that triggers notification > S: Get notification > S: pidfd = pidfd_open(T, 0); > S: sfd = pifd_getfd(pidfd, tfd, 0) > S: check that the cookie is still valid > S: do operation with sfd [*] > > By contrast, I can see that we might want to do multiple cookie > checks in the /proc/PID/mem case, since the supervisor might do > multiple reads. Aaah, okay. I didn't really understand the question at first. > Or, do you mean: there really needs to be another cookie check after > the point [*], since, if the the target's syscall was interrupted > and 'tfd' was closed/resused, then the supervisor would be operating > with a file descriptor that refers to an open file description > (a "struct file") that is no longer meaningful in the target? > (Thinking about it, I think this probably is what you mean, but > I want to confirm.) I wasn't thinking about your actual question when I wrote that. :P I think you could argue that leaving out the first cookie check does not make this incorrect if it was correct before; but you could also argue that it's hazardous either way (because programs might rely on synchronous actions that happen when closing an fd that they assume is the only one associated with a file description, e.g. assuming that close() will synchronously release an flock() lock). And if we do two checks, we can at least limit such potentially hazardous interference to processes that performed syscalls subject to interception, instead of risking triggering them all over the place.