References:
 <20200907101522.zo6qzgp4qfzkz7cs@wittgenstein>
 <0639209E-B6C6-4F86-84F4-04B91E1CC8AA@amacapital.net>
 <20200907142510.klojh2urwyui23ox@wittgenstein>
In-Reply-To: <20200907142510.klojh2urwyui23ox@wittgenstein>
From: Andy Lutomirski
Date: Mon, 7 Sep 2020 13:20:23 -0700
Subject: Re: [PATCH v6 6/9] kernel: entry: Support Syscall User Dispatch
 for common syscall entry
To: Christian Brauner
Cc: Gabriel Krisman Bertazi, Andrew Lutomirski, Thomas Gleixner,
 Kees Cook, X86 ML, LKML, Linux API, Matthew Wilcox,
 "open list:KERNEL SELFTEST FRAMEWORK", Shuah Khan, kernel@collabora.com

On Mon, Sep 7, 2020 at 7:25 AM Christian Brauner wrote:
>
> On Mon, Sep 07, 2020 at 07:15:52AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Sep 7, 2020, at 3:15 AM, Christian Brauner wrote:
> > >
> > > On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
> > >> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
> > >> use case is emulation (it can be invoked with a different ABI) such that
> > >> seccomp filtering by syscall number doesn't make sense in the first
> > >> place. In addition, either the syscall is dispatched back to userspace,
> > >> in which case there is no resource for seccomp to protect, or the
> > >
> > > Tbh, I'm torn here. I'm not a super clever attacker but it feels to me
> > > that this is still at least a clever way to circumvent a seccomp
> > > sandbox.
> > > If I'd be confined by a seccomp profile that would cause me to be
> > > SIGKILLed when I try do open() I could prctl() myself to do user
> > > dispatch to prevent that from happening, no?
> > >
> >
> > Not really, I think. The idea is that you didn’t actually do open().
> > You did a SYSCALL instruction which meant something else, and the
> > syscall dispatch correctly prevented the kernel from misinterpreting
> > it as open().
>
> Right, for the case where you're e.g. emulating windows syscalls that's
> true. I was thinking when you're running natively on Linux: couldn't I
> first load a seccomp profile "kill me if someone does an open()", then
> I exec() the target binary and that binary is setup to do
> prctl(USER_DISPATCH) first thing. I guess, it's ok because as far as I
> had time to read it this is a nothing or all mechanism, i.e. _all_
> system calls are re-routed in contrast to e.g. seccomp where I could do
> this per-syscall. So for user-dispatch it wouldn't make sense to use it
> on Linux per se. Still makes me a little uneasy. :)

There's an escape hatch, so processes using this can still make syscalls.

Maybe think about it another way: a process using user dispatch should
definitely *not* trigger seccomp user notifiers, errno returns, or
ptrace events, since they'll all do the wrong thing. IMO RET_KILL is
the same.

Barring some very severe defect, there's no way a program can use user
dispatch to escape seccomp. A program could use user dispatch to allow
it to do:

    mov $__NR_open, %rax
    syscall

without dying despite the presence of a filter that would kill the
process if it tried to do open(), but this doesn't bypass the filter
at all. The process could just as easily have done:

    mov $__NR_open, %rax
    jmp magic_stub(%rip)

without tripping the filter, since no system call actually happens here.

--Andy
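
The ordering argued above can be sketched as a toy decision function.
This is a userspace model and not kernel code: every name in it
(task_model, syscall_entry, escape_hatch_open, the MODEL_NR_* constants)
is hypothetical, and the single "escape hatch" flag is an assumption
standing in for whatever mechanism the patch actually uses. It only
illustrates that a dispatched SYSCALL instruction never becomes an
open(), while a real open() made through the escape hatch still hits
the seccomp filter.

```c
/* Toy model of the entry ordering discussed in this thread.
 * NOT kernel code; all names are hypothetical. */
#include <stdbool.h>

enum outcome {
    SIGSYS_TO_USERSPACE, /* user dispatch bounced the instruction back */
    KILLED_BY_SECCOMP,   /* the filter saw a real open() and killed */
    SYSCALL_EXECUTED     /* the kernel ran the syscall */
};

struct task_model {
    bool dispatch_enabled;   /* task opted in to user dispatch */
    bool escape_hatch_open;  /* assumed: task is using the escape hatch */
    bool seccomp_kills_open; /* filter: "kill me if someone does an open()" */
};

/* x86-64 syscall numbers, used only to make the model concrete. */
enum { MODEL_NR_OPEN = 2, MODEL_NR_GETPID = 39 };

enum outcome syscall_entry(const struct task_model *t, int nr)
{
    /* User dispatch is consulted first: the instruction is handed back
     * to userspace, so the number in the register is never treated as
     * a Linux syscall and there is nothing for seccomp to judge. */
    if (t->dispatch_enabled && !t->escape_hatch_open)
        return SIGSYS_TO_USERSPACE;

    /* Anything that gets here is a real syscall, so the seccomp filter
     * applies exactly as it would without user dispatch. */
    if (t->seccomp_kills_open && nr == MODEL_NR_OPEN)
        return KILLED_BY_SECCOMP;

    return SYSCALL_EXECUTED;
}
```

In this model the only way to reach SYSCALL_EXECUTED for open() is to
have no filter that forbids it, which is the sense in which dispatch
does not bypass the filter.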