Received: by 2002:a25:e74b:0:0:0:0:0 with SMTP id e72csp1365601ybh; Mon, 13 Jul 2020 17:17:02 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyxSRqSnKMq6DCdymA4Fw0gTk1WpI+F1Fy6+M/T0rR8LbQ7TknFuz9B/5JS95l6xZZXnp4e X-Received: by 2002:a17:906:743:: with SMTP id z3mr2130383ejb.216.1594685822250; Mon, 13 Jul 2020 17:17:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594685822; cv=none; d=google.com; s=arc-20160816; b=WPsyRTu0Q6NqHmnkK5vgjuhVOmlOrU1FMNGL2Okk3xa1JI5Zumak2b75whQv3pKhfn IA5dGYRvDM39eBJFmIF/aqYuFCkRI1j90Txj0hXbfeuYeTYbMpyWilEO8sE+FS/ben6s dTV9/17Em7RVkY2rwdxsF2y/yhpDjTFOHvu0G9fhBp4GUZmaB+Ynq2KgpfnXb1LeyapD CbzGRbVbGyMoXBerVG56oQtODMhXiGVhHYpSt04YZGeY6xbwtGd5Csq3u3uFV4tgWwnP rQ8BHhM3U/fl5x3XAoj+y5BRRJDmjTsAE7DYXaYqbSDBE+J/urwlfoOc8T1rJd8Nc366 qYMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=cDAVMiAV9Cf2sMWMCcN1ErVnmUCkKb59fUbYq+8rtyg=; b=iYz8yZoMmTapK4agLFiSl2u8B9/6OSKzKth9PWlfFqySmuoTZjrOV2tgOgK9f77eI4 hp2lJ4o1oKKUOUe5fTdarlNi5J/D6SRxbU5e0nZwsiDwcEwLYc9zvPtEenkSWJpVgLF1 msYfI3LDdI2oDFB/B3X+jLzpyH7VwLk2x+nzlGkABkWyITAQz4hYl/ON6Hm3LtyVIhSR js1CBD36fYxXOASdxASefUoYp68MhZ/KhK3z7fRbyBDVY0V4JRsaFAcQTQyhy0x5wNTb rC+HesswSRFGu3vvfGm9A59kXw7doOSV9MOvmGHrLm56nri3fF9ihdsmMR30NiEF9cEd MFPQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=eTKb1GkA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g7si9396871ejr.748.2020.07.13.17.16.39; Mon, 13 Jul 2020 17:17:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=eTKb1GkA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726345AbgGNAQG (ORCPT + 99 others); Mon, 13 Jul 2020 20:16:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726356AbgGNAQG (ORCPT ); Mon, 13 Jul 2020 20:16:06 -0400 Received: from mail-pg1-x544.google.com (mail-pg1-x544.google.com [IPv6:2607:f8b0:4864:20::544]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 50BDFC061794 for ; Mon, 13 Jul 2020 17:16:06 -0700 (PDT) Received: by mail-pg1-x544.google.com with SMTP id g67so6780670pgc.8 for ; Mon, 13 Jul 2020 17:16:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=cDAVMiAV9Cf2sMWMCcN1ErVnmUCkKb59fUbYq+8rtyg=; b=eTKb1GkAhSQsJaOrZAoKTPiCrHpSdAdM/ujzzqP/yT1BAdpe49gHv4/LXShcE0Kkqv V2wHXdoiptBRUlGNpdWRA8baRPnJz2z5xo6Aick4R59c72qrU1f+SHVaqFkKpfWKRyvc CUe19gqaNV9466MwwX6l7vQDvSVPNkyu/LtRE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=cDAVMiAV9Cf2sMWMCcN1ErVnmUCkKb59fUbYq+8rtyg=; b=DX7W2kfDqJibSyxxZiEsJ3m1HUG2ooTO95d+2o07hcQsNvySQu3hd9BqarqLMZ1NIs 1VtEt3SId+Ux1ShBMQiYZlqlR9Z1Tkm83k4zmcmhn3SNOcf+P8fhoHg+/dKTW4gabksB uR6VtOavVZi7NQSntNYWaSQuie2Sd0wNvf/svRzL8RioOPwLZUkY/EReKl6GKjZQoxqr Nrp6ne4y2B/U/HguWy0KgM2e3OxwUfrd6zpo3EpqfQFBy8MqiCYHm+gTo6v9bv+heHMC PIx1bHa91QUdbrAIQPSVo33E7ziUDwQtsxPPP3IjPSw+4u9yefkDcK7kS7rNeMQYevCs ey7w== X-Gm-Message-State: AOAM532XZ/rmWxDq8bXLKeUYUHMbVYTQ3NSrqR9J3A9v0qq9W4Fy9GVa 4CuqBZwNHdGJo9DCkK5LXTLDiQ== X-Received: by 2002:a63:e045:: with SMTP id n5mr1408753pgj.274.1594685765748; Mon, 13 Jul 2020 17:16:05 -0700 (PDT) Received: from www.outflux.net (smtp.outflux.net. [198.145.64.163]) by smtp.gmail.com with ESMTPSA id s68sm625948pjb.38.2020.07.13.17.16.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 13 Jul 2020 17:16:04 -0700 (PDT) Date: Mon, 13 Jul 2020 17:16:03 -0700 From: Kees Cook To: Gabriel Krisman Bertazi , willy@infradead.org, luto@kernel.org Cc: tglx@linutronix.de, linux-kernel@vger.kernel.org, kernel@collabora.com, gofmanp@gmail.com, x86@kernel.org, linux-kselftest@vger.kernel.org, shuah@kernel.org Subject: Re: [PATCH v3 1/2] kernel: Implement selective syscall userspace redirection Message-ID: <202007131715.5598282C@keescook> References: <20200712044516.2347844-1-krisman@collabora.com> <20200712044516.2347844-2-krisman@collabora.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200712044516.2347844-2-krisman@collabora.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jul 12, 2020 at 12:45:15AM -0400, Gabriel Krisman Bertazi wrote: > Introduce a mechanism to quickly disable/enable syscall handling for a > specific process and redirect to userspace via SIGSYS. This is useful > for processes with parts that require syscall redirection and parts that > don't, but who need to perform this boundary crossing really fast, > without paying the cost of a system call to reconfigure syscall handling > on each boundary transition. This is particularly important for Windows > games running over Wine. > > The proposed interface looks like this: > > prctl(PR_SET_SYSCALL_USER_DISPATCH, , , , [selector]) > > The range [,] is a part of the process memory map > that is allowed to by-pass the redirection code and dispatch syscalls > directly, such that in fast paths a process doesn't need to disable the > trap nor the kernel has to check the selector. This is essential to > return from SIGSYS to a blocked area without triggering another SIGSYS > from rt_sigreturn. > > selector is an optional pointer to a char-sized userspace memory region > that has a key switch for the mechanism. This key switch is set to > either PR_SYS_DISPATCH_ON, PR_SYS_DISPATCH_OFF to enable and disable the > redirection without calling the kernel. > > The feature is meant to be set per-thread and it is disabled on > fork/clone/execv. > > Internally, this doesn't add overhead to the syscall hot path, and it > requires very little per-architecture support. I avoided using seccomp, > even though it duplicates some functionality, due to previous feedback > that maybe it shouldn't mix with seccomp since it is not a security > mechanism. And obviously, this should never be considered a security > mechanism, since any part of the program can by-pass it by using the > syscall dispatcher. > > For the sysinfo benchmark, which measures the overhead added to > executing a native syscall that doesn't require interception, the > overhead using only the direct dispatcher region to issue syscalls is > pretty much irrelevant. The overhead of using the selector goes around > 40ns for a native (unredirected) syscall in my system, and it is (as > expected) dominated by the supervisor-mode user-address access. In > fact, with SMAP off, the overhead is consistently less than 5ns on my > test box. > > Right now, it is only supported by x86_64 and x86, but it should be > easily enabled for other architectures. > > An example code using this interface can be found at: > https://gitlab.collabora.com/krisman/syscall-disable-personality > > Changes since v2: > (Matthew Wilcox suggestions) > - Drop __user on non-ptr type. > - Move #define closer to similar defs > - Allow a memory region that can dispatch directly > (Kees Cook suggestions) > - Improve kconfig summary line > - Move flag cleanup on execve to begin_new_exec > - Hint branch predictor in the syscall path > (Me) > - Convert selector to char > > Changes since RFC: > (Kees Cook suggestions) > - Don't mention personality while explaining the feature > - Use syscall_get_nr > - Remove header guard on several places > - Convert WARN_ON to WARN_ON_ONCE > - Explicit check for state values > - Rename to syscall user dispatcher > > Cc: Matthew Wilcox > Cc: Andy Lutomirski > Cc: Paul Gofman > Cc: Kees Cook > Signed-off-by: Gabriel Krisman Bertazi I think this looks great. :) Reviewed-by: Kees Cook Any other folks able to look through it? -- Kees Cook