Date: Mon, 26 Oct 2020 14:00:52 -0700
Message-Id: <20201026210052.3775167-3-lokeshgidra@google.com>
In-Reply-To: <20201026210052.3775167-1-lokeshgidra@google.com>
Subject: [PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli, Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra, Daniel Colascione, "Joel Fernandes (Google)", linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com, surenb@google.com, nnk@google.com, jeffv@google.com, kernel-team@android.com, Mike Rapoport, Shaohua Li, Jerome Glisse, Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta, Vlastimil Babka, Iurii Zaikin, Luis Chamberlain

With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that only page faults originating in user mode can be handled.
In this mode, an unprivileged user (without the CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
EPERM.

This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of the pipe mutex. However, this will cause postcopy
live migration to fail; the failure is not noticeable to VM guests. To
avoid this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.

The main reason this change is desirable in the short term is that the
Android userland will behave as if the sysctl were set to zero. So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland. For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Signed-off-by: Lokesh Gidra
Reviewed-by: Andrea Arcangeli
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        | 10 ++++++++--
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c00f..d06a98b2a4e7 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handling page faults from user mode only. In this case, users without
+CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 605599fde015..894cc28142e7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include
 #include
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE)) {
+		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+			"sysctl knob to 1 if kernel faults must be handled "
+			"without obtaining CAP_SYS_PTRACE capability\n");
 		return -EPERM;
+	}
 
 	BUG_ON(!current->mm);
 
-- 
2.29.0.rc1.297.gfa9743e501-goog
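To make the decision table of the new check in SYSCALL_DEFINE1(userfaultfd, ...) easier to review, here is a minimal userspace model of it. This is a sketch, not kernel code: the function name `uffd_permission_model` is hypothetical, `knob` stands in for sysctl_unprivileged_userfaultfd, and `has_cap_sys_ptrace` stands in for the result of capable(CAP_SYS_PTRACE). The value 1 for UFFD_USER_MODE_ONLY is an assumption taken from the mainline uapi header <linux/userfaultfd.h> (the flag is introduced by patch 1/2 of this series) and is only defined here if the installed headers lack it.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed value, matching mainline <linux/userfaultfd.h>; only used
 * when the installed headers predate the flag. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical model of the permission check added by this patch.
 * knob:               stand-in for sysctl_unprivileged_userfaultfd
 * flags:              flags passed to userfaultfd(2)
 * has_cap_sys_ptrace: stand-in for capable(CAP_SYS_PTRACE)
 * Returns 0 if the call passes this check, -EPERM otherwise.
 */
static int uffd_permission_model(int knob, int flags, bool has_cap_sys_ptrace)
{
	/* Knob at 0, no user-mode-only request, no capability: refuse. */
	if (!knob && (flags & UFFD_USER_MODE_ONLY) == 0 && !has_cap_sys_ptrace)
		return -EPERM;
	/* Knob at 1, flag present, or privileged caller: allowed. */
	return 0;
}
```

Because `&&` short-circuits, the real kernel only calls capable(CAP_SYS_PTRACE) when the knob is 0 and UFFD_USER_MODE_ONLY is absent, so the capability check (which LSMs may audit) runs only when it can change the outcome. The model receives the capability as a precomputed boolean and therefore does not capture that ordering; it also ignores the separate validation of unknown flags that follows in the syscall.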
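From the userspace side, a program that only needs to resolve faults raised by its own user-mode accesses can request UFFD_USER_MODE_ONLY so that it keeps working when vm.unprivileged_userfaultfd is 0. The sketch below assumes glibc's syscall() wrapper and SYS_userfaultfd; the helper name `open_user_mode_uffd` is hypothetical, and the value 1 for UFFD_USER_MODE_ONLY is an assumption taken from the mainline uapi header. It also assumes that kernels predating the flag reject unknown flag bits with EINVAL, and retries without the flag in that case.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed value, matching mainline <linux/userfaultfd.h>. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical helper: open a userfaultfd that only has to handle faults
 * raised from user mode. Unprivileged callers (no CAP_SYS_PTRACE) need
 * UFFD_USER_MODE_ONLY when vm.unprivileged_userfaultfd is 0. If the kernel
 * does not know the flag (assumed to fail with EINVAL), retry without it.
 * Returns the new fd, or -1 with errno set.
 */
static int open_user_mode_uffd(void)
{
#ifdef SYS_userfaultfd
	int fd = (int)syscall(SYS_userfaultfd,
			      O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);

	if (fd < 0 && errno == EINVAL)
		/* Older kernel: flag unknown, fall back to the plain call. */
		fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	return fd;
#else
	/* No syscall number available on this platform/headers. */
	errno = ENOSYS;
	return -1;
#endif
}
```

On a kernel without the flag, the fallback returns a descriptor that is not restricted to user-mode faults. If such a kernel has the knob at 0 and the caller lacks CAP_SYS_PTRACE, the permission check runs before flag validation, so the first call already fails with EPERM and no retry happens.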