Date: Wed, 23 Sep 2020 23:56:06 -0700
In-Reply-To: <20200924065606.3351177-1-lokeshgidra@google.com>
Message-Id: <20200924065606.3351177-3-lokeshgidra@google.com>
Mime-Version: 1.0
References: <20200924065606.3351177-1-lokeshgidra@google.com>
X-Mailer: git-send-email 2.28.0.681.g6f77f65b4e-goog
Subject: [PATCH v4 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
    Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
    Daniel Colascione, "Joel Fernandes (Google)",
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com,
    surenb@google.com, nnk@google.com, jeffv@google.com,
    kernel-team@android.com, Mike Rapoport, Shaohua Li, Jerome Glisse,
    Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
    Vlastimil Babka, Iurii Zaikin, Luis Chamberlain
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

With this
change, setting the knob to 0 still allows unprivileged users to call
userfaultfd, as when it is set to 1, but restricts them to handling page
faults that originate in user mode. In this mode, an unprivileged user
(without the CAP_SYS_PTRACE capability) must pass UFFD_USER_MODE_ONLY to
userfaultfd or the call will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, it will cause postcopy live
migration to fail, although the failure will be unnoticeable to the VM
guests. To avoid this, set 'vm.unprivileged_userfaultfd = 1' in
/etc/sysctl.conf. For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Signed-off-by: Lokesh Gidra
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        |  6 ++++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3191434057f3..3816c11a986a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include
 #include
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1972,7 +1972,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
 	BUG_ON(!current->mm);
-- 
2.28.0.681.g6f77f65b4e-goog
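
For reviewers who want to see which callers the new condition in
SYSCALL_DEFINE1(userfaultfd, ...) rejects, here is a minimal userspace
model of that check. It is a sketch only, not kernel code: the names
model_uffd_check(), model_uffd_selfcheck() and MODEL_UFFD_USER_MODE_ONLY
are hypothetical, the flag value is a placeholder, and the has_ptrace_cap
parameter stands in for capable(CAP_SYS_PTRACE).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Placeholder flag bit for this model only (an assumption); real code
 * should take UFFD_USER_MODE_ONLY from the kernel's uapi header.
 */
#define MODEL_UFFD_USER_MODE_ONLY 1

/*
 * Userspace model of the permission check in the fs/userfaultfd.c hunk.
 * sysctl_val models sysctl_unprivileged_userfaultfd and has_ptrace_cap
 * models capable(CAP_SYS_PTRACE). Returns 0 if the call may proceed,
 * -EPERM otherwise.
 */
int model_uffd_check(int sysctl_val, int flags, bool has_ptrace_cap)
{
	if (!sysctl_val &&
	    (flags & MODEL_UFFD_USER_MODE_ONLY) == 0 &&
	    !has_ptrace_cap)
		return -EPERM;
	return 0;
}

/* Walk the cases described in the commit message. */
void model_uffd_selfcheck(void)
{
	/* knob = 0, unprivileged, flag missing: rejected */
	assert(model_uffd_check(0, 0, false) == -EPERM);
	/* knob = 0, unprivileged, flag passed: allowed */
	assert(model_uffd_check(0, MODEL_UFFD_USER_MODE_ONLY, false) == 0);
	/* knob = 0, caller has the capability: allowed without the flag */
	assert(model_uffd_check(0, 0, true) == 0);
	/* knob = 1: no restriction on unprivileged callers */
	assert(model_uffd_check(1, 0, false) == 0);
}
```

The only rejected combination is knob = 0, no capability and no
UFFD_USER_MODE_ONLY flag, which matches the EPERM case described in the
commit message.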