Date: Sat, 10 Oct 2020 23:24:56 -0700
In-Reply-To: <20201011062456.4065576-1-lokeshgidra@google.com>
Message-Id: <20201011062456.4065576-3-lokeshgidra@google.com>
References: <20201011062456.4065576-1-lokeshgidra@google.com>
X-Mailer: git-send-email 2.28.0.1011.ga647a8990f-goog
Subject: [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
    Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
    Daniel Colascione, "Joel Fernandes (Google)",
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com,
    surenb@google.com, nnk@google.com, jeffv@google.com,
    kernel-team@android.com, Mike Rapoport, Shaohua Li, Jerome Glisse,
    Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
    Vlastimil Babka, Iurii Zaikin, Luis Chamberlain
X-Mailing-List: linux-kernel@vger.kernel.org

With this
change, when the knob is set to 0, it allows unprivileged users to call
userfaultfd, like when it is set to 1, but with the restriction that only
page faults from user mode can be handled. In this mode, an unprivileged
user (without CAP_SYS_PTRACE capability) must pass UFFD_USER_MODE_ONLY to
userfaultfd or the API will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will cause postcopy live
migration to fail, although the failure will be unnoticeable to the VM
guests. To avoid this, set 'vm.unprivileged_userfaultfd = 1' in
/etc/sysctl.conf. For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Signed-off-by: Lokesh Gidra
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        |  6 ++++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index bd229f06d4e9..0f8a975db3be 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include
 #include
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1976,7 +1976,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
 	BUG_ON(!current->mm);
-- 
2.28.0.1011.ga647a8990f-goog
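
To make the resulting policy easier to see at a glance, here is a minimal
userspace sketch of the permission check from the fs/userfaultfd.c hunk.
It is not kernel code: uffd_check() and its has_cap_sys_ptrace parameter
are hypothetical stand-ins for the syscall entry and for
capable(CAP_SYS_PTRACE), and the value 1 for UFFD_USER_MODE_ONLY is an
assumption made here for illustration (the real definition is expected to
come from patch 1/2 of this series).

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed value for illustration only; see patch 1/2 for the real one. */
#define UFFD_USER_MODE_ONLY 1

/*
 * Hypothetical model of the check at the top of the userfaultfd syscall.
 * Returns 0 if the call may proceed, -EPERM if it is refused.
 *
 * knob               - value of the unprivileged_userfaultfd sysctl (0 or 1)
 * flags              - flags passed by the caller
 * has_cap_sys_ptrace - stand-in for capable(CAP_SYS_PTRACE)
 */
int uffd_check(int knob, int flags, bool has_cap_sys_ptrace)
{
	/*
	 * Refuse only when all three hold: the knob is 0, the caller did
	 * not restrict itself to user-mode faults, and it is unprivileged.
	 */
	if (!knob &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;

	return 0;
}
```

Under these assumptions, with the knob at 0 an unprivileged caller gets
-EPERM from uffd_check(0, 0, false) but 0 from
uffd_check(0, UFFD_USER_MODE_ONLY, false). A caller with the capability,
or any caller when the knob is 1, is never refused by this check.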