2020-11-20 03:07:08

by Lokesh Gidra

Subject: [PATCH v6 0/2] Control over userfaultfd kernel-fault handling

This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that their creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] to similar effect.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.
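
As an illustration of the new flag from the userspace side, here is a
minimal sketch of a caller that gives up kernel-mode fault handling.
The helper name open_user_mode_uffd() is hypothetical, and the fallback
definition of UFFD_USER_MODE_ONLY is an assumption that only matters
when building against uapi headers that predate the flag.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallback for older uapi headers; 1 is the value mainline's
 * include/uapi/linux/userfaultfd.h uses for this flag. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical helper: create a userfaultfd that can only handle faults
 * taken in user mode. Returns the new fd, or -1 with errno set.
 */
static int open_user_mode_uffd(void)
{
#ifdef SYS_userfaultfd
	return (int)syscall(SYS_userfaultfd,
			    O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
#else
	errno = ENOSYS;	/* no userfaultfd syscall on this platform */
	return -1;
#endif
}
```

A caller would check the return value and inspect errno on failure; a
kernel that does not know the flag is expected to reject it.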

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
make security vulnerabilities easier to exploit by lengthening the
race window in kernel code.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v5:

- Added printk_once when unprivileged_userfaultfd is set to 0 and
userfaultfd syscall is called without UFFD_USER_MODE_ONLY in the
absence of CAP_SYS_PTRACE capability.

Changes since v4:

- Added warning when bailing out from handling kernel fault.

Changes since v3:

- Modified the meaning of value '0' of unprivileged_userfaultfd
sysctl knob. Setting this knob to '0' now allows unprivileged users
to use userfaultfd, but they can handle page faults in user-mode only.
- The default value of unprivileged_userfaultfd sysctl knob is changed
to '0'.

Changes since v2:

- Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
userfaultfd().

Changes since v1:

- Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
- Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
Add UFFD_USER_MODE_ONLY
Add user-mode only option to unprivileged_userfaultfd sysctl knob

Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
fs/userfaultfd.c | 20 +++++++++++++++++---
include/uapi/linux/userfaultfd.h | 9 +++++++++
3 files changed, 36 insertions(+), 8 deletions(-)

--
2.29.0.rc1.297.gfa9743e501-goog


2020-11-20 03:08:47

by Lokesh Gidra

Subject: [PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

With this change, when the knob is set to 0, it allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that only page faults from user-mode can be handled.
In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the call will fail with
EPERM.
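
The rule described above can be modeled in userspace as a small pure
function. This is only a sketch of the decision, not kernel code: the
name uffd_check_access() and its parameters are hypothetical, and the
capability check is reduced to a boolean.

```c
#include <errno.h>
#include <stdbool.h>

/* Fallback for older uapi headers; 1 is the value mainline uses. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Userspace model of the permission check (not kernel code).
 * knob:    value of the unprivileged_userfaultfd sysctl (0 or 1)
 * flags:   flags passed to userfaultfd(2)
 * has_cap: whether the caller holds CAP_SYS_PTRACE
 * Returns 0 if the call may proceed, -EPERM otherwise.
 */
static int uffd_check_access(int knob, int flags, bool has_cap)
{
	if (!knob && (flags & UFFD_USER_MODE_ONLY) == 0 && !has_cap)
		return -EPERM;
	return 0;
}
```

With the knob at 0, the only combination that is refused is an
unprivileged caller that omits UFFD_USER_MODE_ONLY; every other
combination is allowed through.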

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen
timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will cause postcopy
live migration to fail; the failure will be unnoticeable to the VM
guests. To avoid this, set 'vm.unprivileged_userfaultfd = 1' in
/etc/sysctl.conf.
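
A program that depends on kernel-mode fault handling could check the
knob at startup before trying to create a userfaultfd. The following is
a minimal sketch, assuming the knob is exposed under /proc/sys/vm/ with
the name documented in vm.rst (unprivileged_userfaultfd);
read_uffd_knob() is a hypothetical helper.

```c
#include <stdio.h>

/* Assumed path: vm.* sysctls are normally exposed under /proc/sys/vm/. */
#define UFFD_KNOB_PATH "/proc/sys/vm/unprivileged_userfaultfd"

/*
 * Hypothetical helper: returns the value stored in the given file,
 * or -1 if the file cannot be opened or does not hold an integer.
 */
static int read_uffd_knob(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

A result of 0 would mean unprivileged callers have to pass
UFFD_USER_MODE_ONLY, and -1 would mean the knob is not available on the
running kernel.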

The main reason this change is desirable in the short term is that
the Android userland will behave as with the sysctl set to zero. So
without this commit, any Linux binary using userfaultfd to manage its
memory would behave differently if run within the Android userland.
For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Lokesh Gidra <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
---
Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
fs/userfaultfd.c | 10 ++++++++--
2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c00f..d06a98b2a4e7 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
unprivileged_userfaultfd
========================

-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.

-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.


user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 605599fde015..894cc28142e7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
#include <linux/security.h>
#include <linux/hugetlb.h>

-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;

static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;

@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;

-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE)) {
+		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+			"sysctl knob to 1 if kernel faults must be handled "
+			"without obtaining CAP_SYS_PTRACE capability\n");
 		return -EPERM;
+	}

 	BUG_ON(!current->mm);

--
2.29.0.rc1.297.gfa9743e501-goog

2020-11-20 03:13:27

by Lokesh Gidra

Subject: Re: [PATCH v6 0/2] Control over userfaultfd kernel-fault handling

Adding [email protected] mailing list.

2020-11-20 03:15:13

by Lokesh Gidra

Subject: Re: [PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

Adding [email protected] list