Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
or 2. When 0 (the default), all processes are allowed to create io_uring
instances, which is the current behavior. When 1, all calls to
io_uring_setup fail with -EPERM unless the calling process has
CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
regardless of privilege.
Signed-off-by: Matteo Rizzo <[email protected]>
---
 Documentation/admin-guide/sysctl/kernel.rst | 19 +++++++++++++
 io_uring/io_uring.c                         | 30 +++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 3800fab1619b..ee65f7aeb0cf 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -450,6 +450,25 @@ this allows system administrators to override the
 ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
 
 
+io_uring_disabled
+=================
+
+Prevents all processes from creating new io_uring instances. Enabling this
+shrinks the kernel's attack surface.
+
+= ==================================================================
+0 All processes can create io_uring instances as normal. This is the
+  default setting.
+1 io_uring creation is disabled for unprivileged processes.
+  io_uring_setup fails with -EPERM unless the calling process is
+  privileged (CAP_SYS_ADMIN). Existing io_uring instances can
+  still be used.
+2 io_uring creation is disabled for all processes. io_uring_setup
+  always fails with -EPERM. Existing io_uring instances can still be
+  used.
+= ==================================================================
+
+
 kexec_load_disabled
 ===================
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1b53a2ab0a27..2343ae518546 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -153,6 +153,22 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx);
 
 struct kmem_cache *req_cachep;
 
+static int __read_mostly sysctl_io_uring_disabled;
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kernel_io_uring_disabled_table[] = {
+	{
+		.procname = "io_uring_disabled",
+		.data = &sysctl_io_uring_disabled,
+		.maxlen = sizeof(sysctl_io_uring_disabled),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_TWO,
+	},
+	{},
+};
+#endif
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -4000,9 +4016,18 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 	return io_uring_create(entries, &p, params);
 }
 
+static inline bool io_uring_allowed(void)
+{
+	return sysctl_io_uring_disabled == 0 ||
+		(sysctl_io_uring_disabled == 1 && capable(CAP_SYS_ADMIN));
+}
+
 SYSCALL_DEFINE2(io_uring_setup, u32, entries,
 		struct io_uring_params __user *, params)
 {
+	if (!io_uring_allowed())
+		return -EPERM;
+
 	return io_uring_setup(entries, params);
 }
 
@@ -4577,6 +4602,11 @@ static int __init io_uring_init(void)
 
 	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
 				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
+
+#ifdef CONFIG_SYSCTL
+	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+#endif
+
 	return 0;
 };
 __initcall(io_uring_init);
--
2.41.0.162.gfafddb0af9-goog
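
To see which mode is active from userspace, the entry registered under "kernel" above should show up as /proc/sys/kernel/io_uring_disabled. The following is only a minimal sketch in C, assuming a kernel with this patch applied; the helper names read_io_uring_disabled() and describe_io_uring_disabled() are made up for illustration and are not part of the patch.

```c
#include <stdio.h>

/* Hypothetical helper: map a sysctl value to the documented behavior. */
static const char *describe_io_uring_disabled(int value)
{
	switch (value) {
	case 0:
		return "all processes may create io_uring instances";
	case 1:
		return "io_uring_setup needs CAP_SYS_ADMIN";
	case 2:
		return "io_uring_setup always fails with EPERM";
	default:
		return "unknown value";
	}
}

/*
 * Hypothetical helper: read the current value from procfs.
 * Returns -1 if the file is missing (for example on a kernel
 * without this patch) or if its content cannot be parsed.
 */
static int read_io_uring_disabled(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

A caller would pass "/proc/sys/kernel/io_uring_disabled" to read_io_uring_disabled() and feed the result to describe_io_uring_disabled(); a return value of -1 simply means the sysctl is not available.
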
On 6/29/23 06:27, Matteo Rizzo wrote:
> +static int __read_mostly sysctl_io_uring_disabled;
Shouldn't this be a static key instead of an int in order to minimize the
performance impact on the io_uring_setup() system call? See also
Documentation/staging/static-keys.rst.
Thanks,
Bart.
On Thu, 29 Jun 2023 at 17:16, Bart Van Assche <[email protected]> wrote:
>
> On 6/29/23 06:27, Matteo Rizzo wrote:
> > +static int __read_mostly sysctl_io_uring_disabled;
>
> Shouldn't this be a static key instead of an int in order to minimize the
> performance impact on the io_uring_setup() system call? See also
> Documentation/staging/static-keys.rst.
>
> Thanks,
>
> Bart.
Is io_uring_setup in any hot path? io_uring_create is marked as __cold.
--
Matteo
Matteo Rizzo <[email protected]> writes:
> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
> or 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, all calls to
> io_uring_setup fail with -EPERM unless the calling process has
> CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
> regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <[email protected]>
This looks good to me. You may also consider updating the
io_uring_setup(2) man page (part of liburing) to reflect this new
meaning for -EPERM.
Reviewed-by: Jeff Moyer <[email protected]>
On 6/29/23 08:28, Matteo Rizzo wrote:
> On Thu, 29 Jun 2023 at 17:16, Bart Van Assche <[email protected]> wrote:
>>
>> On 6/29/23 06:27, Matteo Rizzo wrote:
>>> +static int __read_mostly sysctl_io_uring_disabled;
>>
>> Shouldn't this be a static key instead of an int in order to minimize the
>> performance impact on the io_uring_setup() system call? See also
>> Documentation/staging/static-keys.rst.
>>
> Is io_uring_setup in any hot path? io_uring_create is marked as __cold.
I confused io_uring_setup() with io_uring_enter() so please ignore my comment.
Bart.
Matteo Rizzo <[email protected]> writes:
> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
> or 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, all calls to
> io_uring_setup fail with -EPERM unless the calling process has
> CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
> regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <[email protected]>
> ---
Thanks for adding the extra level for root-only rings.
The patch looks good to me.
Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
--
Gabriel Krisman Bertazi
On Thu, 29 Jun 2023 at 20:36, Gabriel Krisman Bertazi <[email protected]> wrote:
>
> Thanks for adding the extra level for root-only rings.
>
> The patch looks good to me.
>
> Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
Thanks everyone for the reviews! Unfortunately I forgot the subsystem name
in the commit message. Jann also pointed out to me internally that the
check in io_uring_allowed could race with another process that is trying to
change the sysctl. I will send a v3 that fixes both issues.
Thanks,
--
Matteo
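
For context on the race mentioned above: io_uring_allowed() as posted loads sysctl_io_uring_disabled twice. If the value changes from 1 to 0 between the two comparisons, the first comparison sees 1 and the second sees 0, so a privileged caller is refused even though both the old and the new setting would have allowed it. One way to avoid this is to load the value once and decide on that snapshot. The sketch below is a userspace model of that idea and not the actual v3: disabled_setting, has_cap_sys_admin and the C11 atomic load are stand-ins (in the kernel, a single READ_ONCE() would presumably play the same role, with capable(CAP_SYS_ADMIN) as the privilege check).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for sysctl_io_uring_disabled; may be changed concurrently. */
static atomic_int disabled_setting;

/*
 * Model of the permission check with a single load of the setting.
 * has_cap_sys_admin stands in for capable(CAP_SYS_ADMIN).
 */
static bool io_uring_allowed_model(bool has_cap_sys_admin)
{
	/* Read once so both comparisons below see the same value. */
	int disabled = atomic_load_explicit(&disabled_setting,
					    memory_order_relaxed);

	return disabled == 0 || (disabled == 1 && has_cap_sys_admin);
}
```

Because the decision is made on a local copy, a concurrent write can only make the check observe the old or the new setting, never a mix of both.
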