2023-06-27 12:14:53

by Matteo Rizzo

Subject: [PATCH 0/1] Add a sysctl to disable io_uring system-wide

Over the last few years we've seen many critical vulnerabilities in
io_uring (https://goo.gle/limit-iouring) which could be exploited by
an unprivileged process. There is currently no way to disable io_uring
system-wide except by compiling it out of the kernel entirely. The only
way to prevent a process from accessing io_uring is to use a seccomp
filter, but seccomp cannot be applied system-wide. This patch introduces a
new sysctl which disables the creation of new io_uring instances
system-wide. This gives system admins a way to reduce the kernel's attack
surface on systems where io_uring is not used.


Matteo Rizzo (1):
Add a new sysctl to disable io_uring system-wide

Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++++++
io_uring/io_uring.c | 24 +++++++++++++++++++++
2 files changed, 38 insertions(+)

--
2.41.0.162.gfafddb0af9-goog
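
For reference, here is a minimal sketch of how a userspace tool might read the proposed setting. It assumes the knob shows up as /proc/sys/kernel/io_uring_disabled, which follows from the "kernel" table and the io_uring_disabled procname used in the patch. read_sysctl_int is a hypothetical helper name, not an existing API.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a sysctl file.
 * Returns 0 and stores the value in *value on success, or a negative
 * errno (for example -ENOENT when the kernel does not have the sysctl).
 */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ret = 0;

	if (!f)
		return -errno;
	/* The file is expected to contain one decimal integer. */
	if (fscanf(f, "%d", value) != 1)
		ret = -EINVAL;
	fclose(f);
	return ret;
}
```

A caller would pass "/proc/sys/kernel/io_uring_disabled" as the path and treat -ENOENT as "sysctl not present on this kernel".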



2023-06-27 12:52:24

by Matteo Rizzo

Subject: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

Introduce a new sysctl (io_uring_disabled) which can be either 0 or 1.
When 0 (the default), all processes are allowed to create io_uring
instances, which is the current behavior. When 1, all calls to
io_uring_setup fail with -EPERM.

Signed-off-by: Matteo Rizzo <[email protected]>
---
Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++++++
io_uring/io_uring.c | 24 +++++++++++++++++++++
2 files changed, 38 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d85d90f5d000..3c53a238332a 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -450,6 +450,20 @@ this allows system administrators to override the
``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.


+io_uring_disabled
+=========================
+
+Prevents all processes from creating new io_uring instances. Enabling this
+shrinks the kernel's attack surface.
+
+= =============================================================
+0 All processes can create io_uring instances as normal. This is the default
+ setting.
+1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
+ io_uring instances can still be used.
+= =============================================================
+
+
kexec_load_disabled
===================

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1b53a2ab0a27..0496ae7017f7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -153,6 +153,22 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx);

struct kmem_cache *req_cachep;

+static int __read_mostly sysctl_io_uring_disabled;
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kernel_io_uring_disabled_table[] = {
+	{
+		.procname	= "io_uring_disabled",
+		.data		= &sysctl_io_uring_disabled,
+		.maxlen		= sizeof(sysctl_io_uring_disabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{},
+};
+#endif
+
struct sock *io_uring_get_socket(struct file *file)
{
#if defined(CONFIG_UNIX)
@@ -4003,6 +4019,9 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
SYSCALL_DEFINE2(io_uring_setup, u32, entries,
		struct io_uring_params __user *, params)
{
+	if (sysctl_io_uring_disabled)
+		return -EPERM;
+
	return io_uring_setup(entries, params);
}

@@ -4577,6 +4596,11 @@ static int __init io_uring_init(void)

	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
+
+#ifdef CONFIG_SYSCTL
+	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+#endif
+
	return 0;
};
__initcall(io_uring_init);
--
2.41.0.162.gfafddb0af9-goog
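
As an illustration of what the patch means for applications, the sketch below probes whether a new io_uring instance can be created by calling io_uring_setup directly. With the sysctl set to 1 the call would be expected to fail with EPERM, per the commit message. probe_io_uring_setup is a hypothetical name, and the sketch assumes the kernel UAPI headers (linux/io_uring.h) are installed. Note that EPERM can also come from other sources, such as a seccomp filter, so the result alone does not identify the sysctl as the cause.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/*
 * Hypothetical probe: try to create a small io_uring instance.
 * Returns 0 if creation succeeded (the ring is closed again right away),
 * or a negative errno such as -EPERM or -ENOSYS if it failed.
 */
static int probe_io_uring_setup(void)
{
	struct io_uring_params params;
	int fd;

	memset(&params, 0, sizeof(params));
	/* 4 entries is an arbitrary small ring size for the probe. */
	fd = (int)syscall(__NR_io_uring_setup, 4, &params);
	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}
```

An application could run this once at startup and fall back to another I/O path when the result is negative.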


2023-06-27 16:42:23

by Randy Dunlap

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

Hi--

On 6/27/23 05:00, Matteo Rizzo wrote:
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index d85d90f5d000..3c53a238332a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -450,6 +450,20 @@ this allows system administrators to override the
> ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>
>
> +io_uring_disabled
> +=========================
> +
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += =============================================================
> +0 All processes can create io_uring instances as normal. This is the default
> + setting.
> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
> + io_uring instances can still be used.
> += =============================================================

These table lines should be extended at least as far as the text that they
enclose. I.e., the top and bottom lines should be like:

> += ==========================================================================

thanks.
--
~Randy

2023-06-27 17:41:09

by Bart Van Assche

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

On 6/27/23 05:00, Matteo Rizzo wrote:
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += =============================================================
> +0 All processes can create io_uring instances as normal. This is the default
> + setting.
> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
> + io_uring instances can still be used.
> += =============================================================

I'm using fio + io_uring all the time on Android devices. I think we need a
better solution than disabling io_uring system-wide, e.g. a mechanism based
on SELinux that disables io_uring for apps and that keeps io_uring enabled
for processes started via 'adb root && adb shell ...'

Bart.


2023-06-27 18:31:34

by Matteo Rizzo

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

On Tue, 27 Jun 2023 at 19:10, Bart Van Assche <[email protected]> wrote:
> I'm using fio + io_uring all the time on Android devices. I think we need a
> better solution than disabling io_uring system-wide, e.g. a mechanism based
> on SELinux that disables io_uring for apps and that keeps io_uring enabled
> for processes started via 'adb root && adb shell ...'

Android already uses seccomp to prevent untrusted applications from using
io_uring. This patch is aimed at server/desktop environments where there is
no easy way to set a system-wide seccomp policy and right now the only way
to disable io_uring system-wide is to compile it out of the kernel entirely
(not really feasible for e.g. a general-purpose distro).

I thought about adding a capability check that lets privileged processes
bypass this sysctl, but it wasn't clear to me which capability I should use.
For userfaultfd the kernel uses CAP_SYS_PTRACE, but I wasn't sure that's
the best choice here since io_uring has nothing to do with ptrace.
If anyone has any suggestions please let me know. An LSM hook also sounds
like an option but it would be more complicated to implement and use.

2023-06-28 12:11:53

by Ricardo Ribalda

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

Hi Matteo

On Tue, 27 Jun 2023 at 20:15, Matteo Rizzo <[email protected]> wrote:
>
> On Tue, 27 Jun 2023 at 19:10, Bart Van Assche <[email protected]> wrote:
> > I'm using fio + io_uring all the time on Android devices. I think we need a
> > better solution than disabling io_uring system-wide, e.g. a mechanism based
> > on SELinux that disables io_uring for apps and that keeps io_uring enabled
> > for processes started via 'adb root && adb shell ...'
>
> Android already uses seccomp to prevent untrusted applications from using
> io_uring. This patch is aimed at server/desktop environments where there is
> no easy way to set a system-wide seccomp policy and right now the only way
> to disable io_uring system-wide is to compile it out of the kernel entirely
> (not really feasible for e.g. a general-purpose distro).
>
> I thought about adding a capability check that lets privileged processes
> bypass this sysctl, but it wasn't clear to me which capability I should use.
> For userfaultfd the kernel uses CAP_SYS_PTRACE, but I wasn't sure that's
> the best choice here since io_uring has nothing to do with ptrace.
> If anyone has any suggestions please let me know. An LSM hook also sounds
> like an option but it would be more complicated to implement and use.

Have you considered making the new sysctl "sticky", like kexec_load_disabled?
Once the user disables io_uring, there would be no way to turn it back on
until the system is rebooted.

Best regards!

--
Ricardo Ribalda

2023-06-28 14:23:17

by Gabriel Krisman Bertazi

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

Matteo Rizzo <[email protected]> writes:

> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index d85d90f5d000..3c53a238332a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -450,6 +450,20 @@ this allows system administrators to override the
> ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>
>
> +io_uring_disabled
> +=========================
> +
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += =============================================================
> +0 All processes can create io_uring instances as normal. This is the default
> + setting.
> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
> + io_uring instances can still be used.
> += =============================================================

I had an internal request for something like this recently. If we go
this route, we could use an intermediate option that limits io_uring
to root processes only.

--
Gabriel Krisman Bertazi

2023-06-28 15:53:06

by Matteo Rizzo

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

On Wed, 28 Jun 2023 at 13:44, Ricardo Ribalda <[email protected]> wrote:
>
> Have you considered making the new sysctl "sticky", like kexec_load_disabled?
> Once the user disables io_uring, there would be no way to turn it back on
> until the system is rebooted.

Are you suggesting making this sysctl sticky? Are there any examples of how to
implement a sticky sysctl that can take more than 2 values in case we want to
add an intermediate level that still allows privileged processes to use
io_uring? Also, what would be the use case? Preventing privileged processes
from re-enabling io_uring?

Thanks!
--
Matteo

2023-06-28 15:58:08

by Jeff Moyer

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

Matteo Rizzo <[email protected]> writes:

> On Wed, 28 Jun 2023 at 13:44, Ricardo Ribalda <[email protected]> wrote:
>>
>> Have you considered making the new sysctl "sticky", like kexec_load_disabled?
>> Once the user disables io_uring, there would be no way to turn it back on
>> until the system is rebooted.
>
> Are you suggesting making this sysctl sticky? Are there any examples of how to
> implement a sticky sysctl that can take more than 2 values in case we want to
> add an intermediate level that still allows privileged processes to use
> io_uring? Also, what would be the use case? Preventing privileged processes
> from re-enabling io_uring?

See unprivileged_bpf_disabled for an example. I can't speak to the use
case for a sticky value.

-Jeff
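
To make the "sticky" idea concrete, here is a userspace model of the write rule, loosely patterned on the documented behavior of unprivileged_bpf_disabled, where a value of 1 cannot be cleared on a running kernel while other values can still be changed. This is only a sketch: sticky_sysctl_write, STICKY_VALUE and LEVEL_MAX are hypothetical names, and the assumption that 1 is the locked value for io_uring_disabled is mine, not something the patch defines. A real implementation would live in a custom proc_handler.

```c
#include <errno.h>

#define STICKY_VALUE	1	/* assumed: once set, this value cannot be changed */
#define LEVEL_MAX	2	/* assumed: valid values are 0..2 */

/*
 * Model of a sticky sysctl write. Returns 0 and updates *cur on success,
 * -EINVAL if the requested value is out of range, or -EPERM if the
 * sysctl is already locked at STICKY_VALUE and the write would change it.
 */
static int sticky_sysctl_write(int *cur, int requested)
{
	if (requested < 0 || requested > LEVEL_MAX)
		return -EINVAL;
	if (*cur == STICKY_VALUE && requested != STICKY_VALUE)
		return -EPERM;
	*cur = requested;
	return 0;
}
```

With this rule, moving between 0 and 2 stays possible, but once the value is 1 only a reboot resets it.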


2023-06-28 16:14:52

by Jeff Moyer

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

Gabriel Krisman Bertazi <[email protected]> writes:

> Matteo Rizzo <[email protected]> writes:
>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index d85d90f5d000..3c53a238332a 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -450,6 +450,20 @@ this allows system administrators to override the
>> ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>>
>>
>> +io_uring_disabled
>> +=========================
>> +
>> +Prevents all processes from creating new io_uring instances. Enabling this
>> +shrinks the kernel's attack surface.
>> +
>> += =============================================================
>> +0 All processes can create io_uring instances as normal. This is the default
>> + setting.
>> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
>> + io_uring instances can still be used.
>> += =============================================================
>
> I had an internal request for something like this recently. If we go
> this route, we could use an intermediate option that limits io_uring
> to root processes only.

This is all regrettable, but this option makes the most sense to me.
Testing for CAP_SYS_ADMIN or CAP_SYS_RAWIO would work for that third
option, I think.

-Jeff
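
A rough model of the three-level policy discussed here, written as plain userspace C so the decision logic can be checked in isolation. The level numbering and all names (IO_URING_ALLOW_ALL, IO_URING_PRIVILEGED_ONLY, IO_URING_DISABLED, io_uring_setup_permitted) are hypothetical and not taken from the patch. The privileged flag stands in for an in-kernel capability test such as capable(CAP_SYS_ADMIN).

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed level numbering; the thread does not settle on one. */
enum {
	IO_URING_ALLOW_ALL	 = 0,	/* every process may create rings */
	IO_URING_PRIVILEGED_ONLY = 1,	/* only privileged processes may */
	IO_URING_DISABLED	 = 2,	/* nobody may create new rings */
};

/*
 * Model of the check at the top of io_uring_setup.
 * Returns 0 if ring creation would be allowed, -EPERM otherwise.
 */
static int io_uring_setup_permitted(int level, bool privileged)
{
	switch (level) {
	case IO_URING_ALLOW_ALL:
		return 0;
	case IO_URING_PRIVILEGED_ONLY:
		return privileged ? 0 : -EPERM;
	default:
		/* IO_URING_DISABLED, and any unknown level, fails closed. */
		return -EPERM;
	}
}
```

Treating unknown levels as disabled is a design choice in this sketch, so that a value added later never silently widens access.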


2023-06-28 16:30:22

by Ricardo Ribalda

Subject: Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide

Hi Matteo

On Wed, 28 Jun 2023 at 17:12, Matteo Rizzo <[email protected]> wrote:
>
> On Wed, 28 Jun 2023 at 13:44, Ricardo Ribalda <[email protected]> wrote:
> >
> > Have you considered making the new sysctl "sticky", like kexec_load_disabled?
> > Once the user disables io_uring, there would be no way to turn it back on
> > until the system is rebooted.
>
> Are you suggesting making this sysctl sticky? Are there any examples of how to
> implement a sticky sysctl that can take more than 2 values in case we want to
> add an intermediate level that still allows privileged processes to use
> io_uring? Also, what would be the use case? Preventing privileged processes
> from re-enabling io_uring?

Yes, if this sysctl is accepted, I think it would make sense to make it sticky.

For more than one value, take a look at kexec_load_limit_reboot and
kexec_load_limit_panic.

Thanks!

>
> Thanks!
> --
> Matteo



--
Ricardo Ribalda