From: Jeff Moyer
To: Matteo Rizzo
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 io-uring@vger.kernel.org, jordyzomer@google.com, evn@google.com,
 poprdi@google.com, corbet@lwn.net, axboe@kernel.dk,
 asml.silence@gmail.com, akpm@linux-foundation.org,
 keescook@chromium.org, rostedt@goodmis.org,
 dave.hansen@linux.intel.com, ribalda@chromium.org,
 chenhuacai@kernel.org, steve@sk2.org, gpiccoli@igalia.com,
 ldufour@linux.ibm.com, bhe@redhat.com, oleksandr@natalenko.name
Subject: Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
Date: Thu, 29 Jun 2023 12:17:20 -0400
In-Reply-To: <20230629132711.1712536-2-matteorizzo@google.com>
References: <20230629132711.1712536-1-matteorizzo@google.com>
 <20230629132711.1712536-2-matteorizzo@google.com>

Matteo Rizzo writes:

> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
> or 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, all calls to
> io_uring_setup fail with -EPERM unless the calling process has
> CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
> regardless of privilege.
>
> Signed-off-by: Matteo Rizzo

This looks good to me. You may also consider updating the
io_uring_setup(2) man page (part of liburing) to reflect this new
meaning for -EPERM.

Reviewed-by: Jeff Moyer

> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 19 +++++++++++++
>  io_uring/io_uring.c                         | 30 +++++++++++++++++++++
>  2 files changed, 49 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 3800fab1619b..ee65f7aeb0cf 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -450,6 +450,25 @@ this allows system administrators to override the
>  ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>
>
> +io_uring_disabled
> +=================
> +
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += ==================================================================
> +0 All processes can create io_uring instances as normal. This is the
> +  default setting.
> +1 io_uring creation is disabled for unprivileged processes.
> +  io_uring_setup fails with -EPERM unless the calling process is
> +  privileged (CAP_SYS_ADMIN). Existing io_uring instances can
> +  still be used.
> +2 io_uring creation is disabled for all processes. io_uring_setup
> +  always fails with -EPERM. Existing io_uring instances can still be
> +  used.
> += ==================================================================
> +
> +
>  kexec_load_disabled
>  ===================
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 1b53a2ab0a27..2343ae518546 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -153,6 +153,22 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx);
>
>  struct kmem_cache *req_cachep;
>
> +static int __read_mostly sysctl_io_uring_disabled;
> +#ifdef CONFIG_SYSCTL
> +static struct ctl_table kernel_io_uring_disabled_table[] = {
> +	{
> +		.procname	= "io_uring_disabled",
> +		.data		= &sysctl_io_uring_disabled,
> +		.maxlen		= sizeof(sysctl_io_uring_disabled),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_TWO,
> +	},
> +	{},
> +};
> +#endif
> +
>  struct sock *io_uring_get_socket(struct file *file)
>  {
>  #if defined(CONFIG_UNIX)
> @@ -4000,9 +4016,18 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
>  	return io_uring_create(entries, &p, params);
>  }
>
> +static inline bool io_uring_allowed(void)
> +{
> +	return sysctl_io_uring_disabled == 0 ||
> +		(sysctl_io_uring_disabled == 1 && capable(CAP_SYS_ADMIN));
> +}
> +
>  SYSCALL_DEFINE2(io_uring_setup, u32, entries,
>  		struct io_uring_params __user *, params)
>  {
> +	if (!io_uring_allowed())
> +		return -EPERM;
> +
>  	return io_uring_setup(entries, params);
>  }
>
> @@ -4577,6 +4602,11 @@ static int __init io_uring_init(void)
>
>  	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
>  				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
> +
> +#ifdef CONFIG_SYSCTL
> +	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
> +#endif
> +
>  	return 0;
>  };
>  __initcall(io_uring_init);
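
As a reference point for the man page wording, the -EPERM rule from the
patch can be restated as a small userspace model. This is only a sketch:
io_uring_allowed_model() and io_uring_setup_gate_model() are hypothetical
names, not kernel or liburing interfaces, and the has_cap_sys_admin flag
stands in for the kernel's capable(CAP_SYS_ADMIN) check.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model of the check the patch adds in front of io_uring_setup.
 * `disabled` stands in for the io_uring_disabled sysctl value (0, 1 or 2)
 * and `has_cap_sys_admin` for the result of capable(CAP_SYS_ADMIN).
 * Hypothetical helper for illustration only.
 */
static bool io_uring_allowed_model(int disabled, bool has_cap_sys_admin)
{
	return disabled == 0 || (disabled == 1 && has_cap_sys_admin);
}

/*
 * Result of the new gate alone: -EPERM when creation is blocked, 0 when
 * the call would continue into the normal setup path (which can still
 * fail for unrelated reasons). Hypothetical helper for illustration only.
 */
static int io_uring_setup_gate_model(int disabled, bool has_cap_sys_admin)
{
	return io_uring_allowed_model(disabled, has_cap_sys_admin) ? 0 : -EPERM;
}
```

With the sysctl at 1, the gate returns 0 for a caller holding CAP_SYS_ADMIN
and -EPERM otherwise; at 2 it returns -EPERM for every caller.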