Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1672618ybt; Mon, 15 Jun 2020 06:37:53 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz1kpGVwP5Ghabp52iqdSJPOswrIAJqrjY2+RN1QGEsvv/waS+5WgU4MBpfLq4txS2NI7VQ X-Received: by 2002:a05:6402:7cb:: with SMTP id u11mr23043788edy.316.1592228273355; Mon, 15 Jun 2020 06:37:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592228273; cv=none; d=google.com; s=arc-20160816; b=SwDnH0jRUN9yffHtmDWKX22KcU47zogceBXfOhKaqfKrwCYHp96fKb3oMUNHBT7GGC MowusjmgXruBoprfQ332b+yDKzqEgM5nat3je+JzSZBAcV+kCiPBrPHtJf1w5+wPIYoD KGI4A4SpnNXN/qqlG2JJSsHBsLtTUlbl6ixWDYjUCTCmJUqwDiWtwJgzmIlhv8w/G9uq pP+1A/385E4andU15ZnOn+3H0fa/5SpyfkZVteEJMBPTLxnbVhAoisqwN2Web8auwFq5 eHZcmZiQVoXwTe9ipiEUWbX/yge2ijCHWZnqQyVJJX2TSectbUzU65MF1CXhsQRicO86 WhVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=rW5iv07E6/hrUraCliFumYl+bzeZy7nOt0DSLH2NiHw=; b=DLXIvH+Elms+2vDfQ1ITbIw1kBXPWIafkZaKFQucCC85ouK/sQ8owJ1AaO27c1uga7 3XZuoJWEsgMkN7B4DYXMVbMKAYl0mNGIN+MLtG24NW/mYNPUdw9DVclPP5f/3S2sj9Oc OmkaWnlpMopU+brXsn9uUrgnSynoe9DMTQdLBy9sG6g//GQ2sNOLDq0IfthIl1zP+8qd ktf6KTKv9Fvoc9pu6qbsUrZDfdFH2Z14U6kyzw9yYigYhuIEzFTSbThym8gbaOoJhetb YJbN3Z9nn5c4t7NLGXNaa1Yq9XUtwUuSWyUfhb9Rh0DVnVwcvdnGly+YRtTv+8xSJoQb 0ABA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WYviMdqQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id k3si9038041ejx.122.2020.06.15.06.37.29; Mon, 15 Jun 2020 06:37:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WYviMdqQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730556AbgFONd0 (ORCPT + 99 others); Mon, 15 Jun 2020 09:33:26 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:60303 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730065AbgFONdW (ORCPT ); Mon, 15 Jun 2020 09:33:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1592227999; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=rW5iv07E6/hrUraCliFumYl+bzeZy7nOt0DSLH2NiHw=; b=WYviMdqQ5pSKcUymcCZwcFnPBBDDxM8N7S2qDqxKWsLmg8BZrcjA2zm7V0UJ7IWQlAbMw1 IgLRhXJZywWBIG1Rchi272wNlbFeeAF3Cfn2/QPNRbpFyak1QTUna84NvYLWLygb6wsdkI UuPGliWCafBuCEk0ZtJmGAFKeqW4P98= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-108-sS7K9garOXu5AYIN-tR-MQ-1; Mon, 15 Jun 2020 09:33:17 -0400 X-MC-Unique: sS7K9garOXu5AYIN-tR-MQ-1 Received: by mail-wr1-f72.google.com with SMTP id e7so7097830wrp.14 for ; Mon, 15 Jun 2020 06:33:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=rW5iv07E6/hrUraCliFumYl+bzeZy7nOt0DSLH2NiHw=; b=eCKfQ+QYAeKjNTPBQA9F58m/6Hno9IbHOwDXXvtnzjlQLkur99HYbWP4fsB3kgnLwR WYBoK9ha/ms4SIjGaQSLvfgVOnJGO0LHNVXMZDNqPllwqN02PKtp2EJQY7ZgF0T5aU34 96gYtzFUDn+csFBdnxt4vWkTZsPiLwOUHTK7/62wMIKNX7EYIJANibw9e+PNthYgTSvo TpUIUEOYJ3IU0+vvH6S/8pv6c/dCgViAcowweowm7Rqquu+X9iYc15n0tsCXLqTKR/1x NZ6ysF0lveANIgBAQfdJahIw0WGlu/usEGceKinGmWgxnmEJccodd2b2kq9x+4LXHv1e 7PUA== X-Gm-Message-State: AOAM533fC56xxQEyevxDt9jz4FhrK6RETyMPvKEOLP5zTknedhcrXfj/ 6neawttGuxWshbEWWvC8G4I8F4kHFvdHTDk2rqL3j15dZcgURWoUyMTtkYqAwzoDkt1XEKq+dtO l/CiIrSLbZ5p45sQcbl4RTJcw X-Received: by 2002:a1c:dd44:: with SMTP id u65mr14130849wmg.180.1592227995767; Mon, 15 Jun 2020 06:33:15 -0700 (PDT) X-Received: by 2002:a1c:dd44:: with SMTP id u65mr14130816wmg.180.1592227995477; Mon, 15 Jun 2020 06:33:15 -0700 (PDT) Received: from steredhat ([5.180.207.22]) by smtp.gmail.com with ESMTPSA id b19sm53221wmj.0.2020.06.15.06.33.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 15 Jun 2020 06:33:14 -0700 (PDT) Date: Mon, 15 Jun 2020 15:33:10 +0200 From: Stefano Garzarella To: Jann Horn Cc: Kees Cook , Christian Brauner , Sargun Dhillon , Aleksa Sarai , Jens Axboe , Stefan Hajnoczi , Jeff Moyer , io-uring , kernel list , Kernel Hardening Subject: Re: [RFC] io_uring: add restrictions to support untrusted applications and guests Message-ID: <20200615133310.qwdmnctrir5zgube@steredhat> References: <20200609142406.upuwpfmgqjeji4lc@steredhat> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 15, 2020 at 11:04:06AM +0200, Jann Horn wrote: > +Kees, Christian, Sargun, Aleksa, kernel-hardening for their opinions > on seccomp-related aspects > > On Tue, Jun 9, 2020 at 4:24 PM Stefano Garzarella wrote: > > Hi Jens, > > Stefan and I have a proposal to share with io_uring community. > > Before implementing it we would like to discuss it to receive feedbacks and > > to see if it could be accepted: > > > > Adding restrictions to io_uring > > ===================================== > > The io_uring API provides submission and completion queues for performing > > asynchronous I/O operations. The queues are located in memory that is > > accessible to both the host userspace application and the kernel, making it > > possible to monitor for activity through polling instead of system calls. This > > design offers good performance and this makes exposing io_uring to guests an > > attractive idea for improving I/O performance in virtualization. > [...] > > Restrictions > > ------------ > > This document proposes io_uring API changes that safely allow untrusted > > applications or guests to use io_uring. io_uring's existing security model is > > that of kernel system call handler code. It is designed to reject invalid > > inputs from host userspace applications. Supporting guests as io_uring API > > clients adds a new trust domain with access to even fewer resources than host > > userspace applications. > > > > Guests do not have direct access to host userspace application file descriptors > > or memory. The host userspace application, a Virtual Machine Monitor (VMM) such > > as QEMU, grants access to a subset of its file descriptors and memory. The > > allowed file descriptors are typically the disk image files belonging to the > > guest. The memory is typically the virtual machine's RAM that the VMM has > > allocated on behalf of the guest. > > > > The following extensions to the io_uring API allow the host application to > > grant access to some of its file descriptors. > > > > These extensions are designed to be applicable to other use cases besides > > untrusted guests and are not virtualization-specific. For example, the > > restrictions can be used to allow only a subset of sqe operations available to > > an application similar to seccomp syscall whitelisting. > > > > An address translation and memory restriction mechanism would also be > > necessary, but we can discuss this later. > > > > The IOURING_REGISTER_RESTRICTIONS opcode > > ---------------------------------------- > > The new io_uring_register(2) IOURING_REGISTER_RESTRICTIONS opcode permanently > > installs a feature whitelist on an io_ring_ctx. The io_ring_ctx can then be > > passed to untrusted code with the knowledge that only operations present in the > > whitelist can be executed. > > This approach of first creating a normal io_uring instance and then > installing restrictions separately in a second syscall means that it > won't be possible to use seccomp to restrict newly created io_uring > instances; code that should be subject to seccomp restrictions and > uring restrictions would only be able to use preexisting io_uring > instances that have already been configured by trusted code. > > So I think that from the seccomp perspective, it might be preferable > to set up these restrictions in the io_uring_setup() syscall. It might > also be a bit nicer from a code cleanliness perspective, since you > won't have to worry about concurrently changing restrictions. > Thank you for these details! It seems feasible to include the restrictions during io_uring_setup(). The only doubt concerns the possibility of allowing the trusted code to do some operations, before passing queues to the untrusted code, for example registering file descriptors, buffers, eventfds, etc. To avoid this, I should include these operations in io_uring_setup(), adding some code that I wanted to avoid by reusing io_uring_register(). If I add restrictions in io_uring_setup() and then add an operation to go into safe mode (e.g. a flag in io_uring_enter()), we would have the same problem, right? Just to be clear, I mean something like this: /* params will include restrictions */ fd = io_uring_setup(entries, params); /* trusted code */ io_uring_register_files(fd, ...); io_uring_register_buffers(fd, ...); io_uring_register_eventfd(fd, ...); /* enable safe mode */ io_uring_enter(fd, ..., IORING_ENTER_ENABLE_RESTRICTIONS); Anyway, including a list of things to register in the 'params', passed to io_uring_setup(), should be feasible, if Jens agree :-) Thanks, Stefano