Date: Tue, 21 Jul 2020 17:58:48 +0200
From: Stefano Garzarella
To: Andy Lutomirski
Cc: Jens Axboe, Christoph Hellwig, Kees Cook, Pavel Begunkov,
 Miklos Szeredi, Matthew Wilcox, Jann Horn, Christian Brauner,
 strace-devel@lists.strace.io, io-uring@vger.kernel.org, Linux API,
 Linux FS Devel, LKML, Michael Kerrisk, Stefan Hajnoczi
Subject: Re: strace of io_uring events?
Message-ID: <20200721155848.32xtze5ntvcmjv63@steredhat>
References: <20200715171130.GG12769@casper.infradead.org>
 <7c09f6af-653f-db3f-2378-02dca2bc07f7@gmail.com>
 <48cc7eea-5b28-a584-a66c-4eed3fac5e76@gmail.com>
 <202007151511.2AA7718@keescook>
 <20200716131404.bnzsaarooumrp3kx@steredhat>
 <202007160751.ED56C55@keescook>
 <20200717080157.ezxapv7pscbqykhl@steredhat.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 21, 2020 at 08:27:34AM -0700, Andy Lutomirski wrote:
> On Fri, Jul 17, 2020 at 1:02 AM Stefano Garzarella wrote:
> >
> > On Thu, Jul 16, 2020 at 08:12:35AM -0700, Kees Cook wrote:
> > > On Thu, Jul 16, 2020 at 03:14:04PM +0200, Stefano Garzarella wrote:
> > > > access (IIUC) is possible without actually calling any of the io_uring
> > > syscalls. Is that correct? A process would receive an fd (via SCM_RIGHTS,
> > > pidfd_getfd, or soon seccomp addfd), and then call mmap() on it to gain
> > > access to the SQ and CQ, and off it goes? (The only glitch I see is
> > > waking up the worker thread?)
> >
> > It is true only if the io_uring instance is created with the SQPOLL flag
> > (not the default behaviour, and it requires CAP_SYS_ADMIN). In this case
> > the kthread is created and you can also set a higher idle time for it, so
> > the wake-up syscall can also be avoided.
>
> I stared at the io_uring code for a while, and I'm wondering if we're
> approaching this the wrong way. It seems to me that most of the
> complications here come from the fact that io_uring SQEs don't clearly
> belong to any particular security principal. (We have struct creds,
> but we don't really have a task or mm.) But I'm also not convinced
> that io_uring actually supports cross-mm submission except by accident
> -- as it stands, unless a user is very careful to only submit SQEs
> that don't use user pointers, the results will be unpredictable.
> Perhaps we can get away with this:
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 74bc4a04befa..92266f869174 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7660,6 +7660,20 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>  	if (!percpu_ref_tryget(&ctx->refs))
>  		goto out_fput;
>  
> +	if (unlikely(current->mm != ctx->sqo_mm)) {
> +		/*
> +		 * The mm used to process SQEs will be current->mm or
> +		 * ctx->sqo_mm depending on which submission path is used.
> +		 * It's also unclear who is responsible for an SQE submitted
> +		 * out-of-process from a security and auditing perspective.
> +		 *
> +		 * Until a real usecase emerges and there are clear semantics
> +		 * for out-of-process submission, disallow it.
> +		 */
> +		ret = -EACCES;
> +		goto out;
> +	}
> +
>  	/*
>  	 * For SQ polling, the thread will do all submissions and completions.
>  	 * Just return the requested submit count, and wake the thread if
>
> If we can do that, then we could bind seccomp-like io_uring filters to
> an mm, and we get obvious semantics that ought to cover most of the
> bases.
>
> Jens, Christoph?
>
> Stefano, what's your intended use case for your restriction patchset?
>

Hi Andy,
my use case concerns virtualization. The idea, which I described in the
io_uring restrictions proposal [1], is to share the io_uring SQ and CQ
rings with a guest VM for block operations.

In my PoC, a block device driver in the guest uses io_uring queues
provided by the host to submit block requests.

Since the guest is not trusted, we need restrictions that allow only a
subset of operations (syscalls) on a subset of file descriptors and
memory.

Cheers,
Stefano

[1] https://lore.kernel.org/io-uring/20200609142406.upuwpfmgqjeji4lc@steredhat/