From: Andy Lutomirski
Date: Tue, 21 Jul 2020 10:23:22 -0700
Subject: Re: strace of io_uring events?
To: Jens Axboe, Andres Freund
Cc: Andy Lutomirski, Stefano Garzarella, Christoph Hellwig, Kees Cook,
    Pavel Begunkov, Miklos Szeredi, Matthew Wilcox, Jann Horn,
    Christian Brauner, strace-devel@lists.strace.io,
    io-uring@vger.kernel.org, Linux API, Linux FS Devel, LKML,
    Michael Kerrisk, Stefan Hajnoczi

On Tue, Jul 21, 2020 at 8:31 AM Jens Axboe wrote:
>
> On 7/21/20 9:27 AM, Andy Lutomirski wrote:
> > On Fri, Jul 17, 2020 at 1:02 AM Stefano Garzarella wrote:
> >>
> >> On Thu, Jul 16, 2020 at 08:12:35AM -0700, Kees Cook wrote:
> >>> On Thu, Jul 16, 2020 at 03:14:04PM +0200, Stefano Garzarella wrote:
> >
> >>> access (IIUC) is possible without actually calling any of the io_uring
> >>> syscalls. Is that correct? A process would receive an fd (via SCM_RIGHTS,
> >>> pidfd_getfd, or soon seccomp addfd), and then call mmap() on it to gain
> >>> access to the SQ and CQ, and off it goes? (The only glitch I see is
> >>> waking up the worker thread?)
> >>
> >> It is true only if the io_uring instance is created with the SQPOLL flag
> >> (not the default behaviour, and it requires CAP_SYS_ADMIN). In this case
> >> the kthread is created and you can also set a higher idle time for it, so
> >> the wake-up syscall can be avoided as well.
> >
> > I stared at the io_uring code for a while, and I'm wondering if we're
> > approaching this the wrong way. It seems to me that most of the
> > complications here come from the fact that io_uring SQEs don't clearly
> > belong to any particular security principal.  (We have struct creds,
> > but we don't really have a task or mm.)  But I'm also not convinced
> > that io_uring actually supports cross-mm submission except by accident
> > -- as it stands, unless a user is very careful to only submit SQEs
> > that don't use user pointers, the results will be unpredictable.
>
> How so?

Unless I've missed something, either current->mm or sqo_mm will be
used depending on which thread ends up doing the IO.  (And there might
be similar issues with threads.)  Having the user memory references
end up somewhere that is an implementation detail seems suboptimal.

>
> > Perhaps we can get away with this:
> >
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 74bc4a04befa..92266f869174 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -7660,6 +7660,20 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
> >  	if (!percpu_ref_tryget(&ctx->refs))
> >  		goto out_fput;
> >
> > +	if (unlikely(current->mm != ctx->sqo_mm)) {
> > +		/*
> > +		 * The mm used to process SQEs will be current->mm or
> > +		 * ctx->sqo_mm depending on which submission path is used.
> > +		 * It's also unclear who is responsible for an SQE submitted
> > +		 * out-of-process from a security and auditing perspective.
> > +		 *
> > +		 * Until a real usecase emerges and there are clear semantics
> > +		 * for out-of-process submission, disallow it.
> > +		 */
> > +		ret = -EACCES;
> > +		goto out;
> > +	}
> > +
> >  	/*
> >  	 * For SQ polling, the thread will do all submissions and completions.
> >  	 * Just return the requested submit count, and wake the thread if
>
> That'll break postgres that already uses this, also see:
>
> commit 73e08e711d9c1d79fae01daed4b0e1fee5f8a275
> Author: Jens Axboe
> Date:   Sun Jan 26 09:53:12 2020 -0700
>
>     Revert "io_uring: only allow submit from owning task"
>
> So no, we can't do that.
>

Yikes, I missed that.

Andres, how final is your Postgres branch?  I'm wondering if we could
get away with requiring a special flag when creating an io_uring to
indicate that you intend to submit IO from outside the creating mm.

Even if we can't make this change, we could plausibly get away with
tying seccomp-style filtering to sqo_mm.  IOW we'd look up a
hypothetical sqo_mm->io_uring_filters to filter SQEs even when
submitted from a different mm.

--Andy
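
For illustration only, here is a rough user-space model of the two ideas
above: an opt-in flag for cross-mm submission, and a filter that is always
looked up through the creating mm. Nothing here is kernel code. The struct
names, the allow_cross_mm field, the allowed_ops bitmask standing in for
the hypothetical sqo_mm->io_uring_filters, and the opcode values are all
assumptions made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Model opcodes for this sketch only (not taken from kernel headers). */
enum { OP_NOP = 0, OP_READV = 1, OP_WRITEV = 2 };

/* Hypothetical stand-in for an mm with an attached SQE filter. */
struct mm_model {
	uint64_t allowed_ops;	/* bit i set => opcode i may be submitted */
};

/* Hypothetical stand-in for the ring context. */
struct ring_model {
	const struct mm_model *sqo_mm;	/* mm that created the ring */
	bool allow_cross_mm;		/* assumed opt-in flag set at creation */
};

struct sqe_model {
	uint8_t opcode;
};

/*
 * Decide whether an SQE may be submitted.
 * Returns 0 on success, -EACCES if the submitter is a foreign mm and the
 * ring did not opt in, -EPERM if the creating mm's filter rejects the opcode.
 */
int check_submit(const struct ring_model *ring,
		 const struct mm_model *submitter,
		 const struct sqe_model *sqe)
{
	assert(ring && ring->sqo_mm && submitter && sqe);

	/* Cross-mm submission is refused unless requested at creation time. */
	if (submitter != ring->sqo_mm && !ring->allow_cross_mm)
		return -EACCES;

	/*
	 * The filter is taken from the creating mm, never from the
	 * submitter, so a foreign mm cannot widen what the ring may do.
	 */
	if (sqe->opcode >= 64 ||
	    !(ring->sqo_mm->allowed_ops & (UINT64_C(1) << sqe->opcode)))
		return -EPERM;

	return 0;
}
```

The point of the model is the lookup path: the submitter's own filter is
never consulted, so the policy is tied to the ring's creator no matter
which mm ends up calling in.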