From: Andy Lutomirski
Date: Tue, 21 Jul 2020 12:44:09 -0700
Subject: Re: strace of io_uring events?
To: Jens Axboe
Cc: Andy Lutomirski, Andres Freund, Stefano Garzarella, Christoph Hellwig, Kees Cook, Pavel Begunkov, Miklos Szeredi, Matthew Wilcox, Jann Horn, Christian Brauner, strace-devel@lists.strace.io, io-uring@vger.kernel.org, Linux API, Linux FS Devel, LKML, Michael Kerrisk, Stefan Hajnoczi
References: <20200715171130.GG12769@casper.infradead.org> <7c09f6af-653f-db3f-2378-02dca2bc07f7@gmail.com> <48cc7eea-5b28-a584-a66c-4eed3fac5e76@gmail.com> <202007151511.2AA7718@keescook> <20200716131404.bnzsaarooumrp3kx@steredhat> <202007160751.ED56C55@keescook> <20200717080157.ezxapv7pscbqykhl@steredhat.lan> <39a3378a-f8f3-6706-98c8-be7017e64ddb@kernel.dk> <65ad6c17-37d0-da30-4121-43554ad8f51f@kernel.dk>
In-Reply-To: <65ad6c17-37d0-da30-4121-43554ad8f51f@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 21, 2020 at 11:39 AM Jens Axboe wrote:
>
> On 7/21/20 11:44 AM, Andy Lutomirski wrote:
> > On Tue, Jul 21, 2020 at 10:30 AM Jens Axboe wrote:
> >>
> >> On 7/21/20 11:23 AM, Andy Lutomirski wrote:
> >>> On Tue, Jul 21, 2020 at 8:31 AM Jens Axboe wrote:
> >>>>
> >>>> On 7/21/20 9:27 AM, Andy Lutomirski wrote:
> >>>>> On Fri, Jul 17, 2020 at 1:02 AM Stefano Garzarella wrote:
> >>>>>>
> >>>>>> On Thu, Jul 16, 2020 at 08:12:35AM -0700, Kees Cook wrote:
> >>>>>>> On Thu, Jul 16, 2020 at 03:14:04PM +0200, Stefano Garzarella wrote:
> >>>>>
> >>>>>>> access (IIUC) is possible without actually calling any of the io_uring
> >>>>>>> syscalls. Is that correct? A process would receive an fd (via SCM_RIGHTS,
> >>>>>>> pidfd_getfd, or soon seccomp addfd), and then call mmap() on it to gain
> >>>>>>> access to the SQ and CQ, and off it goes? (The only glitch I see is
> >>>>>>> waking up the worker thread?)
> >>>>>>
> >>>>>> It is true only if the io_uring instance is created with the SQPOLL flag (not the
> >>>>>> default behaviour and it requires CAP_SYS_ADMIN). In this case the
> >>>>>> kthread is created and you can also set a higher idle time for it, so
> >>>>>> also the waking up syscall can be avoided.
> >>>>>
> >>>>> I stared at the io_uring code for a while, and I'm wondering if we're
> >>>>> approaching this the wrong way. It seems to me that most of the
> >>>>> complications here come from the fact that io_uring SQEs don't clearly
> >>>>> belong to any particular security principal. (We have struct creds,
> >>>>> but we don't really have a task or mm.) But I'm also not convinced
> >>>>> that io_uring actually supports cross-mm submission except by accident
> >>>>> -- as it stands, unless a user is very careful to only submit SQEs
> >>>>> that don't use user pointers, the results will be unpredictable.
> >>>>
> >>>> How so?
> >>>
> >>> Unless I've missed something, either current->mm or sqo_mm will be
> >>> used depending on which thread ends up doing the IO. (And there might
> >>> be similar issues with threads.) Having the user memory references
> >>> end up somewhere that is an implementation detail seems suboptimal.
> >>
> >> current->mm is always used from the entering task - obviously if done
> >> synchronously, but also if it needs to go async. The only exception is a
> >> setup with SQPOLL, in which case ctx->sqo_mm is the task that set up the
> >> ring. SQPOLL requires root privileges to set up, and there's no task
> >> entering the io_uring at all necessarily. It'll just submit sqes with
> >> the credentials that are registered with the ring.
> >
> > Really? I admit I haven't fully followed how the code works, but it
> > looks like anything that goes through the io_queue_async_work() path
> > will use sqo_mm, and can't most requests that end up blocking end up
> > there? It looks like, even if SQPOLL is not set, the mm used will
> > depend on whether the request ends up blocking and thus getting queued
> > for later completion.
> >
> > Or does some magic I missed make this a nonissue?
>
> No, you are wrong. The logic works as I described it.

Can you enlighten me? I don't see any iov_iter_get_pages() calls or
equivalents. If an IO is punted, how does the data end up in the
io_uring_enter() caller's mm?
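
To make the disagreement concrete, here is a toy model of the two readings
discussed in this thread. It is only a sketch and not kernel code: the
Request class, its fields, and both function names are hypothetical and
exist purely for illustration. Reading A follows the description quoted
above (the submitter's mm is always used unless the ring uses SQPOLL).
Reading B is the concern raised above (any request that gets punted to
async work would also use the ring creator's mm). Neither is claimed here
to match the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for one submitted SQE (illustration only)."""
    submitter_mm: str     # mm of the task that called io_uring_enter()
    ring_creator_mm: str  # mm of the task that set up the ring ("sqo_mm")
    sqpoll: bool          # ring was created with the SQPOLL flag
    punted: bool          # request blocked and was queued for async work


def mm_reading_a(req: Request) -> str:
    """Reading A: only SQPOLL rings use the ring creator's mm."""
    return req.ring_creator_mm if req.sqpoll else req.submitter_mm


def mm_reading_b(req: Request) -> str:
    """Reading B: punted requests also use the ring creator's mm."""
    if req.sqpoll or req.punted:
        return req.ring_creator_mm
    return req.submitter_mm


def readings_disagree(req: Request) -> bool:
    """True when the two readings pick different address spaces."""
    return mm_reading_a(req) != mm_reading_b(req)


if __name__ == "__main__":
    # Ring created by process "A", fd passed to process "B", which submits
    # a request that blocks and is punted; SQPOLL is not set.
    cross_mm = Request(submitter_mm="B", ring_creator_mm="A",
                       sqpoll=False, punted=True)
    print(mm_reading_a(cross_mm), mm_reading_b(cross_mm))
```

In this model the two readings differ only for a non-SQPOLL ring whose
submitter has a different mm from the ring creator and whose request is
punted, which is the cross-mm case the question above is about.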