Date: Thu, 16 Jul 2020 08:12:35 -0700
From: Kees Cook
To: Stefano Garzarella
Cc: Pavel Begunkov, Miklos Szeredi, Matthew Wilcox, Andy Lutomirski,
    Jann Horn, Christian Brauner, strace-devel@lists.strace.io,
    io-uring@vger.kernel.org, Linux API, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Michael Kerrisk, Stefan Hajnoczi
Subject: Re: strace of io_uring events?
Message-ID: <202007160751.ED56C55@keescook>
References: <20200715171130.GG12769@casper.infradead.org>
    <7c09f6af-653f-db3f-2378-02dca2bc07f7@gmail.com>
    <48cc7eea-5b28-a584-a66c-4eed3fac5e76@gmail.com>
    <202007151511.2AA7718@keescook>
    <20200716131404.bnzsaarooumrp3kx@steredhat>
In-Reply-To: <20200716131404.bnzsaarooumrp3kx@steredhat>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 16, 2020 at 03:14:04PM +0200, Stefano Garzarella wrote:
> On Wed, Jul 15, 2020 at 04:07:00PM -0700, Kees Cook wrote:
> [...]
> >
> > Speaking to Stefano's proposal[1]:
> >
> > - There appear to be three classes of desired restrictions:
> >   - opcodes for io_uring_register() (which can be enforced entirely with
> >     seccomp right now).
> >   - opcodes from SQEs (this _could_ be intercepted by seccomp, but is
> >     not currently written)
> >   - opcodes of the types of restrictions to restrict... for making sure
> >     things can't be changed after being set? seccomp already enforces
> >     that kind of "can only be made stricter"
>
> In addition we want to limit the SQEs to use only the registered fd and buffers.

Hmm, good point. Yeah, since it's an "extra" mapping (ioring file number
vs fd number) this doesn't really map well to seccomp. (And frankly,
there's some difficulty here mapping many of the ioring-syscalls to
seccomp because it's happening "deeper" than the syscall layer (i.e. some
of the arguments have already been resolved into kernel object pointers,
etc).)

> Do you think it's better to have everything in seccomp instead of adding
> the restrictions in io_uring (the patch isn't very big)?

I'm still trying to understand how io_uring will be used, and it seems
odd to me that it's effectively a seccomp bypass. (Though from what I can
tell it is not an LSM bypass, which is good -- though I'm worried there
might be some embedded assumptions in LSMs about creds vs current and
LSMs may try to reason (or report) on actions with the kthread in mind,
but afaict everything important is checked against creds.)

> With seccomp, would it be possible to have different restrictions for two
> instances of io_uring in the same process?

For me, this is the most compelling reason to have the restrictions NOT
implemented via seccomp. Trying to make a "which instance" choice in
seccomp would be extremely clumsy.

So at this point, I think it makes sense for the restriction series to
carry on -- it is io_uring-specific and solves some problems that seccomp
is not in a good position to reason about.
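
For the first class of restriction listed above (io_uring_register()
opcodes), the opcode is a plain syscall argument, so a classic-BPF seccomp
filter can match on it. The following is only a minimal sketch under
stated assumptions, not code from this thread: the names register_filter,
install_register_filter and DEMO_ALLOWED_OPCODE are made up for
illustration, the architecture check a real filter needs is omitted, and
the args[1] offset assumes a little-endian machine.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

#ifndef __NR_io_uring_register
#define __NR_io_uring_register 427
#endif

/* Illustrative choice only: 2 is IORING_REGISTER_FILES in <linux/io_uring.h>. */
#define DEMO_ALLOWED_OPCODE 2

/*
 * Allow every syscall except io_uring_register() calls whose opcode
 * (second argument) differs from DEMO_ALLOWED_OPCODE; those get EPERM.
 * NOTE: no architecture check; low 32 bits of args[1] assume little-endian.
 */
struct sock_filter register_filter[] = {
	/* [0] A = syscall number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [1] not io_uring_register: jump to [5] (allow) */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_io_uring_register, 0, 3),
	/* [2] A = opcode argument */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
	/* [3] allowed opcode: jump to [5], otherwise fall through to [4] */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEMO_ALLOWED_OPCODE, 1, 0),
	/* [4] deny with EPERM */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	/* [5] allow */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter on the calling thread. Returns 0 on success, -1 on error. */
int install_register_filter(void)
{
	struct sock_fprog prog = {
		.len = sizeof(register_filter) / sizeof(register_filter[0]),
		.filter = register_filter,
	};

	/* Required for unprivileged callers before attaching a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A filter like this only sees syscall arguments, so it cannot inspect SQEs
in the shared ring, and it can tell two rings apart only by the raw fd
number passed in args[0].
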
All this said, I'd still like a way to apply seccomp to io_uring
because it's a rather giant syscall filter bypass mechanism, and gaining
access (IIUC) is possible without actually calling any of the io_uring
syscalls. Is that correct? A process would receive an fd (via SCM_RIGHTS,
pidfd_getfd, or soon seccomp addfd), and then call mmap() on it to gain
access to the SQ and CQ, and off it goes? (The only glitch I see is
waking up the worker thread?)

What appears to be the worst bit about adding seccomp to io_uring is the
almost complete disassociation of process hierarchy from syscall action.
Only a cred is used for io_uring, and seccomp filters are associated with
task structs. I'm not sure if there is a way to solve this disconnect
without a major internal refactoring of seccomp to attach to creds and
then make every filter attachment create a new cred... *head explody*

--
Kees Cook
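
The access path asked about above (receive an fd, then mmap() it) would
look roughly like the sketch below. It illustrates the question and does
not confirm that the path works: recv_fd and map_sq_ring are hypothetical
helper names, and sq_ring_size is assumed to be communicated out of band,
because the receiver never saw the io_uring_params that io_uring_setup()
filled in for the creator.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* mmap offset of the SQ ring, as defined in <linux/io_uring.h>. */
#ifndef IORING_OFF_SQ_RING
#define IORING_OFF_SQ_RING 0ULL
#endif

/*
 * Receive one file descriptor over a UNIX socket via SCM_RIGHTS.
 * Returns the new fd, or -1 on failure.
 */
int recv_fd(int sock)
{
	char data;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

/*
 * Map the SQ ring of an already-created io_uring instance.
 * ASSUMPTION: sq_ring_size was provided by whoever created the ring.
 * Returns the mapping, or NULL on failure.
 */
void *map_sq_ring(int ring_fd, size_t sq_ring_size)
{
	void *p = mmap(NULL, sq_ring_size, PROT_READ | PROT_WRITE,
		       MAP_SHARED, ring_fd, IORING_OFF_SQ_RING);

	return p == MAP_FAILED ? NULL : p;
}
```

Neither helper issues an io_uring syscall: recvmsg() and mmap() are the
only calls a syscall filter would get to see on this path.
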