Date: Wed, 22 Jul 2020 16:14:04 +0200
From: Stefano Garzarella
To: Daurnimator
Cc: Jens Axboe, Alexander Viro, Kernel Hardening, Kees Cook, Aleksa Sarai,
 Stefan Hajnoczi, Christian Brauner, Sargun Dhillon, Jann Horn, io-uring,
 linux-fsdevel@vger.kernel.org, Jeff Moyer, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v2 2/3] io_uring: add IOURING_REGISTER_RESTRICTIONS opcode
Message-ID: <20200722141404.jfzfl3alpyw7o7dw@steredhat>
References: <20200716124833.93667-1-sgarzare@redhat.com>
 <20200716124833.93667-3-sgarzare@redhat.com>
 <0fbb0393-c14f-3576-26b1-8bb22d2e0615@kernel.dk>
 <20200721104009.lg626hmls5y6ihdr@steredhat>
 <15f7fcf5-c5bb-7752-fa9a-376c4c7fc147@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 22, 2020
at 12:35:15PM +1000, Daurnimator wrote:
> On Wed, 22 Jul 2020 at 03:11, Jens Axboe wrote:
> >
> > On 7/21/20 4:40 AM, Stefano Garzarella wrote:
> > > On Thu, Jul 16, 2020 at 03:26:51PM -0600, Jens Axboe wrote:
> > >> On 7/16/20 6:48 AM, Stefano Garzarella wrote:
> > >>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > >>> index efc50bd0af34..0774d5382c65 100644
> > >>> --- a/include/uapi/linux/io_uring.h
> > >>> +++ b/include/uapi/linux/io_uring.h
> > >>> @@ -265,6 +265,7 @@ enum {
> > >>>  	IORING_REGISTER_PROBE,
> > >>>  	IORING_REGISTER_PERSONALITY,
> > >>>  	IORING_UNREGISTER_PERSONALITY,
> > >>> +	IORING_REGISTER_RESTRICTIONS,
> > >>>
> > >>>  	/* this goes last */
> > >>>  	IORING_REGISTER_LAST
> > >>> @@ -293,4 +294,30 @@ struct io_uring_probe {
> > >>>  	struct io_uring_probe_op ops[0];
> > >>>  };
> > >>>
> > >>> +struct io_uring_restriction {
> > >>> +	__u16 opcode;
> > >>> +	union {
> > >>> +		__u8 register_op; /* IORING_RESTRICTION_REGISTER_OP */
> > >>> +		__u8 sqe_op;      /* IORING_RESTRICTION_SQE_OP */
> > >>> +	};
> > >>> +	__u8 resv;
> > >>> +	__u32 resv2[3];
> > >>> +};
> > >>> +
> > >>> +/*
> > >>> + * io_uring_restriction->opcode values
> > >>> + */
> > >>> +enum {
> > >>> +	/* Allow an io_uring_register(2) opcode */
> > >>> +	IORING_RESTRICTION_REGISTER_OP,
> > >>> +
> > >>> +	/* Allow an sqe opcode */
> > >>> +	IORING_RESTRICTION_SQE_OP,
> > >>> +
> > >>> +	/* Only allow fixed files */
> > >>> +	IORING_RESTRICTION_FIXED_FILES_ONLY,
> > >>> +
> > >>> +	IORING_RESTRICTION_LAST
> > >>> +};
> > >>> +
> > >>
> > >> Not sure I totally love this API. Maybe it'd be cleaner to have separate
> > >> ops for this, instead of muxing it like this. One for registering op
> > >> code restrictions, and one for disallowing other parts (like fixed
> > >> files, etc).
> > >>
> > >> I think that would look a lot cleaner than the above.
> > >>
> > >
> > > Talking with Stefan, an alternative, maybe more near to your suggestion,
> > > would be to remove the 'struct io_uring_restriction' and add the
> > > following register ops:
> > >
> > >     /* Allow an sqe opcode */
> > >     IORING_REGISTER_RESTRICTION_SQE_OP
> > >
> > >     /* Allow an io_uring_register(2) opcode */
> > >     IORING_REGISTER_RESTRICTION_REG_OP
> > >
> > >     /* Register IORING_RESTRICTION_* */
> > >     IORING_REGISTER_RESTRICTION_OP
> > >
> > >
> > >     enum {
> > >         /* Only allow fixed files */
> > >         IORING_RESTRICTION_FIXED_FILES_ONLY,
> > >
> > >         IORING_RESTRICTION_LAST
> > >     }
> > >
> > >
> > > We can also enable restriction only when the rings started, to avoid to
> > > register IORING_REGISTER_ENABLE_RINGS opcode. Once rings are started,
> > > the restrictions cannot be changed or disabled.
> >
> > My concerns are largely:
> >
> > 1) An API that's straight forward to use
> > 2) Something that'll work with future changes
> >
> > The "allow these opcodes" is straightforward, and ditto for the register
> > opcodes. The fixed file I guess is the odd one out. So if we need to
> > disallow things in the future, we'll need to add a new restriction
> > sub-op. Should this perhaps be "these flags must be set", and that could
> > easily be augmented with "these flags must not be set"?
> >
> > --
> > Jens Axboe
> >
>
> This is starting to sound a lot like seccomp filtering.
> Perhaps we should go straight to adding a BPF hook that fires when
> reading off the submission queue?
>

You're right. I e-mailed with Kees Cook about that [1], and he agreed that
the restrictions in io_uring should allow us to address some issues that
are a bit difficult to handle with seccomp. For example:
- different restrictions for different io_uring instances in the same
  process
- limit SQEs to use only registered fds and buffers

Maybe seccomp could take advantage of the restrictions to filter SQE
opcodes.

Thanks,
Stefano

[1] https://lore.kernel.org/io-uring/202007160751.ED56C55@keescook/
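
To make the per-instance point a bit more concrete, here is a rough
user-space model of the allow-list idea. It is only a sketch and not the
kernel implementation: the struct layout and the IORING_RESTRICTION_*
values are copied from the RFC v2 patch quoted above, while
struct ring_restrictions, ring_apply_restrictions() and ring_sqe_allowed()
are hypothetical names made up for illustration. The assumption is that
each ring keeps its own allow-list, which is filled once and then frozen.

```c
/* User-space model only; this is not the kernel implementation. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout from the RFC v2 patch, with stdint types instead of __u8/__u16/__u32. */
struct io_uring_restriction {
	uint16_t opcode;
	union {
		uint8_t register_op; /* IORING_RESTRICTION_REGISTER_OP */
		uint8_t sqe_op;      /* IORING_RESTRICTION_SQE_OP */
	};
	uint8_t resv;
	uint32_t resv2[3];
};
static_assert(sizeof(struct io_uring_restriction) == 16,
	      "expected 16-byte entries");

/* io_uring_restriction->opcode values, as in the RFC v2 patch. */
enum {
	IORING_RESTRICTION_REGISTER_OP,
	IORING_RESTRICTION_SQE_OP,
	IORING_RESTRICTION_FIXED_FILES_ONLY,
	IORING_RESTRICTION_LAST
};

/* Hypothetical per-ring state: one allow-list bitmap per opcode space. */
struct ring_restrictions {
	uint8_t sqe_allowed[32];      /* 256 bits, one per sqe opcode */
	uint8_t register_allowed[32]; /* 256 bits, one per register opcode */
	bool fixed_files_only;
	bool enabled;
};

static void bit_set(uint8_t *map, uint8_t nr)
{
	map[nr / 8] |= (uint8_t)(1u << (nr % 8));
}

static bool bit_test(const uint8_t *map, uint8_t nr)
{
	return (map[nr / 8] >> (nr % 8)) & 1u;
}

/*
 * Install restrictions on one ring. Returns 0 on success, or -1 on an
 * unknown entry or if restrictions were already installed (in this model
 * they cannot be changed afterwards).
 */
int ring_apply_restrictions(struct ring_restrictions *ring,
			    const struct io_uring_restriction *res, size_t nr)
{
	struct ring_restrictions tmp = *ring; /* commit only if all entries are valid */

	if (ring->enabled)
		return -1;

	for (size_t i = 0; i < nr; i++) {
		switch (res[i].opcode) {
		case IORING_RESTRICTION_REGISTER_OP:
			bit_set(tmp.register_allowed, res[i].register_op);
			break;
		case IORING_RESTRICTION_SQE_OP:
			bit_set(tmp.sqe_allowed, res[i].sqe_op);
			break;
		case IORING_RESTRICTION_FIXED_FILES_ONLY:
			tmp.fixed_files_only = true;
			break;
		default:
			return -1;
		}
	}

	tmp.enabled = true;
	*ring = tmp;
	return 0;
}

/* Check one SQE against the ring's own restrictions. */
bool ring_sqe_allowed(const struct ring_restrictions *ring, uint8_t sqe_op,
		      bool uses_fixed_file)
{
	if (!ring->enabled)
		return true; /* no restrictions registered on this ring */
	if (!bit_test(ring->sqe_allowed, sqe_op))
		return false;
	if (ring->fixed_files_only && !uses_fixed_file)
		return false;
	return true;
}
```

In this model two rings in the same process can carry different
allow-lists, because the state lives in the ring and not in the task,
which is the case that is awkward to express with a seccomp filter.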