Date: Wed, 18 May 2022 15:09:46 +0800
From: Ming Lei
To: Stefan Hajnoczi
Cc: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    Harris James R, io-uring@vger.kernel.org, Gabriel Krisman Bertazi,
    ZiyangZhang, Xiaoguang Wang, ming.lei@redhat.com
Subject: Re: [PATCH V2 0/1] ubd: add io_uring based userspace block driver
References: <20220517055358.3164431-1-ming.lei@redhat.com>

On Tue, May 17, 2022 at 03:06:34PM +0100, Stefan Hajnoczi wrote:
> Here are some more thoughts on the ubd-control device:
>
> The current patch provides a ubd-control device for processes with
> suitable permissions (i.e.
> root) to create, start, stop, and fetch
> information about devices.
>
> There is no isolation between devices created by one process and those

I understand Linux doesn't have a device namespace yet, so can you share the
rationale behind the idea of device isolation? Is it because a ubd device is
served by a ubd daemon which belongs to one pid NS? Or because the user
creating /dev/ubdbN belongs to one user NS?

IMO, a ubd device is one file in the VFS, so FS permissions should apply;
the closest model here should then be user NS, plus process privilege &
file ownership.

> created by another. Therefore two processes that do not trust each other
> cannot both use UBD without potential interference. There is also no

Can you share what the expectation is for this situation? Is it that the
created UBD can only be used in this NS, or that it is only visible inside
this NS? I guess the latter isn't possible since we don't have this kind of
isolation framework yet.

> isolation for containers.
>
> I think it would be a mistake to keep the ubd-control interface in its
> current form since the current global/root model is limited. Instead I
> suggest:
> - Creating a device returns a new file descriptor instead of a global
>   dev_id. The device can be started/stopped/configured through this (and
>   only through this) per-device file descriptor. The device is not
>   visible to other processes through ubd-control so interference is not
>   possible. In order to give another process control over the device the
>   fd can be passed (e.g. SCM_RIGHTS).
>

/dev/ubdcN can only be opened by a process that is a descendant of the
process which created the device by sending ADD_DEV. The device can still
be deleted/queried by other processes; however, I think that is reasonable
if all these processes have permission to do so, for example when they all
own the device with the same uid.

So can we apply process privilege & file ownership for isolating ubd
devices?
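
For readers unfamiliar with the SCM_RIGHTS mechanism mentioned in the quoted proposal, here is a minimal user-space sketch of handing a file descriptor from one endpoint to another over a Unix socket. It is not taken from the ubd patch: the pipe merely stands in for a hypothetical per-device control fd, the helper names (pass_fd, receive_fd, demo) are made up for illustration, and both endpoints live in one process only to keep the example self-contained. It assumes Linux and Python 3.9 or later.

```python
import os
import socket


def pass_fd(sender: socket.socket, fd: int) -> None:
    """Send one file descriptor over a Unix socket as SCM_RIGHTS data."""
    # One byte of ordinary payload accompanies the ancillary message;
    # Linux stream sockets need real data alongside ancillary data.
    socket.send_fds(sender, [b"F"], [fd])


def receive_fd(receiver: socket.socket) -> int:
    """Receive one file descriptor sent with pass_fd()."""
    _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
    return fds[0]


def demo() -> bytes:
    # Assumption: a pipe stands in for the per-device control fd.
    read_end, write_end = os.pipe()
    owner, delegate = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        pass_fd(owner, write_end)
        received = receive_fd(delegate)
        # The receiver gets its own descriptor for the same open file.
        os.write(received, b"start")
        os.close(received)
        os.close(write_end)
        return os.read(read_end, 16)
    finally:
        os.close(read_end)
        owner.close()
        delegate.close()
```

In a real deployment the two socket ends would belong to different processes, and only a process that has been handed the descriptor could act on the device through it.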
If a per-process FD is used, it may confuse people, because a process cannot
delete/query a ubd device even though its uid shows it has the privilege.

> Now multiple applications/containers/etc can use ubd-control without
> interfering with each other. The security model still requires root
> though since devices can be malicious.
>
> FUSE allows unprivileged mounts (see fuse_allow_current_process()). Only
> processes with the same uid as the FUSE daemon can access such mounts
> (in the default configuration). This prevents security issues while
> still allowing unprivileged use cases.

OK, it looks like FUSE applies process privilege & file ownership for
dealing with unprivileged mounts.

>
> I suggest adapting the FUSE security model to block devices:
> - Devices can be created without CAP_SYS_ADMIN but they have an
>   'unprivileged' flag set to true.
> - Unprivileged devices are not probed for partitions and LVM doesn't
>   touch them. This means the kernel doesn't access these devices via
>   code paths that might be exploitable.

The above two make sense.

> - When another process with a different uid from ubdsrv opens an
>   unprivileged device, -EACCES is returned. This protects other
>   uids from the unprivileged device.

OK, only the user who owns the device can access an unprivileged device.

> - When another process with a different uid from ubdsrv opens a
>   _privileged_ device there is no special access check because ubdsrv is
>   privileged.

IMO, it depends on whether the uid of this process has permission to access
the ubd device, and we can set the ubd device's ownership from the process
credentials.

>
> With these changes UBD can be used by unprivileged processes and
> containers. I think it's worth discussing the details and having this
> model from the start so UBD can be used in a wide range of use cases.
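
The open-time rules proposed in the quoted text can be modelled in a few lines. This is only a sketch of the policy as described in this thread, not kernel code and not part of the patch: the UbdDevice type, its field names, and both functions are hypothetical, and the ordinary permission checks on the device node are deliberately left out.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class UbdDevice:
    """Hypothetical model of a ubd device for this sketch."""
    owner_uid: int      # uid of the ubdsrv daemon that created the device
    unprivileged: bool  # True if created without CAP_SYS_ADMIN


def check_open(dev: UbdDevice, opener_uid: int) -> int:
    """Return 0 if the open may proceed, or a negative errno value."""
    if dev.unprivileged and opener_uid != dev.owner_uid:
        # Protect other uids from a device served by an untrusted daemon.
        return -errno.EACCES
    # No extra check is modelled for privileged devices or for the owner;
    # normal file ownership and mode checks are outside this sketch.
    return 0


def should_probe_partitions(dev: UbdDevice) -> bool:
    """Unprivileged devices are not scanned for partitions."""
    return not dev.unprivileged
```

For example, check_open(UbdDevice(owner_uid=1000, unprivileged=True), 1001) yields -errno.EACCES, while the same call with opener uid 1000 yields 0.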

I am pretty happy to discuss & figure out the details, but I am not sure
this is a blocker for ubd:

1) the kernel drivers for loop/nbd and others don't support this isolation
   either

2) we still don't know the exact ubd use case for containers

Thanks,
Ming