Date: Thu, 10 Jun 2021 09:38:42 -0600
From: Alex Williamson
To: Jason Gunthorpe
Cc: Joerg Roedel, "Tian, Kevin", Jean-Philippe Brucker, David Gibson,
    Jason Wang, parav@mellanox.com, "Enrico Weigelt, metux IT consult",
    Paolo Bonzini, Shenming Lu, Eric Auger, Jonathan Corbet, "Raj, Ashok",
    "Liu, Yi L", "Wu, Hao", "Jiang, Dave", Jacob Pan, Kirti Wankhede,
    Robin Murphy, kvm@vger.kernel.org, iommu@lists.linux-foundation.org,
    David Woodhouse, LKML, Lu Baolu
Subject: Re: Plan for /dev/ioasid RFC v2
Message-ID: <20210610093842.6b9a4e5b.alex.williamson@redhat.com>
In-Reply-To: <20210609184940.GH1002214@nvidia.com>
References: <20210609123919.GA1002214@nvidia.com>
    <20210609150009.GE1002214@nvidia.com>
    <20210609101532.452851eb.alex.williamson@redhat.com>
    <20210609102722.5abf62e1.alex.williamson@redhat.com>
    <20210609184940.GH1002214@nvidia.com>

On Wed, 9 Jun 2021 15:49:40 -0300
Jason Gunthorpe wrote:

> On Wed, Jun 09, 2021 at 10:27:22AM -0600, Alex Williamson wrote:
>
> > > > It is a kernel decision, because a fundamental task of the kernel
> > > > is to ensure isolation between user-space tasks as well as it can.
> > > > And if a device assigned to one task can interfere with a device of
> > > > another task (e.g. by sending P2P messages), then the promise of
> > > > isolation is broken.
> > >
> > > AIUI, the IOASID model will still enforce IOMMU groups, but it's not
> > > an explicit part of the interface like it is for vfio. For example,
> > > the IOASID model allows attaching individual devices such that we
> > > have granularity to create per-device IOASIDs, but all devices within
> > > an IOMMU group are required to be attached to an IOASID before they
> > > can be used.
>
> Yes, thanks Alex
>
> > It's not entirely clear to me yet how that last bit gets implemented
> > though, i.e. what barrier is in place to prevent device usage prior to
> > reaching this viable state.
>
> The major security checkpoint for the group is on the VFIO side. We
> must require the group before userspace can be allowed access to any
> device registers. Obtaining the device_fd from the group_fd does this
> today as the group_fd is the security proof.
>
> Actually, thinking about this some more... If the only way to get a
> working device_fd in the first place is to get it from the group_fd
> and thus pass a group-based security check, why do we need to do
> anything at the ioasid level?
>
> The security concept of isolation was satisfied as soon as userspace
> opened the group_fd. What do more checks in the kernel accomplish?

Opening the group is not the extent of the security check currently
required; the group must be added to a container and an IOMMU model
configured for the container *before* the user can get a device fd.
Each device fd creates a reference to this security context, so access
to a device does not exist without such a context.

This proposal has of course put the device before the group, which
then makes it more difficult for vfio to retroactively enforce
security.

> Yes, we have the issue where some groups require all devices to use
> the same IOASID, but once someone has the group_fd that is no longer a
> security issue. We can fail VFIO_DEVICE_ATTACH_IOASID calls that
> don't make sense.

The group fd only proves that the user has an ownership claim to the
devices; it does not itself prove that the devices are in an isolated
context. Device access is not granted until that isolated context is
configured. vfio owns the device, so it would make sense for vfio to
enforce that device access happens only within a secure context, but
how do we know a device is in a secure context?
Is it sufficient to track the vfio device ioctls for attach/detach to
an IOASID, or will the user be able to manipulate the IOASID
configuration for a device directly via the IOASID fd? What happens on
detach? As we've discussed elsewhere in this thread, revoking access
is more difficult than holding a reference to the secure context, but
I'm under the impression that moving a device between IOASIDs could be
standard practice in this new model. A device that's detached from a
secure context, even temporarily, is a problem. Access to other
devices in the same group as a device detached from a secure context
is a problem.

> > > > > Groups should be primarily about isolation security, not about
> > > > > IOASID matching.
> > > >
> > > > That doesn't make any sense, what do you mean by 'IOASID matching'?
> > >
> > > One of the problems with the vfio interface's use of groups is that
> > > we conflate the IOMMU group for both isolation and granularity. I
> > > think what Jason is referring to here is that we still want groups to
> > > be the basis of isolation, but we don't want a uAPI that presumes all
> > > devices within the group must use the same IOASID.
>
> Yes, thanks again Alex
>
> > > For example, if a user owns an IOMMU group consisting of
> > > non-isolated functions of a multi-function device, they should be
> > > able to create a vIOMMU VM where each of those functions has its
> > > own address space. That can't be done today; the entire group
> > > would need to be attached to the VM under a PCIe-to-PCI bridge to
> > > reflect the address space limitation imposed by the vfio group
> > > uAPI model. Thanks,
> >
> > Hmm, likely discussed previously in these threads, but I can't come up
> > with the argument that prevents us from making the BIND interface
> > work at the group level but the ATTACH interface at the device level.
> > For example:
> >
> >  - VFIO_GROUP_BIND_IOASID_FD
> >  - VFIO_DEVICE_ATTACH_IOASID
> >
> > AFAICT that makes the group ownership more explicit but still allows
> > the device-level IOASID granularity. Logically this is just an
> > internal iommu_group_for_each_dev() in the BIND ioctl. Thanks,
>
> At a high level it sounds OK.
>
> However I think your above question needs to be answered - what do we
> want to enforce on the iommu_fd and why?
>
> Also, this creates a problem with the device label idea, we still
> need to associate each device_fd with a label, so your above sequence
> is probably:
>
>  VFIO_GROUP_BIND_IOASID_FD(group fd)
>  VFIO_BIND_IOASID_FD(device fd 1, device_label)
>  VFIO_BIND_IOASID_FD(device fd 2, device_label)
>  VFIO_DEVICE_ATTACH_IOASID(..)
>
> And then I think we are back to where I had started: we can trigger
> whatever VFIO_GROUP_BIND_IOASID_FD does automatically as soon as all
> of the devices in the group have been bound.

How to label a device seems like a relatively mundane issue compared
to the ownership and isolated contexts of groups and devices. The
label essentially just creates an identifier-to-device mapping, where
the identifier (label) will be used in the IOASID interface, right?

As I note above, that makes it difficult for vfio to maintain that a
user only accesses a device within a secure context. This is exactly
why vfio has the model of handing out a device fd from a group fd only
when that group is in a secure context, and of maintaining a reference
to that secure context for each device fd. Splitting ownership of the
secure context (in the IOASID fd) from device access (in vfio), and
exposing device fds outside the group, is still a big question mark
for me.
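
To be concrete about that existing model, the current type1 flow from
userspace looks roughly like this (a minimal sketch, error handling
omitted; the group number and device address below are only examples):

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  int get_device_fd(void)
  {
      /* the container is the security/IOMMU context */
      int container = open("/dev/vfio/vfio", O_RDWR);
      int group = open("/dev/vfio/26", O_RDWR);
      struct vfio_group_status status = { .argsz = sizeof(status) };

      /* group must be viable: all member devices bound to vfio */
      ioctl(group, VFIO_GROUP_GET_STATUS, &status);

      /* the group joins a container ... */
      ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
      /* ... and an IOMMU model is configured for that container */
      ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

      /* only now will vfio hand out a device fd at all */
      return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
  }

Every device fd returned by that last step holds a reference to the
container context, which is the ordering I'm trying to preserve.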
Thanks,
Alex
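
P.S. Spelling out the two sequences we're comparing, purely as
pseudocode (none of these ioctls exist today; the names are simply the
ones used in this thread):

  /* group-level BIND, device-level ATTACH (my suggestion above) */
  ioctl(group_fd, VFIO_GROUP_BIND_IOASID_FD, &ioasid_fd);
  ioctl(device_fd, VFIO_DEVICE_ATTACH_IOASID, &attach_args);

  /* per-device BIND carrying a label (your sequence) */
  ioctl(device_fd1, VFIO_BIND_IOASID_FD, &bind_args);  /* ioasid_fd + label */
  ioctl(device_fd2, VFIO_BIND_IOASID_FD, &bind_args);
  ioctl(device_fd1, VFIO_DEVICE_ATTACH_IOASID, &attach_args);
  /* with the group-level step happening implicitly once every device
   * in the group has been bound */

Either way, the open question above stands: which of these steps is
the point after which vfio may consider the device to be in a secure
context?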