Date: Tue, 15 Jun 2021 10:12:15 -0600
From: Alex Williamson
To: "Tian, Kevin"
Cc: Jason Gunthorpe, Joerg Roedel, Jean-Philippe Brucker, David Gibson,
 "Jason Wang", "parav@mellanox.com", "Enrico Weigelt, metux IT consult",
 Paolo Bonzini, Shenming Lu, Eric Auger, Jonathan Corbet, "Raj, Ashok",
 "Liu, Yi L", "Wu, Hao", "Jiang, Dave", Jacob Pan, Kirti Wankhede,
 "Robin Murphy", "kvm@vger.kernel.org",
 "iommu@lists.linux-foundation.org", "David Woodhouse", LKML, "Lu Baolu"
Subject: Re: Plan for /dev/ioasid RFC v2
Message-ID: <20210615101215.4ba67c86.alex.williamson@redhat.com>
Organization: Red Hat
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 15 Jun 2021 02:31:39 +0000
"Tian, Kevin" wrote:

> > From: Alex Williamson
> > Sent: Tuesday, June 15, 2021 12:28 AM
> >
> > > [...]
> > > IOASID. Today the group fd requires an IOASID before it hands out a
> > > device_fd. With iommu_fd the device_fd will not allow IOCTLs until it
> > > has a blocked DMA IOASID and is successfully joined to an iommu_fd.
> >
> > Which is the root of my concern. Who owns ioctls to the device fd?
> > It's my understanding this is a vfio-provided file descriptor and it's
> > therefore vfio's responsibility. A device-level IOASID interface
> > therefore requires that vfio manage the group aspect of device access.
> > AFAICT, that means that device access can therefore only begin when all
> > devices for a given group are attached to the IOASID and must halt for
> > all devices in the group if any device is ever detached from an IOASID,
> > even temporarily. That suggests a lot more oversight of the IOASIDs by
> > vfio than I'd prefer.
> >
>
> This is possibly the point that is worthy of more clarification and
> alignment, as it sounds like the root of controversy here.
>
> I feel the goal of vfio group management is more about ownership, i.e.
> all devices within a group must be assigned to a single user. Following
> the three rules defined by Jason, what we really care about is whether a
> group of devices can be isolated from the rest of the world, i.e. no
> access to memory/device outside of its security context and no access to
> its security context from devices outside of this group. This can be
> achieved as long as every device in the group is either in block-DMA
> state when it's not attached to any security context or attached to an
> IOASID context in IOMMU fd.
>
> As long as group-level isolation is satisfied, how devices within a group
> are further managed is decided by the user (unattached, all attached to
> same IOASID, attached to different IOASIDs) as long as the user
> understands the implication of the lack of isolation within the group.
> This is where a device-centric model comes into play. Misconfiguration
> just hurts the user itself.
>
> If this rationale can be agreed, then I don't see the point of having
> VFIO mandate that all devices in the group must be attached/detached in
> lockstep.

In theory this sounds great, but there are still too many assumptions
and too much hand waving about where isolation occurs for me to feel
like I really have the complete picture. So let's walk through some
examples. Please fill in and correct where I'm wrong.

1) A dual-function PCIe e1000e NIC where the functions are grouped
   together due to ACS isolation issues.

   a) Initial state: functions 0 & 1 are both bound to the e1000e
      driver.

   b) Admin uses driverctl to bind function 1 to vfio-pci, creating a
      vfio device file, which is chmod'd to grant it to a user.

   c) User opens the vfio function 1 device file and an iommu_fd, and
      binds the device_fd to the iommu_fd.

      Does this succeed?
        - if no, specifically where does it fail?
        - if yes, vfio can now allow access to the device?

   d) Repeat b) for function 0.

   e) Repeat c), still using function 1. Is it different? Where? Why?

2) The same NIC as 1)

   a) Initial state: functions 0 & 1 bound to vfio-pci, vfio device
      files granted to user, user has bound both device_fds to the same
      iommu_fd.

      AIUI, even though not bound to an IOASID, vfio can now enable
      access through the device_fds, right?
      What specific entity has placed these devices into a block DMA
      state, when, and how?

   b) Both devices are attached to the same IOASID.

      Are we assuming that each device was atomically moved to the new
      IOMMU context by the IOASID code? What if the IOMMU cannot change
      the domain atomically?

   c) The device_fd for function 1 is detached from the IOASID.

      Are we assuming the reverse of b) is performed by the IOASID code?

   d) The device_fd for function 1 is unbound from the iommu_fd.

      Does this succeed?
        - if yes, what is the resulting IOMMU context of the device and
          who owns it?
        - if no, well, that results in numerous tear-down issues.

   e) Function 1 is unbound from vfio-pci.

      Does this work or is it blocked? If blocked, by what entity
      specifically?

   f) Function 1 is bound to the e1000e driver.

      We clearly have a violation here. Specifically, where in this path,
      and by whom, should we have been prevented from getting here, or who
      pushes the BUG_ON to abort this?

3) A dual-function conventional PCI e1000 NIC where the functions are
   grouped together due to a shared RID.

   a) Repeat 2.a) and 2.b) such that we have valid, user-accessible
      devices in the same IOMMU context.

   b) Function 1 is detached from the IOASID.

      I think function 1 cannot be placed into a different IOMMU context
      here; does the detach work? What's the IOMMU context now?

   c) A new IOASID is alloc'd within the existing iommu_fd and function 1
      is attached to the new IOASID.

      Where, how, and by whom does this fail?

If vfio gets to offload all of its group management to IOASID code,
that's great, but I'm afraid that IOASID is so focused on a device-level
API that we're instead just ignoring the group dynamics and vfio will be
forced to provide oversight to maintain secure userspace access.

Thanks,
Alex
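Kevin's proposed invariant (every device in a group is either DMA-blocked
or attached to an IOASID, and the whole group is owned by a single user)
can be checked against the states Alex walks through above with a toy
model. This is a minimal sketch under stated assumptions: the enum, the
structs, and group_user_safe() are hypothetical names, not vfio, iommu_fd,
or IOASID kernel code. The shared_rid flag stands in for scenario 3, where
the IOMMU cannot tell the functions apart, so the model requires them to
share one context. It only encodes the rule; it deliberately says nothing
about which component (vfio or the IOASID code) would enforce it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function states; not kernel definitions. */
enum dev_state {
	DEV_HOST_DRIVER, /* bound to a host driver such as e1000e */
	DEV_BLOCKED,     /* DMA blocked, not attached to any IOASID */
	DEV_ATTACHED,    /* user-owned, attached to an IOASID */
};

struct dev {
	enum dev_state state;
	int iommu_fd; /* owning iommu_fd, or -1 if none */
	int ioasid;   /* attached IOASID, or -1 if none */
};

struct group {
	struct dev *devs;
	size_t ndevs;
	bool shared_rid; /* IOMMU cannot distinguish the functions */
};

/*
 * Returns true if the group satisfies the hypothetical isolation rule:
 * no function is left with a host driver, at most one iommu_fd (one user)
 * owns functions in the group, and shared-RID functions share one context.
 */
static bool group_user_safe(const struct group *g)
{
	int owner = -1;

	assert(g != NULL && g->ndevs > 0);
	for (size_t i = 0; i < g->ndevs; i++) {
		const struct dev *d = &g->devs[i];

		/* e.g. 2.f: a host driver next to user-owned functions */
		if (d->state == DEV_HOST_DRIVER)
			return false;

		/* two different users must never own one group */
		if (d->iommu_fd != -1) {
			if (owner == -1)
				owner = d->iommu_fd;
			else if (d->iommu_fd != owner)
				return false;
		}

		/* e.g. 3.b/3.c: shared-RID functions cannot be split */
		if (g->shared_rid && (d->state != g->devs[0].state ||
				      d->ioasid != g->devs[0].ioasid))
			return false;
	}
	return true;
}
```

In this model, 2.b (both functions attached to one IOASID through the same
iommu_fd) passes, while 2.f (function 1 rebound to e1000e while function 0
stays with the user) and 3.c (a shared-RID function attached to a second
IOASID) both fail.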