Date: Mon, 10 May 2021 08:25:02 -0700
From: "Raj, Ashok"
To: Jason Gunthorpe
Cc: "Tian, Kevin", Jean-Philippe Brucker, Jacob Pan, Alex Williamson,
    "Liu, Yi L", Auger Eric, LKML, Joerg Roedel, Lu Baolu,
    David Woodhouse, "iommu@lists.linux-foundation.org",
    "cgroups@vger.kernel.org", Tejun Heo, Li Zefan, Johannes Weiner,
    Jean-Philippe Brucker, Jonathan Corbet, "Wu, Hao", "Jiang, Dave",
    Ashok Raj
Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Message-ID: <20210510152502.GA90095@otc-nc-03>
In-Reply-To: <20210510123729.GA1002214@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 10, 2021 at 09:37:29AM -0300, Jason Gunthorpe wrote:
> On Sat, May 08, 2021 at 09:56:59AM +0000, Tian, Kevin wrote:
> > > From: Raj, Ashok
> > > Sent: Friday, May 7, 2021 12:33 AM
> > >
> > > > Basically it means when the guest's top level IOASID is created for
> > > > nesting that IOASID claims all PASIDs on the RID and excludes any
> > > > PASID IOASIDs from existing on the RID now or in future.
> > >
> > > The way to look at this is as follows:
> > >
> > > For platforms that do not need to support the shared work queue model
> > > for ENQCMD or similar, the PASID space is naturally per RID. There is no
> > > complication with this. Every RID has the full range of PASIDs and there is
> > > no need for the host to track which PASIDs are allocated now or in future
> > > in the guest.
> > >
> > > For platforms that support ENQCMD, PASIDs must be global across the
> > > entire system. Maybe it's better to call them gPASID for the guest and
> > > hPASID for the host. The short reason is that gPASID->hPASID is a
> > > guest-wide mapping for ENQCMD and not a per-RID mapping. (We covered
> > > that in earlier responses.)
> > >
> > > In our current implementation we actually don't separate this space, and
> > > gPASID == hPASID.
> > > The iommu driver enforces that by using the custom
> > > allocator and the architected interface that allows all guest vIOMMU
> > > allocations to be proxied to the host. It is nothing but a glorified
> > > hypercall-like interface. In fact, some OSes do use a hypercall to get an
> > > hPASID instead of using the vCMD-style interface.
> > >
> >
> > After more thinking about the new interface, I feel gPASID==hPASID
> > actually causes some confusion in the uAPI design. In concept an ioasid
> > is not active until it's attached to a device, because it's just an ID
> > without a device. So supposedly an ioasid should reject all user commands
> > before attach. However, a guest likely asks for a new gPASID before
> > attaching it to devices and the vIOMMU. If gPASID==hPASID then Qemu
> > must request /dev/ioasid to allocate a hw_id for an ioasid which hasn't
> > been attached to any device, relying on kernel knowledge that this
> > hw_id comes from a global allocator without dependency on any
> > device. This doesn't sound like a clean design, not to mention that it
> > also conflicts with live migration.
>
> Everything must be explicit. The situation David pointed to of
> qemu emulating a vIOMMU while running on a host with a different
> platform/physical IOMMU must be considered.
>
> If the vIOMMU needs specific behavior it must use /dev/iommu to ask
> for it specifically and not just make wild assumptions about how the
> platform works.

I think the right way is for the pIOMMU to enforce the right behavior.
The vIOMMU can ask for a PASID, and the physical IOMMU driver would hand
out whatever is optimal for the platform. If the vIOMMU asks for a
per-device PASID and that can lead to conflicts in the PASID namespace,
it's best to avoid it.

A global PASID doesn't break anything, but giving that control to the
vIOMMU doesn't seem right. When we have mixed use cases, such as hardware
that supports shared WQs alongside SRIOV devices that need PASIDs, we need
to understand how they will work without a backend to migrate PASIDs to
the new destination.
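To make the per-RID versus global PASID distinction concrete, here is a
minimal C sketch. It is illustrative only and is not the kernel's ioasid
API: PASID_MAX, NUM_RIDS, alloc_from() and alloc_pasid() are hypothetical
names, the space is shrunk to 64 entries (real PASIDs are 20 bits wide),
and skipping PASID 0 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, tiny sizes for illustration; real PASIDs are 20 bits. */
#define PASID_MAX 64u
#define NUM_RIDS  2u

/* One namespace shared by every RID (the ENQCMD / shared WQ case). */
static bool global_used[PASID_MAX];

/* An independent namespace per RID (enough when no shared WQ exists). */
static bool per_rid_used[NUM_RIDS][PASID_MAX];

/* Return the lowest free PASID in 'used', or -1 if the space is exhausted.
 * PASID 0 is skipped here as an assumption (it is often reserved). */
static int alloc_from(bool used[PASID_MAX])
{
    for (uint32_t p = 1; p < PASID_MAX; p++) {
        if (!used[p]) {
            used[p] = true;
            return (int)p;
        }
    }
    return -1;
}

/* Hypothetical policy hook: the physical IOMMU side picks the namespace,
 * not the caller, based on whether the platform uses ENQCMD. */
static int alloc_pasid(bool platform_has_enqcmd, uint32_t rid)
{
    assert(rid < NUM_RIDS);
    if (platform_has_enqcmd)
        return alloc_from(global_used);
    return alloc_from(per_rid_used[rid]);
}
```

With per-RID allocation, two devices can each receive PASID 1 for
different address spaces, which is harmless per RID but cannot be expressed
through a single guest-wide gPASID->hPASID mapping. With the global
allocator, values never repeat across RIDs.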
For ENQCMD we have the gPASID->hPASID translation in the VMCS control.
For devices that support SIOV, programming a PASID into a device is also
mediated, so it's possible for something like the mediated interface to
assist with that migration for the dedicated WQ.

When we have both SRIOV and a shared WQ exposed to the same guest, we do
have an issue. The simplest approach I thought of was to separate guest
and host PASIDs, where the guest has its own PASID space and the host has
its own carved out. The guest can do whatever it wants within that
allocated space without fear of any collision with any other device.

Cheers,
Ashok
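The guest/host PASID separation described above can be sketched as two
disjoint windows carved out of one PASID space. This is a minimal
illustration under stated assumptions, not an existing kernel interface:
struct pasid_range, range_alloc(), ranges_overlap(), the 32-entry space and
the exact host/guest split are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, tiny PASID space for illustration; real PASIDs are 20 bits. */
#define PASID_SPACE 32u

struct pasid_range {
    uint32_t base;           /* first PASID in this window */
    uint32_t size;           /* number of PASIDs in this window */
    bool used[PASID_SPACE];  /* indexed by absolute PASID value */
};

/* Assumed split: host keeps [1, 16), the guest is carved out [16, 32). */
static struct pasid_range host_range  = { .base = 1,  .size = 15 };
static struct pasid_range guest_range = { .base = 16, .size = 16 };

/* True if the half-open windows [base, base + size) intersect. */
static bool ranges_overlap(const struct pasid_range *a,
                           const struct pasid_range *b)
{
    return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* Allocate the lowest free PASID inside r's window, or -1 when it is full. */
static int range_alloc(struct pasid_range *r)
{
    for (uint32_t p = r->base; p < r->base + r->size; p++) {
        assert(p < PASID_SPACE); /* window must fit inside the space */
        if (!r->used[p]) {
            r->used[p] = true;
            return (int)p;
        }
    }
    return -1;
}
```

As long as the two windows are disjoint, every guest allocation lands in
its own window and cannot collide with a host-owned PASID, no matter which
order the two sides allocate in.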