Date: Fri, 16 Apr 2021 06:12:58 -0700
From: Jacob Pan
To: Jason Gunthorpe
Cc: Auger Eric, "Liu, Yi L", Jean-Philippe Brucker, "Tian, Kevin", LKML,
    Joerg Roedel, Lu Baolu, David Woodhouse, iommu@lists.linux-foundation.org,
    cgroups@vger.kernel.org, Tejun Heo, Li Zefan, Johannes Weiner,
    Alex Williamson, Jonathan Corbet, "Raj, Ashok", "Wu, Hao",
    "Jiang, Dave", jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Message-ID: <20210416061258.325e762e@jacob-builder>
In-Reply-To: <20210415230732.GG1370958@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Jason,

On Thu, 15 Apr 2021 20:07:32 -0300, Jason Gunthorpe wrote:

> On Thu, Apr 15, 2021 at 03:11:19PM +0200, Auger Eric wrote:
> > Hi Jason,
> >
> > On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
> > > On Thu, Apr 01, 2021 at 02:08:17PM +0000, Liu, Yi L wrote:
> > >
> > >> DMA page faults are delivered to the root complex via page request
> > >> messages, and they are per-device according to the PCIe spec. The
> > >> page request handling flow is:
> > >>
> > >> 1) iommu driver receives a page request from the device
> > >> 2) iommu driver parses the page request message and extracts the
> > >>    RID, PASID, faulted page, requested permissions, etc.
> > >> 3) iommu driver triggers the fault handler registered by the device
> > >>    driver with iommu_report_device_fault()
> > >
> > > This seems confused.
> > >
> > > The PASID should define how to handle the page fault, not the
> > > driver.
> >
> > In my series I don't use PASID at all. I am just enabling the nested
> > stage, and the guest uses a single context. I don't allocate any user
> > PASID at any point.
> >
> > When there is a fault at the physical level (a stage 1 fault that
> > concerns the guest), it needs to be reported and injected into the
> > guest.
> > The vfio-pci driver registers a fault handler with the iommu layer,
> > and in that fault handler it fills a circular buffer and triggers an
> > eventfd that is listened to by the VFIO-PCI QEMU device. The latter
> > retrieves the fault from the mmapped circular buffer, knows which
> > vIOMMU it is attached to, and passes the fault to that vIOMMU. Then
> > the vIOMMU triggers an IRQ in the guest.
> >
> > We are reusing the existing VFIO concepts, regions and IRQs, to do
> > that.
> >
> > For that use case, would you also use /dev/ioasid?
>
> /dev/ioasid could do all the things you described vfio-pci as doing;
> it can even do them the same way you just described.
>
> Stated another way, do you plan to duplicate all of this code someday
> for vfio-cxl? What about for vfio-platform? ARM SMMU can be hooked to
> platform devices, right?
>
> I feel what you guys are struggling with is some choice in the iommu
> kernel APIs that causes the events to be delivered to the pci_device
> owner, not the PASID owner.
>
> That feels solvable.
>
Perhaps more of a philosophical question for you and Alex. There is no
doubt that the direction you guided for /dev/ioasid is a much cleaner
one, especially after VDPA emerged as another IOMMU-backed framework.

The question is what to do with the nested translation features that
have been targeting the existing VFIO-IOMMU interface for the last three
years, predating VDPA. Shall we put a stop marker *after* nested support
and say no more extensions for VFIO-IOMMU, so that new features must be
built on the new interface?

If we were to close a checkout line for some unforeseen reason, should
we still honor the customers who have already been waiting in line for a
long time? This is not a tactic or an excuse for not working on the new
/dev/ioasid interface. In fact, I believe we can benefit from the
lessons learned while completing the existing work, which will give
confidence to the new interface. Thoughts?

> Jason

Thanks,

Jacob