Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp4301026ybt; Mon, 6 Jul 2020 00:50:44 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwZiMRJtCGkahWXANU2wB1DXC5Ps41lbg6KG7F0i2J1x1V955xSNvWHLNTvGuJmXPJESqhm X-Received: by 2002:a17:906:178b:: with SMTP id t11mr22314882eje.489.1594021844605; Mon, 06 Jul 2020 00:50:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594021844; cv=none; d=google.com; s=arc-20160816; b=tEU3WR7bWj0n5emskwKbr6/G0xcE/KZxPA55tZ/5jWfuIlf3gYASdRBae+QS3yACjf 4Ni1FMSGvMWhj0cAwNuI5TggRwW5fyqSLdeDrcnT01Iq328I7Fg3BqC/Y7geNkyd6u6W dKW+Kwx2dkYCiUcLQrwQlUBBiBpUwq0NEoYl+lR5OA3yze/OS8sXmS4DO4j3WuVzs0pZ 3PwVDpSvPDyrDewB1NKKD4J2hBXKoCjOA/wnxSl9pTri8uP76N1FEK+OkmT3sI0myPcU 4zNgX2WV/e7FuSWLIKgnFczQyMktpzCqY/nqQc+j2q8Rux8GN4ilA9tF+DbF0uLUGeRz Dhjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:to:subject:cc:ironport-sdr:ironport-sdr; bh=0Z6Hm/pj2JU3YnVwkHqZK12GqR0Xa42nMN96hVme4Rs=; b=QHAtEm4ELY/LWDw5VAHUMk14bCBJpFgcBi5wHmiAxH3y/5bhT5poeGFVvWUSORJBfU VFEEAHvGoM012ss3u4Q/ykZK7DryklFr6VgbIkCo+hN50e9qHNlN3yTHMsMYX+GMQDbd IRFvPe1hKY6sdA/hOeNVnMDjjFLlddTqI//Mh3Nt33td+ockOBtV+Kla3h5YA+yEfc4/ MuoYGiA6UaY7fmhdm+k356AF5KKNm8qV06x3B3m6HLqO7UKZ2P9am0vExZSuh7EbLwhd FcrnnCNMmAiKMLq4AVT1c0ZweJV103Cwyp9Gonj4SUhfJhYFo2V4H31vxo8jhWMcHSyU uksw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w4si12286301ejj.300.2020.07.06.00.50.22; Mon, 06 Jul 2020 00:50:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728908AbgGFHsC (ORCPT + 99 others); Mon, 6 Jul 2020 03:48:02 -0400 Received: from mga02.intel.com ([134.134.136.20]:34271 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728248AbgGFHsB (ORCPT ); Mon, 6 Jul 2020 03:48:01 -0400 IronPort-SDR: mrLvLl3DLBQQNZdMAHLCOgpLSrgoCVdKSfqqTF5T40ZgkmpQd8aBGqb5kBPmt9MyGZe8VooenZ 61RDKcnFNEzw== X-IronPort-AV: E=McAfee;i="6000,8403,9673"; a="135624817" X-IronPort-AV: E=Sophos;i="5.75,318,1589266800"; d="scan'208";a="135624817" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2020 00:48:01 -0700 IronPort-SDR: jsX2dyY4rDe+tiM/VAsDzql1x+A7g7CzsYMVj1OU9xXE2Cg2J5x3w/o3Y1XkPf9zeC8oA8ehUD IhAkUI5JDs8g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,318,1589266800"; d="scan'208";a="456613921" Received: from xiaoranf-mobl2.ccr.corp.intel.com (HELO [10.255.28.11]) ([10.255.28.11]) by orsmga005.jf.intel.com with ESMTP; 06 Jul 2020 00:47:58 -0700 Cc: baolu.lu@linux.intel.com, Joerg Roedel , Jacob Pan , "Raj, Ashok" , "Liu, Yi L" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v2 3/4] iommu/vt-d: Report page request faults for guest SVA To: "Tian, Kevin" , "iommu@lists.linux-foundation.org" References: <20200706002535.9381-1-baolu.lu@linux.intel.com> <20200706002535.9381-4-baolu.lu@linux.intel.com> From: Lu Baolu Message-ID: Date: Mon, 6 Jul 2020 15:47:58 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020/7/6 9:36, Tian, Kevin wrote: >> From: Tian, Kevin >> Sent: Monday, July 6, 2020 9:30 AM >> >>> From: Lu Baolu >>> Sent: Monday, July 6, 2020 8:26 AM >>> >>> A pasid might be bound to a page table from a VM guest via the iommu >>> ops.sva_bind_gpasid. In this case, when a DMA page fault is detected >>> on the physical IOMMU, we need to inject the page fault request into >>> the guest. After the guest completes handling the page fault, a page >>> response need to be sent back via the iommu ops.page_response(). >>> >>> This adds support to report a page request fault. Any external module >>> which is interested in handling this fault should regiester a notifier >>> callback. >> >> be specific on which notifier to be registered... >> >>> >>> Co-developed-by: Jacob Pan >>> Signed-off-by: Jacob Pan >>> Co-developed-by: Liu Yi L >>> Signed-off-by: Liu Yi L >>> Signed-off-by: Lu Baolu >>> --- >>> drivers/iommu/intel/svm.c | 99 ++++++++++++++++++++++++++++++++----- >> -- >>> 1 file changed, 81 insertions(+), 18 deletions(-) >>> >>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c >>> index c23167877b2b..08c58c2b1a06 100644 >>> --- a/drivers/iommu/intel/svm.c >>> +++ b/drivers/iommu/intel/svm.c >>> @@ -815,6 +815,57 @@ static void intel_svm_drain_prq(struct device *dev, >>> int pasid) >>> } >>> } >>> >>> +static int prq_to_iommu_prot(struct page_req_dsc *req) >>> +{ >>> + int prot = 0; >>> + >>> + if (req->rd_req) >>> + prot |= IOMMU_FAULT_PERM_READ; >>> + if (req->wr_req) >>> + prot |= IOMMU_FAULT_PERM_WRITE; >>> + if (req->exe_req) >>> + prot |= IOMMU_FAULT_PERM_EXEC; >>> + if (req->pm_req) >>> + prot |= IOMMU_FAULT_PERM_PRIV; >>> + >>> + return prot; >>> +} >>> + >>> +static int >>> +intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc) >>> +{ >>> + struct iommu_fault_event event; >>> + u8 bus, devfn; >>> + >>> + memset(&event, 0, sizeof(struct iommu_fault_event)); >>> + bus = PCI_BUS_NUM(desc->rid); >>> + devfn = desc->rid & 0xff; >> >> not required. >> >>> + >>> + /* Fill in event data for device specific processing */ >>> + event.fault.type = IOMMU_FAULT_PAGE_REQ; >>> + event.fault.prm.addr = desc->addr; >>> + event.fault.prm.pasid = desc->pasid; >>> + event.fault.prm.grpid = desc->prg_index; >>> + event.fault.prm.perm = prq_to_iommu_prot(desc); >>> + >>> + /* >>> + * Set last page in group bit if private data is present, >>> + * page response is required as it does for LPIG. >>> + */ >> >> move to priv_data_present check? >> >>> + if (desc->lpig) >>> + event.fault.prm.flags |= >>> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; >>> + if (desc->pasid_present) >>> + event.fault.prm.flags |= >>> IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; >>> + if (desc->priv_data_present) { >>> + event.fault.prm.flags |= >>> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; > > btw earlier comment is more about the behavior of the fault > handler (e.g. the guest), but not about why we need convert > to last_page prm flag. Let's make it clear that doing so is > because iommu_report_device_fault doesn't understand this > vt-d specific requirement thus we set last_page as a workaround. Yes. I will add this in the comment. Best regards, baolu > > Thanks > Kevin > >>> + event.fault.prm.flags |= >>> IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA; >>> + memcpy(event.fault.prm.private_data, desc->priv_data, >>> + sizeof(desc->priv_data)); >>> + } >>> + >>> + return iommu_report_device_fault(dev, &event); >>> +} >>> + >>> static irqreturn_t prq_event_thread(int irq, void *d) >>> { >>> struct intel_iommu *iommu = d; >>> @@ -828,7 +879,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) >>> tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & >>> PRQ_RING_MASK; >>> head = dmar_readq(iommu->reg + DMAR_PQH_REG) & >>> PRQ_RING_MASK; >>> while (head != tail) { >>> - struct intel_svm_dev *sdev; >>> + struct intel_svm_dev *sdev = NULL; >> >> move to outside of the loop, otherwise later check always hit "if (!sdev)" >> >>> struct vm_area_struct *vma; >>> struct page_req_dsc *req; >>> struct qi_desc resp; >>> @@ -864,6 +915,20 @@ static irqreturn_t prq_event_thread(int irq, void >> *d) >>> } >>> } >>> >>> + if (!sdev || sdev->sid != req->rid) { >>> + struct intel_svm_dev *t; >>> + >>> + sdev = NULL; >>> + rcu_read_lock(); >>> + list_for_each_entry_rcu(t, &svm->devs, list) { >>> + if (t->sid == req->rid) { >>> + sdev = t; >>> + break; >>> + } >>> + } >>> + rcu_read_unlock(); >>> + } >>> + >>> result = QI_RESP_INVALID; >>> /* Since we're using init_mm.pgd directly, we should never >>> take >>> * any faults on kernel addresses. */ >>> @@ -874,6 +939,17 @@ static irqreturn_t prq_event_thread(int irq, void >> *d) >>> if (!is_canonical_address(address)) >>> goto bad_req; >>> >>> + /* >>> + * If prq is to be handled outside iommu driver via receiver of >>> + * the fault notifiers, we skip the page response here. >>> + */ >>> + if (svm->flags & SVM_FLAG_GUEST_MODE) { >>> + if (sdev && !intel_svm_prq_report(sdev->dev, req)) >>> + goto prq_advance; >>> + else >>> + goto bad_req; >>> + } >>> + >>> /* If the mm is already defunct, don't handle faults. */ >>> if (!mmget_not_zero(svm->mm)) >>> goto bad_req; >>> @@ -892,24 +968,10 @@ static irqreturn_t prq_event_thread(int irq, void >> *d) >>> goto invalid; >>> >>> result = QI_RESP_SUCCESS; >>> - invalid: >>> +invalid: >>> mmap_read_unlock(svm->mm); >>> mmput(svm->mm); >>> - bad_req: >>> - /* Accounting for major/minor faults? */ >>> - rcu_read_lock(); >>> - list_for_each_entry_rcu(sdev, &svm->devs, list) { >>> - if (sdev->sid == req->rid) >>> - break; >>> - } >>> - /* Other devices can go away, but the drivers are not >>> permitted >>> - * to unbind while any page faults might be in flight. So it's >>> - * OK to drop the 'lock' here now we have it. */ >> >> should we keep and move this comment to earlier sdev lookup? and >> regarding to guest unbind, ae we preventing the fault owner (outside >> of iommu driver) to unbind against in-flight fault request? >> >>> - rcu_read_unlock(); >>> - >>> - if (WARN_ON(&sdev->list == &svm->devs)) >>> - sdev = NULL; >> >> similarly should we keep the WARN_ON check here? >> >>> - >>> +bad_req: >>> if (sdev && sdev->ops && sdev->ops->fault_cb) { >>> int rwxp = (req->rd_req << 3) | (req->wr_req << 2) | >>> (req->exe_req << 1) | (req->pm_req); >>> @@ -920,7 +982,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) >>> and these can be NULL. Do not use them below this point! >>> */ >>> sdev = NULL; >>> svm = NULL; >>> - no_pasid: >>> +no_pasid: >>> if (req->lpig || req->priv_data_present) { >>> /* >>> * Per VT-d spec. v3.0 ch7.7, system software must >>> @@ -945,6 +1007,7 @@ static irqreturn_t prq_event_thread(int irq, void >> *d) >>> resp.qw3 = 0; >>> qi_submit_sync(iommu, &resp, 1, 0); >>> } >>> +prq_advance: >>> head = (head + sizeof(*req)) & PRQ_RING_MASK; >>> } >>> >>> -- >>> 2.17.1 >> >> Thanks >> Kevin