From: Baolu Lu
To: Jason Gunthorpe
Cc: baolu.lu@linux.intel.com, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao, iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 4 Dec 2023 09:32:37 +0800
Subject: Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
Message-ID: <2354dd69-0179-4689-bc35-f4bf4ea5a886@linux.intel.com>
In-Reply-To: <20231203141414.GJ1489931@ziepe.ca>
On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
>> On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
>>> On Wed, Nov 15, 2023 at 11:02:26AM +0800, Lu Baolu wrote:
>>>> iopf_queue_flush_dev() is called by the IOMMU driver before releasing
>>>> a PASID. It ensures that all pending faults for this PASID have been
>>>> handled or cancelled, and won't hit the address space that reuses this
>>>> PASID. The driver must make sure that no new fault is added to the queue.
>>> This needs more explanation; why should anyone care?
>>>
>>> More importantly, why is *discarding* the right thing to do?
>>> Especially, why would we discard a partial page request group?
>>>
>>> After we change a translation we may have PRI requests in a
>>> queue. They need to be acknowledged, not discarded. The DMA in the
>>> device should be restarted and the device should observe the new
>>> translation; if it is blocking, then it should take a DMA error.
>>>
>>> More broadly, we should just let things run their normal course. The
>>> domain to deliver the fault to should be determined very early. If we
>>> get a fault and there is no fault domain currently assigned, then just
>>> restart it.
>>>
>>> The main reason to fence would be to allow the domain to become freed
>>> as the faults should be holding pointers to it. But I feel there are
>>> simpler options for that than this.
>>
>> In the iommu_detach_device_pasid() path, the domain is about to be
>> removed from the device's PASID. The IOMMU driver performs the
>> following steps sequentially:
>
> I know that is what it does, but it doesn't explain at all why.
>
>> 1. Clears the PASID translation entry. Thus, all subsequent DMA
>>    transactions (translation requests, translated requests or page
>>    requests) targeting the IOMMU domain will be blocked.
>>
>> 2. Waits until all pending page requests for the device's PASID have
>>    been reported to upper layers via iommu_report_device_fault().
>>    However, this does not guarantee that all page requests have been
>>    responded to.
>>
>> 3. Frees all partial page requests for this PASID, since a page request
>>    response is only needed for a complete request group. No action is
>>    required for page requests that are not the last of a request
>>    group.
>
> But we expect the last to come eventually since everything should be
> grouped properly, so why bother doing this?
>
> Indeed, if 2 worked, how is it even possible to have partials?

Step 1 clears the PASID table entry, hence all subsequent page requests
are blocked (the hardware auto-responds to them instead of putting them
in the queue). It is therefore possible that part of a page fault group
has already been queued for processing while the last request of that
group is blocked by the hardware, because the PASID entry is in the
blocking state.

In reality, this may be a no-op, as I haven't seen any real-world
implementation of multiple-request fault groups on Intel platforms.

>
>> 5. Follows the IOMMU hardware requirements (for example, VT-d spec,
>>    section 7.10, Software Steps to Drain Page Requests & Responses) to
>>    drain in-flight page requests and page group responses between the
>>    remapping hardware queues and the endpoint device.
>>
>> With the above steps done in iommu_detach_device_pasid(), the PASID
>> could be re-used for any other address space.
>
> As I said, that isn't even required. There is no issue with leaking
> PRIs across attachments.
>
>
>>> I suppose the driver is expected to stop calling
>>> iommu_report_device_fault() before calling this function, but that
>>> doesn't seem like it is going to be possible. Drivers should be
>>> implementing atomic replace for the PASID updates and in that case
>>> there is no moment when it can say the HW will stop generating PRI.
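To make the partial-group case from step 3 (and the reply explaining it) concrete, here is a toy user-space model in C. It is not kernel code: struct prq_model, prq_receive() and prq_free_partial() are hypothetical names. The model assumes that a group is handed to software only when the request carrying the "last page in group" flag arrives, and that once the PASID entry is cleared the hardware auto-responds to new requests without queueing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of a page request queue for one device; not kernel code.
 * All names here are hypothetical.
 */
#define MAX_QUEUED 16

struct page_req {
	unsigned int pasid;
	unsigned int grpid;	/* page request group index */
	bool last;		/* "last page in group" flag */
};

struct prq_model {
	bool pasid_blocked;			/* set by step 1 (entry cleared) */
	struct page_req queued[MAX_QUEUED];	/* members of incomplete groups */
	size_t nr_queued;
	unsigned int auto_responded;		/* answered by "hardware", never queued */
	unsigned int groups_completed;		/* complete groups handed to software */
};

/* Returns true if the request reached the software queue. */
static bool prq_receive(struct prq_model *m, struct page_req req)
{
	size_t i, w = 0;

	if (m->pasid_blocked) {
		m->auto_responded++;
		return false;
	}
	if (!req.last) {
		assert(m->nr_queued < MAX_QUEUED);
		m->queued[m->nr_queued++] = req;
		return true;
	}
	/* Last request: the group is complete, so remove its queued members. */
	for (i = 0; i < m->nr_queued; i++)
		if (m->queued[i].pasid != req.pasid ||
		    m->queued[i].grpid != req.grpid)
			m->queued[w++] = m->queued[i];
	m->nr_queued = w;
	m->groups_completed++;
	return true;
}

/* Step 3: drop queued members of incomplete groups for @pasid. */
static size_t prq_free_partial(struct prq_model *m, unsigned int pasid)
{
	size_t i, w = 0, freed = 0;

	for (i = 0; i < m->nr_queued; i++) {
		if (m->queued[i].pasid == pasid)
			freed++;
		else
			m->queued[w++] = m->queued[i];
	}
	m->nr_queued = w;
	return freed;
}
```

If the first request of a group is queued and the PASID entry is then cleared, the group's last request is auto-responded by the hardware, so the queued member is never completed and stays in the queue until something like prq_free_partial() releases it.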
>>
>> Atomic domain replacement for a PASID is not currently implemented in
>> the core or driver.
>
> It is; the driver should implement set_dev_pasid in such a way that
> repeated calls do replacements, ideally atomically. This is what ARM
> SMMUv3 does after my changes.
>
>> Even if atomic replacement were to be implemented,
>> it would be necessary to ensure that all translation requests,
>> translated requests, page requests and responses for the old domain are
>> drained before switching to the new domain.
>
> Again, no, it isn't required.
>
> Requests simply have to continue to be acked; it doesn't matter if
> they are acked against the wrong domain because the device will simply
> re-issue them.

Ah! I'm starting to get your point now. Even if a page fault response is
delivered after the PASID has been switched to a new address space
(either another address space or the hardware blocking state), the
hardware just retries. As long as we flush all caches (IOTLB and device
TLB) during the switch, the mappings of the old domain won't leak. So
it's safe to keep the page requests there. Do I understand you
correctly?

Best regards,
baolu
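The conclusion of the last exchange, that a late page response is harmless once the caches are flushed, can be illustrated with another toy model in C. Again this is a hypothetical sketch, not an IOMMU driver API: struct dev_model, replace_domain(), dev_dma() and page_response() are invented names, each domain holds a single mapping, and a successful page response is modelled as nothing more than the device retrying its DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one PASID of one device; all names are hypothetical. */
struct domain {
	bool blocking;		/* a blocking domain translates nothing */
	unsigned long iova;	/* one mapping per domain, for brevity */
	unsigned long phys;
};

struct dev_model {
	const struct domain *cur;	/* PASID entry, switched by a replace */
	bool atc_valid;			/* translation cached in the device TLB */
	unsigned long atc_iova;
	unsigned long atc_phys;
};

enum dma_result { DMA_OK, DMA_PAGE_FAULT, DMA_ERROR };

/* Switch the PASID to @dom and flush the cached translation (IOTLB/devTLB). */
static void replace_domain(struct dev_model *d, const struct domain *dom)
{
	d->cur = dom;
	d->atc_valid = false;
}

/* Device issues (or re-issues) a DMA to @iova against the current state. */
static enum dma_result dev_dma(struct dev_model *d, unsigned long iova,
			       unsigned long *phys)
{
	assert(phys != NULL);
	if (d->atc_valid && d->atc_iova == iova) {
		*phys = d->atc_phys;
		return DMA_OK;
	}
	if (d->cur == NULL || d->cur->blocking)
		return DMA_ERROR;
	if (d->cur->iova != iova)
		return DMA_PAGE_FAULT;	/* would be reported as a page request */
	d->atc_valid = true;
	d->atc_iova = iova;
	d->atc_phys = d->cur->phys;
	*phys = d->atc_phys;
	return DMA_OK;
}

/*
 * A page response carries no translation; the device simply retries, so
 * acknowledging against the "wrong" (old) domain is harmless.
 */
static enum dma_result page_response(struct dev_model *d, unsigned long iova,
				     unsigned long *phys)
{
	return dev_dma(d, iova, phys);
}
```

Because the response carries no translation and replace_domain() invalidates the device's cached entry, a retried DMA can only observe the current domain: the new mapping, a fresh fault, or a DMA error under a blocking domain, but never the old domain's mapping.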