Subject: Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD
To: Nadav Amit, Joerg Roedel
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	Nadav Amit, Jiajun Cao, Will Deacon
References: <20210607182541.119756-1-namit@vmware.com>
	<20210607182541.119756-6-namit@vmware.com>
From: Robin Murphy
Message-ID: <1913c012-e6c0-1d5e-01b3-5f6da367c6bd@arm.com>
Date: Tue, 15 Jun 2021 13:55:18 +0100
In-Reply-To: <20210607182541.119756-6-namit@vmware.com>

On 2021-06-07 19:25, Nadav Amit wrote:
> From: Nadav Amit
> 
> AMD's IOMMU can flush any range efficiently (i.e., in a single flush).
> This is in contrast, for instance, to Intel IOMMUs that have a limit on
> the number of pages that can be flushed in a single flush. In addition,
> AMD's IOMMU does not care about the page-size, so changes of the page
> size do not need to trigger a TLB flush.
> 
> So in most cases, a TLB flush due to disjoint ranges or page-size
> changes is not needed on AMD. Yet, vIOMMUs require the hypervisor to
> synchronize the virtualized IOMMU's PTEs with the physical ones. This
> process induces overhead, so it is better not to cause unnecessary
> flushes, i.e., flushes of PTEs that were not modified.
> 
> Implement amd_iommu_iotlb_gather_add_page() and use it instead of the
> generic iommu_iotlb_gather_add_page(). Ignore page-size changes and
> disjoint regions unless the "non-present cache" feature is reported by
> the IOMMU capabilities, as the lack of this feature is an indication
> that we are running on a physical IOMMU. A similar indication is used
> by VT-d (see "caching mode"). The new logic retains the same flushing
> behavior that we had before the introduction of page-selective IOTLB
> flushes for AMD.
> 
> In virtualized environments, check whether the newly flushed region and
> the gathered one are disjoint, and flush if they are. Also check
> whether the new region would cause an IOTLB invalidation of a large
> region that would include unmodified PTEs. The latter check is done
> according to the "order" of the IOTLB flush.

If it helps,

Reviewed-by: Robin Murphy

I wonder if it might be more effective to defer the alignment-based 
splitting part to amd_iommu_iotlb_sync() itself, but that could be 
investigated as another follow-up.
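
To illustrate what I have in mind, here is a rough and entirely 
untested sketch - the helper name is made up, and I'm assuming 
domain_flush_pages() keeps its current (domain, address, size) form:

/*
 * Split the gathered range into naturally-aligned power-of-two blocks
 * at sync time, so that a sparse gather does not end up as one huge
 * aligned invalidation covering lots of untouched PTEs. (Wraparound at
 * the very top of the address space is ignored for simplicity.)
 */
static void amd_iommu_flush_split(struct protection_domain *dom,
				  unsigned long start, unsigned long end)
{
	while (start <= end) {
		unsigned long len = end - start + 1;
		/* Largest block naturally aligned at 'start'... */
		unsigned long size = start ? 1UL << __ffs(start) : len;

		/* ...clamped so it does not run past 'end' */
		size = min(size, 1UL << __fls(len));

		domain_flush_pages(dom, start, size);
		start += size;
	}
}

amd_iommu_iotlb_sync() could then call this instead of a single 
domain_flush_pages(), and the gather path would only be left with the 
disjointness check.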

Robin.

> Cc: Joerg Roedel
> Cc: Will Deacon
> Cc: Jiajun Cao
> Cc: Robin Murphy
> Cc: Lu Baolu
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Nadav Amit
> ---
>  drivers/iommu/amd/iommu.c | 44 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 3e40f6610b6a..128f2e889ced 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2053,6 +2053,48 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>  	return ret;
>  }
>  
> +static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
> +					    struct iommu_iotlb_gather *gather,
> +					    unsigned long iova, size_t size)
> +{
> +	/*
> +	 * AMD's IOMMU can flush as many pages as necessary in a single flush.
> +	 * Unless we run in a virtual machine, which can be inferred according
> +	 * to whether "non-present cache" is on, it is probably best to prefer
> +	 * (potentially) too extensive TLB flushing (i.e., more misses) over
> +	 * multiple TLB flushes (i.e., more flushes). For virtual machines the
> +	 * hypervisor needs to synchronize the host IOMMU PTEs with those of
> +	 * the guest, and the trade-off is different: unnecessary TLB flushes
> +	 * should be avoided.
> +	 */
> +	if (amd_iommu_np_cache && gather->end != 0) {
> +		unsigned long start = iova, end = start + size - 1;
> +
> +		if (iommu_iotlb_gather_is_disjoint(gather, iova, size)) {
> +			/*
> +			 * If the new page is disjoint from the current range,
> +			 * flush.
> +			 */
> +			iommu_iotlb_sync(domain, gather);
> +		} else {
> +			/*
> +			 * If the order of TLB flushes increases by more than
> +			 * 1, it means that we would have to flush PTEs that
> +			 * were not modified. In this case, flush.
> +			 */
> +			unsigned long new_start = min(gather->start, start);
> +			unsigned long new_end = max(gather->end, end);
> +			int msb_diff = fls64(gather->end ^ gather->start);
> +			int new_msb_diff = fls64(new_end ^ new_start);
> +
> +			if (new_msb_diff > msb_diff + 1)
> +				iommu_iotlb_sync(domain, gather);
> +		}
> +	}
> +
> +	iommu_iotlb_gather_add_range(gather, iova, size);
> +}
> +
>  static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
>  			      size_t page_size,
>  			      struct iommu_iotlb_gather *gather)
> @@ -2067,7 +2109,7 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
>  
>  	r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
>  
> -	iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
> +	amd_iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
>  
>  	return r;
>  }
> 