Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp233377pxj; Wed, 9 Jun 2021 22:27:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJweOmnMzDhs9H1+HO+EQUhgegn6fE8KHJlS556WurvaEqB64x14m1CyM+T+qocXCPvV0x2d X-Received: by 2002:a05:6402:3444:: with SMTP id l4mr3010784edc.259.1623302832373; Wed, 09 Jun 2021 22:27:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623302832; cv=none; d=google.com; s=arc-20160816; b=rMEIal8Sflib4Q3zq3ngK+9WdUfhEdVTv4lDY5jXg9sOkqxWLlKR5XuoYf3aSqZWqK KNdK6v3fk5UED72wcgiPTZifo5zVyaabnMXg8eNJuSR2XeL+qBY4070N7CKexyeefnGj cr1dYQ7tDU+k0ChCsSNZ+LEMKvp+6QdrjbtweQlI6T+ZkvDbGxqJqpQyrZLTm16jmCPQ j1xLl0uprd7QCRDGvx0315cd/BjIYnm15Z0ERunmRaVBAHyzq1G4ddC0W9GtdbP9aXuh 1tqLK8JHEENBD423rtaKPApBGy+WZdlRYGbPY1VXo009lzQzxAGI4egGZu2cv1hP6yY4 RXXw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:message-id:references:in-reply-to :subject:cc:to:from:date:content-transfer-encoding:mime-version :sender:dkim-signature; bh=Bgf1x/nFWFQGqJpMO9ldu/lRRQ3GhGowilaUgKGPmUA=; b=Z3pB4XzEwZe2ApN+kZSw2TcHd+tZ9J8zA2t+C3ilTZhi+PjOTI24UwqWqIfJMnXHkH sMdEvCkLnilxiTZGG8/lwOteaq4jAEfXLvBMPliiI7lf5iKxlVTGuQy+jOqduduIe7CC nIcHPN9vO69EEqM6jqf2V7UDV2i1LYDSRp9463ICCj1gLQ7kama6OmCSBBXUDnYRQoMf h/cFmUAUQrTaB00iyROzHsaxVRuY1eKMCjbzC0EZGT4AXC6gRvsvL+BV0T9o6IXtYQpn u/HmS1L6qpVfyr8icS/hEB7eDEeJrhV+uEoWk7q8eq7ofD+li+Lk1nx/Ewzrl//irYGD uwzw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=EeWzjBDh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id m9si1797343ejj.281.2021.06.09.22.26.47; Wed, 09 Jun 2021 22:27:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=EeWzjBDh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230027AbhFJF1L (ORCPT + 99 others); Thu, 10 Jun 2021 01:27:11 -0400 Received: from m43-7.mailgun.net ([69.72.43.7]:61239 "EHLO m43-7.mailgun.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229966AbhFJF1K (ORCPT ); Thu, 10 Jun 2021 01:27:10 -0400 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1623302715; h=Message-ID: References: In-Reply-To: Subject: Cc: To: From: Date: Content-Transfer-Encoding: Content-Type: MIME-Version: Sender; bh=Bgf1x/nFWFQGqJpMO9ldu/lRRQ3GhGowilaUgKGPmUA=; b=EeWzjBDhR/7rX3NJoN94o3cnTFpWcfcJJpE0HFDfW6L1OT2tC4tpQo8g7x7lMR9x2XTU1d8y ireyGMvUU1xDrP5siM3HfdbXlt2g0VPjbaZGWwM9LA1druavlt6smFvPVqWzbJ4+aeCcbapk XaRePHGjhE0R+HSrWBVnTFWOJ3U= X-Mailgun-Sending-Ip: 69.72.43.7 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by smtp-out-n07.prod.us-east-1.postgun.com with SMTP id 60c1a22ce27c0cc77f461e9b (version=TLS1.2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); Thu, 10 Jun 2021 05:25:00 GMT Sender: saiprakash.ranjan=codeaurora.org@mg.codeaurora.org Received: by smtp.codeaurora.org (Postfix, from userid 1001) id DDBA7C43144; Thu, 10 Jun 2021 05:24:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=ALL_TRUSTED,BAYES_00, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.codeaurora.org (localhost.localdomain [127.0.0.1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: saiprakash.ranjan) by smtp.codeaurora.org (Postfix) with ESMTPSA id 23D8BC4338A; Thu, 10 Jun 2021 05:24:58 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Thu, 10 Jun 2021 10:54:58 +0530 From: Sai Prakash Ranjan To: Robin Murphy Cc: Will Deacon , Joerg Roedel , iommu@lists.linux-foundation.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org Subject: Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list In-Reply-To: References: <20210609145315.25750-1-saiprakash.ranjan@codeaurora.org> Message-ID: X-Sender: saiprakash.ranjan@codeaurora.org User-Agent: Roundcube Webmail/1.3.9 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Robin, On 2021-06-10 00:14, Robin Murphy wrote: > On 2021-06-09 15:53, Sai Prakash Ranjan wrote: >> Currently for iommu_unmap() of large scatter-gather list with page >> size >> elements, the majority of time is spent in flushing of partial walks >> in >> __arm_lpae_unmap() which is a VA based TLB invalidation (TLBIVA for >> arm-smmu). >> >> For example: to unmap a 32MB scatter-gather list with page size >> elements >> (8192 entries), there are 16->2MB buffer unmaps based on the pgsize >> (2MB >> for 4K granule) and each of 2MB will further result in 512 TLBIVAs >> (2MB/4K) >> resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a >> huge >> overhead. >> >> So instead use io_pgtable_tlb_flush_all() to invalidate the entire >> context >> if size (pgsize) is greater than the granule size (4K, 16K, 64K). For >> this >> example of 32MB scatter-gather list unmap, this results in just 16 >> ASID >> based TLB invalidations or tlb_flush_all() callback (TLBIASID in case >> of >> arm-smmu) as opposed to 8192 TLBIVAs thereby increasing the >> performance of >> unmaps drastically. >> >> Condition (size > granule size) is chosen for >> io_pgtable_tlb_flush_all() >> because for any granule with supported pgsizes, we will have at least >> 512 >> TLB invalidations for which tlb_flush_all() is already recommended. >> For >> example, take 4K granule with 2MB pgsize, this will result in 512 >> TLBIVA >> in partial walk flush. >> >> Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap: >> (average over 10 iterations) >> >> Before this optimization: >> >> size iommu_map_sg iommu_unmap >> 4K 2.067 us 1.854 us >> 64K 9.598 us 8.802 us >> 1M 148.890 us 130.718 us >> 2M 305.864 us 67.291 us >> 12M 1793.604 us 390.838 us >> 16M 2386.848 us 518.187 us >> 24M 3563.296 us 775.989 us >> 32M 4747.171 us 1033.364 us >> >> After this optimization: >> >> size iommu_map_sg iommu_unmap >> 4K 1.723 us 1.765 us >> 64K 9.880 us 8.869 us >> 1M 155.364 us 135.223 us >> 2M 303.906 us 5.385 us >> 12M 1786.557 us 21.250 us >> 16M 2391.890 us 27.437 us >> 24M 3570.895 us 39.937 us >> 32M 4755.234 us 51.797 us >> >> This is further reduced once the map/unmap_pages() support gets in >> which >> will result in just 1 tlb_flush_all() as opposed to 16 >> tlb_flush_all(). >> >> Signed-off-by: Sai Prakash Ranjan >> --- >> drivers/iommu/io-pgtable-arm.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/iommu/io-pgtable-arm.c >> b/drivers/iommu/io-pgtable-arm.c >> index 87def58e79b5..c3cb9add3179 100644 >> --- a/drivers/iommu/io-pgtable-arm.c >> +++ b/drivers/iommu/io-pgtable-arm.c >> @@ -589,8 +589,11 @@ static size_t __arm_lpae_unmap(struct >> arm_lpae_io_pgtable *data, >> if (!iopte_leaf(pte, lvl, iop->fmt)) { >> /* Also flush any partial walks */ >> - io_pgtable_tlb_flush_walk(iop, iova, size, >> - ARM_LPAE_GRANULE(data)); >> + if (size > ARM_LPAE_GRANULE(data)) >> + io_pgtable_tlb_flush_all(iop); >> + else > > Erm, when will the above condition ever not be true? ;) > Ah right, silly me :) > Taking a step back, though, what about the impact to drivers other > than SMMUv2? Other drivers would be msm_iommu.c, qcom_iommu.c which does the same thing as arm-smmu-v2 (page based invalidations), then there is ipmmu-vmsa.c which does tlb_flush_all() for flush walk. > In particular I'm thinking of SMMUv3.2 where the whole > range can be invalidated by VA in a single command anyway, so the > additional penalties of TLBIALL are undesirable. > Right, so I am thinking we can have a new generic quirk IO_PGTABLE_QUIRK_RANGE_INV to choose between range based invalidations(tlb_flush_walk) and tlb_flush_all(). In this case of arm-smmu-v3.2, we can tie up ARM_SMMU_FEAT_RANGE_INV with this quirk and have something like below, thoughts? if (iop->cfg.quirks & IO_PGTABLE_QUIRK_RANGE_INV) io_pgtable_tlb_flush_walk(iop, iova, size, ARM_LPAE_GRANULE(data)); else io_pgtable_tlb_flush_all(iop); Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation