Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp395077imm; Thu, 26 Jul 2018 22:07:14 -0700 (PDT) X-Google-Smtp-Source: AAOMgpdFnU39J5P0b3QjvlCrfEAXj01zn6XbRfutpqp5uUkDqXdnBXJu/y21bOx+YSS6yeIeqGy4 X-Received: by 2002:a17:902:9687:: with SMTP id n7-v6mr4510614plp.33.1532668034876; Thu, 26 Jul 2018 22:07:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1532668034; cv=none; d=google.com; s=arc-20160816; b=ihGXBQPh8Npo9adKLsx0cHEzqeJeN4ESjvAn8G3/AmzYbqG/yEmMcj2JIjr2lWtkyP lHxDOnb+Kp8M+K8KYKtXhXrwiWC5K3hcuGsBEkAgNBEGKrietk/2WFN5B5E22kbYP7Bf 0ltx32hesrFT4ZpvHyXMgmI1FIGsn3NVPs8ZCRzLq1lD3DoEFpgprYr7BCGXxEvlObOc 0/Q42A0bzT+LTGCpAEDQTAzDZ0p/3rt8ve7IdA+BVdwxO7F432QQMI23MiNA47cXChpK kwG/ndn2ERPKHcv1MAJYNwUf1X5pCqE0DzsY3zrnCVjJVSR2090l+sAC6GzjDO1khL37 lvgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language:in-reply-to:mime-version :user-agent:date:message-id:from:references:cc:to:subject :dmarc-filter:dkim-signature:dkim-signature :arc-authentication-results; bh=VX+yu3+ASO4NvtAMc1zftQlRiFZKSAvWGNjjwTx+qvw=; b=PrDgO3jS3o2hj7ru9VIbnZWZh4jaIeDQwh0KXDK1l/VeeqZOqBNsV3yOS5jRtFzBsU WWgtJDQBD8v00AojN4Z+f1WN8ztX/kSTESeKMdjC4xRdhymdVKn5toB4bF1zHOImIX9M +K/LRy/juKz7JIjGSi+VWb4Sa49acIpbCAUPRBh2FvYI8pWvMqvBn5ZpRngxtzuTEp6Z uR5oEmnjFbQ4pfw2Q+V3xi0quApKLMPdYjiotFpUB1rmj45Plmq7eFTOiEsInMpNaQcK WEDbftEtwMxLZbCkfiDWcrz9aOvQUM1m7nS61vbUHtGG8ntvx/dbfZ9DCM7WHBvKU1uG nhvQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=jHoQoQJF; dkim=pass header.i=@codeaurora.org header.s=default header.b=gOnKiZiy; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i6-v6si3012620pgk.517.2018.07.26.22.06.55; Thu, 26 Jul 2018 22:07:14 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=jHoQoQJF; dkim=pass header.i=@codeaurora.org header.s=default header.b=gOnKiZiy; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729116AbeG0G0C (ORCPT + 99 others); Fri, 27 Jul 2018 02:26:02 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:43854 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725964AbeG0G0B (ORCPT ); Fri, 27 Jul 2018 02:26:01 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 014E760B11; Fri, 27 Jul 2018 05:05:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1532667956; bh=YSeXJArDilKnCUKCo/D8EH/nX6X+H9sKbNzYa41Ssfg=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=jHoQoQJFvNDxrSYSDzK7EGeXJQ/dh6GacIY61ZX1p7hyITYY650MVRmMwDKJNO/f5 KjMNfQs1fKYgsQxJ5xwgIV9Rpmu9K8YacMOWl9pqF7t2jNMR5n0jHxwZxDjZS/I9bF GLRG9fb4vmxtw4WuUAI++kZ+EaJyozqleihyufYQ= X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on pdx-caf-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=2.0 tests=ALL_TRUSTED,BAYES_00, DKIM_SIGNED,HTML_MESSAGE,T_DKIM_INVALID autolearn=no autolearn_force=no version=3.4.0 Received: from [10.79.40.85] (blr-bdr-fw-01_globalnat_allzones-outside.qualcomm.com [103.229.18.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: vivek.gautam@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 9BDC360594; Fri, 27 Jul 2018 05:05:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1532667953; bh=YSeXJArDilKnCUKCo/D8EH/nX6X+H9sKbNzYa41Ssfg=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=gOnKiZiy0JlYV7KAxqe3oBrjwHJ4uze/+ZFjibBgNUkh2OejZkmZa3Obp5tV8br94 VzTTZYw8SRHSj+573H6nJR/vo0nf6K8tX+Jto8+Bm6L4P1FBfe2+VbQ+RUjkkAbjv0 NIloIIDX7qE70Ds5WSKBvSJMtDbX4ItZq/NdLZEQ= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 9BDC360594 Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=vivek.gautam@codeaurora.org Subject: Re: [PATCH v13 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops To: Robin Murphy Cc: Joerg Roedel , robh+dt , "Rafael J. Wysocki" , Will Deacon , "list@263.net:IOMMU DRIVERS" , "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" , open list , Mark Rutland , Linux PM , sboyd@kernel.org, Lukas Wunner , linux-arm-msm , freedreno , Tomasz Figa References: <20180719101539.6104-1-vivek.gautam@codeaurora.org> <20180719101539.6104-2-vivek.gautam@codeaurora.org> <401d8509-b9ad-913b-334e-f4ac853472e3@arm.com> From: Vivek Gautam Message-ID: Date: Fri, 27 Jul 2018 10:35:46 +0530 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <401d8509-b9ad-913b-334e-f4ac853472e3@arm.com> Content-Type: multipart/alternative; boundary="------------37BEB597666699A5F82FD777" Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------37BEB597666699A5F82FD777 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On 7/26/2018 9:00 PM, Robin Murphy wrote: > On 26/07/18 08:12, Vivek Gautam wrote: >> On Wed, Jul 25, 2018 at 11:46 PM, Vivek Gautam >> wrote: >>> On Tue, Jul 24, 2018 at 8:51 PM, Robin Murphy >>> wrote: >>>> On 19/07/18 11:15, Vivek Gautam wrote: >>>>> >>>>> From: Sricharan R >>>>> >>>>> The smmu needs to be functional only when the respective >>>>> master's using it are active. The device_link feature >>>>> helps to track such functional dependencies, so that the >>>>> iommu gets powered when the master device enables itself >>>>> using pm_runtime. So by adapting the smmu driver for >>>>> runtime pm, above said dependency can be addressed. >>>>> >>>>> This patch adds the pm runtime/sleep callbacks to the >>>>> driver and also the functions to parse the smmu clocks >>>>> from DT and enable them in resume/suspend. >>>>> >>>>> Also, while we enable the runtime pm add a pm sleep suspend >>>>> callback that pushes devices to low power state by turning >>>>> the clocks off in a system sleep. >>>>> Also add corresponding clock enable path in resume callback. >>>>> >>>>> Signed-off-by: Sricharan R >>>>> Signed-off-by: Archit Taneja >>>>> [vivek: rework for clock and pm ops] >>>>> Signed-off-by: Vivek Gautam >>>>> Reviewed-by: Tomasz Figa >>>>> --- >>>>> >>>>> Changes since v12: >>>>>    - Added pm sleep .suspend callback. This disables the clocks. >>>>>    - Added corresponding change to enable clocks in .resume >>>>>     pm sleep callback. >>>>> >>>>>    drivers/iommu/arm-smmu.c | 75 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++-- >>>>>    1 file changed, 73 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c >>>>> index c73cfce1ccc0..9138a6fffe04 100644 >>>>> --- a/drivers/iommu/arm-smmu.c >>>>> +++ b/drivers/iommu/arm-smmu.c >> >> [snip] >> >>>>> platform_device *pdev) >>>>>          arm_smmu_device_remove(pdev); >>>>>    } >>>>>    +static int __maybe_unused arm_smmu_runtime_resume(struct >>>>> device *dev) >>>>> +{ >>>>> +       struct arm_smmu_device *smmu = dev_get_drvdata(dev); >>>>> + >>>>> +       return clk_bulk_enable(smmu->num_clks, smmu->clks); >>>> >>>> >>>> If there's a power domain being automatically switched by genpd >>>> then we need >>>> a reset here because we may have lost state entirely. Since I >>>> remembered the >>>> otherwise-useless GPU SMMU on Juno is in a separate power domain, I >>>> gave it >>>> a poking via sysfs with some debug stuff to dump sCR0 in these >>>> callbacks, >>>> and the problem is clear: >>>> >>>> ... >>>> [    4.625551] arm-smmu 2b400000.iommu: genpd_runtime_suspend() >>>> [    4.631163] arm-smmu 2b400000.iommu: arm_smmu_runtime_suspend: >>>> 0x00201936 >>>> [    4.637897] arm-smmu 2b400000.iommu: suspend latency exceeded, >>>> 6733980 ns >>>> [   21.566983] arm-smmu 2b400000.iommu: genpd_runtime_resume() >>>> [   21.584796] arm-smmu 2b400000.iommu: arm_smmu_runtime_resume: >>>> 0x00220101 >>>> [   21.591452] arm-smmu 2b400000.iommu: resume latency exceeded, >>>> 6658020 ns >>>> ... >>> >>> Qualcomm SoCs have retention enabled for SMMU registers so they don't >>> lose state. >>> ... >>> [  256.013367] arm-smmu b40000.arm,smmu: arm_smmu_runtime_suspend >>> SCR0 = 0x201e36 >>> [  256.013367] >>> [  256.019160] arm-smmu b40000.arm,smmu: arm_smmu_runtime_resume >>> SCR0 = 0x201e36 >>> [  256.019160] >>> [  256.027368] arm-smmu b40000.arm,smmu: arm_smmu_runtime_suspend >>> SCR0 = 0x201e36 >>> [  256.027368] >>> [  256.036786] arm-smmu b40000.arm,smmu: arm_smmu_runtime_resume >>> SCR0 = 0x201e36 >>> ... >>> >>> However after adding arm_smmu_device_reset() in runtime_resume() I >>> observe >>> some performance degradation when kill an instance of 'kmscube' and >>> start it again. >>> The launch time with arm_smmu_device_reset() in runtime_resume() >>> change is >>> more. >>> Could this be because of frequent TLB invalidation and sync? > > Probably. Plus the reset procedure is a big chunk of MMIO accesses, > which for a non-trivial SMMU configuration probably isn't negligible > in itself. Unfortunately, unless you know for absolute certain that > you don't need to do that, you do. > >> Some more information that i gathered. >> On Qcom SoCs besides the registers retention, TCU invalidates TLB >> cache on >> a CX power collapse exit, which is the system wide suspend case. >> The arm-smmu software is not aware of this CX power collapse / >> auto-invalidation. >> >> So wouldn't doing an explicit TLB invalidations during runtime resume be >> detrimental to performance? > > Indeed it would be, but resuming with TLBs full of random > valid-looking junk is even more so. > >> I have one more doubt here - >> We do runtime power cycle around arm_smmu_map/unmap() too. >> Now during map/unmap we selectively do TLB maintenance (either >> tlb_sync or tlb_add_flush). >> But with runtime pm we want to do TLBIALL*. Is that a problem? > > It's technically redundant to do both, true, but as we've covered in > previous rounds of discussion it's very difficult to know *which* one > is sufficient at any given time, so in order to make progress for now > I think we have to settle with doing both. Thanks Robin. I will respin the patches as Tomasz also suggested; arm_smmu_runtime_resume() will look like:     if (pm_runtime_suspended(dev))         return 0;     return arm_smmu_runtime_resume(dev); and, arm_smmu_runtime_resume() will have arm_smmu_device_reset(). Best regards Vivek > > Robin. > -- > To unsubscribe from this list: send the line "unsubscribe > linux-arm-msm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at  http://vger.kernel.org/majordomo-info.html --------------37BEB597666699A5F82FD777 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit



On 7/26/2018 9:00 PM, Robin Murphy wrote:
On 26/07/18 08:12, Vivek Gautam wrote:
On Wed, Jul 25, 2018 at 11:46 PM, Vivek Gautam
<vivek.gautam@codeaurora.org> wrote:
On Tue, Jul 24, 2018 at 8:51 PM, Robin Murphy <robin.murphy@arm.com> wrote:
On 19/07/18 11:15, Vivek Gautam wrote:

From: Sricharan R <sricharan@codeaurora.org>

The smmu needs to be functional only when the respective
master's using it are active. The device_link feature
helps to track such functional dependencies, so that the
iommu gets powered when the master device enables itself
using pm_runtime. So by adapting the smmu driver for
runtime pm, above said dependency can be addressed.

This patch adds the pm runtime/sleep callbacks to the
driver and also the functions to parse the smmu clocks
from DT and enable them in resume/suspend.

Also, while we enable the runtime pm add a pm sleep suspend
callback that pushes devices to low power state by turning
the clocks off in a system sleep.
Also add corresponding clock enable path in resume callback.

Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
[vivek: rework for clock and pm ops]
Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
---

Changes since v12:
   - Added pm sleep .suspend callback. This disables the clocks.
   - Added corresponding change to enable clocks in .resume
    pm sleep callback.

   drivers/iommu/arm-smmu.c | 75
++++++++++++++++++++++++++++++++++++++++++++++--
   1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c73cfce1ccc0..9138a6fffe04 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c

[snip]

platform_device *pdev)
         arm_smmu_device_remove(pdev);
   }
   +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
+{
+       struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+       return clk_bulk_enable(smmu->num_clks, smmu->clks);


If there's a power domain being automatically switched by genpd then we need
a reset here because we may have lost state entirely. Since I remembered the
otherwise-useless GPU SMMU on Juno is in a separate power domain, I gave it
a poking via sysfs with some debug stuff to dump sCR0 in these callbacks,
and the problem is clear:

...
[    4.625551] arm-smmu 2b400000.iommu: genpd_runtime_suspend()
[    4.631163] arm-smmu 2b400000.iommu: arm_smmu_runtime_suspend: 0x00201936
[    4.637897] arm-smmu 2b400000.iommu: suspend latency exceeded, 6733980 ns
[   21.566983] arm-smmu 2b400000.iommu: genpd_runtime_resume()
[   21.584796] arm-smmu 2b400000.iommu: arm_smmu_runtime_resume: 0x00220101
[   21.591452] arm-smmu 2b400000.iommu: resume latency exceeded, 6658020 ns
...

Qualcomm SoCs have retention enabled for SMMU registers so they don't
lose state.
...
[  256.013367] arm-smmu b40000.arm,smmu: arm_smmu_runtime_suspend
SCR0 = 0x201e36
[  256.013367]
[  256.019160] arm-smmu b40000.arm,smmu: arm_smmu_runtime_resume
SCR0 = 0x201e36
[  256.019160]
[  256.027368] arm-smmu b40000.arm,smmu: arm_smmu_runtime_suspend
SCR0 = 0x201e36
[  256.027368]
[  256.036786] arm-smmu b40000.arm,smmu: arm_smmu_runtime_resume
SCR0 = 0x201e36
...

However after adding arm_smmu_device_reset() in runtime_resume() I observe
some performance degradation when kill an instance of 'kmscube' and
start it again.
The launch time with arm_smmu_device_reset() in runtime_resume() change is
more.
Could this be because of frequent TLB invalidation and sync?

Probably. Plus the reset procedure is a big chunk of MMIO accesses, which for a non-trivial SMMU configuration probably isn't negligible in itself. Unfortunately, unless you know for absolute certain that you don't need to do that, you do.

Some more information that i gathered.
On Qcom SoCs besides the registers retention, TCU invalidates TLB cache on
a CX power collapse exit, which is the system wide suspend case.
The arm-smmu software is not aware of this CX power collapse /
auto-invalidation.

So wouldn't doing an explicit TLB invalidations during runtime resume be
detrimental to performance?

Indeed it would be, but resuming with TLBs full of random valid-looking junk is even more so.

I have one more doubt here -
We do runtime power cycle around arm_smmu_map/unmap() too.
Now during map/unmap we selectively do TLB maintenance (either
tlb_sync or tlb_add_flush).
But with runtime pm we want to do TLBIALL*. Is that a problem?

It's technically redundant to do both, true, but as we've covered in previous rounds of discussion it's very difficult to know *which* one is sufficient at any given time, so in order to make progress for now I think we have to settle with doing both.

Thanks Robin. I will respin the patches as Tomasz also suggested;

arm_smmu_runtime_resume() will look like:


    if (pm_runtime_suspended(dev))
        return 0;
    return arm_smmu_runtime_resume(dev);

and,
arm_smmu_runtime_resume() will have arm_smmu_device_reset().

Best regards
Vivek

Robin.
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--------------37BEB597666699A5F82FD777--