Devices often allocate DMA buffers before they do a
runtime PM resume. This is the case, for example, with v4l2
devices, where buffers are allocated during 'VIDIOC_REQBUFS'
and runtime resume happens later, usually during 'VIDIOC_STREAMON'.
In such cases the partial TLB flush done when allocating will fail
since the iommu is runtime suspended. This prints a warning
and falls back to a full flush. But there is actually no need to flush
the TLB before the consumer device is turned on.
Fix the warning by skipping the partial flush when allocating and instead
doing a full flush in runtime resume.
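The idea can be illustrated with a small userspace model. This is only a
sketch under stated assumptions: the model_* names are hypothetical and do
not exist in the driver, and the "hardware" is reduced to a few counters.
It shows a range flush that is skipped while the IOMMU is suspended, and a
resume path that invalidates everything in one go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_iommu {
	bool active;                /* models "runtime PM status is active" */
	unsigned int range_flushes; /* partial flushes actually issued */
	unsigned int full_flushes;  /* full flushes actually issued */
	unsigned int skipped;       /* range flushes skipped while suspended */
};

/* Range flush: only touch the (modelled) hardware while it is powered. */
static void model_flush_range(struct model_iommu *m, unsigned long iova,
			      size_t size)
{
	(void)iova;
	(void)size;

	if (!m->active) {
		/* Suspended: skip instead of timing out and warning. */
		m->skipped++;
		return;
	}
	m->range_flushes++;
}

/* Resume: power up, then invalidate everything that was skipped earlier. */
static void model_runtime_resume(struct model_iommu *m)
{
	m->active = true;
	m->full_flushes++;
}

/* Suspend: from now on range flushes are skipped again. */
static void model_runtime_suspend(struct model_iommu *m)
{
	m->active = false;
}
```

In this model, an allocation that happens before the first resume only
bumps the skipped counter, and the stale entries are covered by the single
full flush done in model_runtime_resume().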
This patchset is a combination of a patch already sent in a different
patchset [1] and a warning fix from Sebastian Reichel.
[1] https://lore.kernel.org/linux-devicetree/[email protected]/
Sebastian Reichel (1):
iommu/mediatek: always check runtime PM status in tlb flush range
callback
Yong Wu (1):
iommu/mediatek: Always tlb_flush_all when each PM resume
drivers/iommu/mtk_iommu.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
--
2.17.1
From: Yong Wu <[email protected]>
Prepare for two HWs that share a pgtable but are in different power-domains.
When there are two M4U HWs, flush_range may have a problem: it gets
the pm_status via the m4u dev, but that does not reflect the
real power-domain status of the HW, since other HW may also use
that power-domain.
The function dma_alloc_attrs helps allocate the iommu buffer, which
needs the corresponding power domain since a tlb flush is needed when
preparing the iova. But this function only allocates a buffer, and
we have no good reason to require users to always call pm_runtime_get
before calling dma_alloc_xxx. Therefore, add a tlb_flush_all
in pm_runtime_resume to make sure the tlb is always clean.
Another solution is to always call pm_runtime_get in tlb_flush_range.
This would trigger pm runtime resume/backup very often whenever the iommu
power is not active (i.e. the user did not call pm_runtime_get
before calling dma_alloc_xxx), which may cause a performance drop,
so we don't use it.
In other cases, the iommu's power should always be active via the device
link with smi.
Previous SoCs don't have PM, except mt8192. The mt8192 IOMMU is in the
display's power-domain, which is nearly always enabled, so no Fixes tag
is needed here.
Prepare for mt8195.
Signed-off-by: Yong Wu <[email protected]>
[improve inline doc]
Signed-off-by: Dafna Hirschfeld <[email protected]>
---
drivers/iommu/mtk_iommu.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 25b834104790..28dc4b95b6d9 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -964,6 +964,13 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
return ret;
}
+ /*
+ * Users may allocate dma buffer before they call pm_runtime_get,
+ * in which case it will lack the necessary tlb flush.
+ * Thus, make sure to update the tlb after each PM resume.
+ */
+ mtk_iommu_tlb_flush_all(data);
+
/*
* Uppon first resume, only enable the clk and return, since the values of the
* registers are not yet set.
--
2.17.1
From: Sebastian Reichel <[email protected]>
In case of v4l2_reqbufs() it is possible that a TLB flush is done
without runtime PM being enabled. In that case the "Partial TLB flush
timed out, falling back to full flush" warning is printed.
Commit c0b57581b73b ("iommu/mediatek: Add power-domain operation")
introduced has_pm as an optimization to avoid checking runtime PM
when there is no power domain attached. But without the PM domain
there is still the device driver's runtime PM suspend handler, which
disables the clock. Thus flushing should also be avoided when there
is no PM domain involved.
Signed-off-by: Sebastian Reichel <[email protected]>
Reviewed-by: Dafna Hirschfeld <[email protected]>
---
drivers/iommu/mtk_iommu.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 28dc4b95b6d9..b0535fcfd1d7 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -227,16 +227,13 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
size_t granule,
struct mtk_iommu_data *data)
{
- bool has_pm = !!data->dev->pm_domain;
unsigned long flags;
int ret;
u32 tmp;
for_each_m4u(data) {
- if (has_pm) {
- if (pm_runtime_get_if_in_use(data->dev) <= 0)
- continue;
- }
+ if (pm_runtime_get_if_in_use(data->dev) <= 0)
+ continue;
spin_lock_irqsave(&data->tlb_lock, flags);
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
@@ -261,8 +258,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
spin_unlock_irqrestore(&data->tlb_lock, flags);
- if (has_pm)
- pm_runtime_put(data->dev);
+ pm_runtime_put(data->dev);
}
}
--
2.17.1
Hi Dafna,
Sorry for the late reply.
On Mon, 2021-11-22 at 12:43 +0200, Dafna Hirschfeld wrote:
> From: Yong Wu <[email protected]>
>
> Prepare for 2 HWs that sharing pgtable in different power-domains.
>
> When there are 2 M4U HWs, it may has problem in the flush_range in
> which
> we get the pm_status via the m4u dev, BUT that function don't reflect
> the
> real power-domain status of the HW since there may be other HW also
> use
> that power-domain.
>
> The function dma_alloc_attrs help allocate the iommu buffer which
> need the corresponding power domain since tlb flush is needed when
> preparing iova. BUT this function only is for allocating buffer,
> we have no good reason to request the user always call pm_runtime_get
> before calling dma_alloc_xxx. Therefore, we add a tlb_flush_all
> in the pm_runtime_resume to make sure the tlb always is clean.
>
> Another solution is always call pm_runtime_get in the
> tlb_flush_range.
> This will trigger pm runtime resume/backup so often when the iommu
> power is not active at some time(means user don't call pm_runtime_get
> before calling dma_alloc_xxx), This may cause the performance drop.
> thus we don't use this.
>
> In other case, the iommu's power should always be active via device
> link with smi.
>
> The previous SoC don't have PM except mt8192. the mt8192 IOMMU is
> display's
> power-domain which nearly always is enabled. thus no need fix tags
> here.
> Prepare for mt8195.
In this patchset, this message is not appropriate. I think you could
add a comment explaining why this patch is needed on mt8173.
>
> Signed-off-by: Yong Wu <[email protected]>
> [imporvie inline doc]
> Signed-off-by: Dafna Hirschfeld <[email protected]>
> ---
> drivers/iommu/mtk_iommu.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 25b834104790..28dc4b95b6d9 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -964,6 +964,13 @@ static int __maybe_unused
> mtk_iommu_runtime_resume(struct device *dev)
> return ret;
> }
>
> + /*
> + * Users may allocate dma buffer before they call
> pm_runtime_get,
> + * in which case it will lack the necessary tlb flush.
> + * Thus, make sure to update the tlb after each PM resume.
> + */
> + mtk_iommu_tlb_flush_all(data);
This should not work, since currently *_tlb_flush_all calls
pm_runtime_get_if_in_use, which always returns 0 when called from
this runtime_cb in my test. Thus, it won't actually do the
tlb_flush_all.
I guess this also depends on these two patches of mt8195 v3:
[PATCH v3 09/33] iommu/mediatek: Remove for_each_m4u in tlb_sync_all
[PATCH v3 10/33] iommu/mediatek: Add tlb_lock in tlb_flush_all
As in [10/33], I added a mtk_iommu_tlb_do_flush_all which doesn't have
the pm operation.
This looks like a dependency. Let me know if I can help with this.
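To make the point above concrete, here is a small userspace sketch. It is
a model only, under these assumptions: the model_* names are hypothetical,
and model_get_if_in_use() mimics just the part of
pm_runtime_get_if_in_use() that matters here, namely returning 0 unless
the device is already active with a non-zero usage count. The resume
callback is modelled as running while the status is still "resuming", so a
flush that first checks the PM status does nothing there, while a variant
without the PM operation does flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the runtime PM states relevant here. */
enum model_rpm_status { MODEL_SUSPENDED, MODEL_RESUMING, MODEL_ACTIVE };

struct model_dev {
	enum model_rpm_status status;
	int usage_count;
	unsigned int flushes;	/* full flushes that reached the hardware */
};

/*
 * Take a reference only if the device is already active and in use.
 * Returns 1 on success and 0 otherwise (error codes are not modelled).
 */
static int model_get_if_in_use(struct model_dev *d)
{
	if (d->status != MODEL_ACTIVE || d->usage_count == 0)
		return 0;
	d->usage_count++;
	return 1;
}

static void model_put(struct model_dev *d)
{
	d->usage_count--;
}

/* Full flush guarded by the PM status check. */
static void model_flush_all_checked(struct model_dev *d)
{
	if (model_get_if_in_use(d) <= 0)
		return;
	d->flushes++;
	model_put(d);
}

/* Full flush without any PM operation. */
static void model_flush_all_unchecked(struct model_dev *d)
{
	d->flushes++;
}

/* The resume callback body runs before the status becomes active. */
static void model_resume(struct model_dev *d, bool use_checked_flush)
{
	d->usage_count++;		/* the get that triggered the resume */
	d->status = MODEL_RESUMING;
	if (use_checked_flush)
		model_flush_all_checked(d);
	else
		model_flush_all_unchecked(d);
	d->status = MODEL_ACTIVE;
}
```

In this model, calling model_resume() with the checked flush leaves the
flush counter at zero, which matches the behaviour described in the test
above; the unchecked variant performs exactly one flush per resume.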
> +
> /*
> * Uppon first resume, only enable the clk and return, since
> the values of the
> * registers are not yet set.
On Mon, 2021-11-22 at 12:44 +0200, Dafna Hirschfeld wrote:
> From: Sebastian Reichel <[email protected]>
>
> In case of v4l2_reqbufs() it is possible, that a TLB flush is done
> without runtime PM being enabled. In that case the "Partial TLB flush
> timed out, falling back to full flush" warning is printed.
>
> Commit c0b57581b73b ("iommu/mediatek: Add power-domain operation")
> introduced has_pm as optimization to avoid checking runtime PM
> when there is no power domain attached. But without the PM domain
> there is still the device driver's runtime PM suspend handler, which
> disables the clock. Thus flushing should also be avoided when there
> is no PM domain involved.
>
> Signed-off-by: Sebastian Reichel <[email protected]>
> Reviewed-by: Dafna Hirschfeld <[email protected]>
Reviewed-by: Yong Wu <[email protected]>
> ---
> drivers/iommu/mtk_iommu.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 28dc4b95b6d9..b0535fcfd1d7 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -227,16 +227,13 @@ static void
> mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> size_t granule,
> struct mtk_iommu_data *data)
> {
> - bool has_pm = !!data->dev->pm_domain;
> unsigned long flags;
> int ret;
> u32 tmp;
>
> for_each_m4u(data) {
> - if (has_pm) {
> - if (pm_runtime_get_if_in_use(data->dev) <= 0)
> - continue;
> - }
> + if (pm_runtime_get_if_in_use(data->dev) <= 0)
> + continue;
>
> spin_lock_irqsave(&data->tlb_lock, flags);
> writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> @@ -261,8 +258,7 @@ static void
> mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
> spin_unlock_irqrestore(&data->tlb_lock, flags);
>
> - if (has_pm)
> - pm_runtime_put(data->dev);
> + pm_runtime_put(data->dev);
> }
> }
>
On 27.11.21 04:46, Yong Wu wrote:
> Hi Dafna,
>
> Sorry for reply late.
>
> On Mon, 2021-11-22 at 12:43 +0200, Dafna Hirschfeld wrote:
>> From: Yong Wu <[email protected]>
>>
>> Prepare for 2 HWs that sharing pgtable in different power-domains.
>>
>> When there are 2 M4U HWs, it may has problem in the flush_range in
>> which
>> we get the pm_status via the m4u dev, BUT that function don't reflect
>> the
>> real power-domain status of the HW since there may be other HW also
>> use
>> that power-domain.
>>
>> The function dma_alloc_attrs help allocate the iommu buffer which
>> need the corresponding power domain since tlb flush is needed when
>> preparing iova. BUT this function only is for allocating buffer,
>> we have no good reason to request the user always call pm_runtime_get
>> before calling dma_alloc_xxx. Therefore, we add a tlb_flush_all
>> in the pm_runtime_resume to make sure the tlb always is clean.
>>
>> Another solution is always call pm_runtime_get in the
>> tlb_flush_range.
>> This will trigger pm runtime resume/backup so often when the iommu
>> power is not active at some time(means user don't call pm_runtime_get
>> before calling dma_alloc_xxx), This may cause the performance drop.
>> thus we don't use this.
>>
>> In other case, the iommu's power should always be active via device
>> link with smi.
>>
>> The previous SoC don't have PM except mt8192. the mt8192 IOMMU is
>> display's
>> power-domain which nearly always is enabled. thus no need fix tags
>> here.
>> Prepare for mt8195.
>
> In this patchset, this message should be not proper. I think you could
> add the comment why this patch is needed in mt8173.
>
>>
>> Signed-off-by: Yong Wu <[email protected]>
>> [imporvie inline doc]
>> Signed-off-by: Dafna Hirschfeld <[email protected]>
>> ---
>> drivers/iommu/mtk_iommu.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index 25b834104790..28dc4b95b6d9 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -964,6 +964,13 @@ static int __maybe_unused
>> mtk_iommu_runtime_resume(struct device *dev)
>> return ret;
>> }
>>
>> + /*
>> + * Users may allocate dma buffer before they call
>> pm_runtime_get,
>> + * in which case it will lack the necessary tlb flush.
>> + * Thus, make sure to update the tlb after each PM resume.
>> + */
>> + mtk_iommu_tlb_flush_all(data);
>
> This should not work. since current the *_tlb_flush_all call
> pm_runtime_get_if_in_use which will always return 0 when it called from
> this runtime_cb in my test. thus, It won't do the tlb_flush_all
> actually.
>
> I guess this also depend on these two patches of mt8195 v3.
> [PATCH v3 09/33] iommu/mediatek: Remove for_each_m4u in tlb_sync_all
> [PATCH v3 10/33] iommu/mediatek: Add tlb_lock in tlb_flush_all
>
> like in [10/33], I added a mtk_iommu_tlb_do_flush_all which don't have
> the pm operation.
>
> This looks has a dependence. Let me know if I can help this.
It did work for me, testing on elm device. I'll check that again.
>
>> +
>> /*
>> * Uppon first resume, only enable the clk and return, since
>> the values of the
>> * registers are not yet set.
On 07.12.21 10:31, Dafna Hirschfeld wrote:
>
>
> On 27.11.21 04:46, Yong Wu wrote:
>> Hi Dafna,
>>
>> Sorry for reply late.
>>
>> On Mon, 2021-11-22 at 12:43 +0200, Dafna Hirschfeld wrote:
>>> From: Yong Wu <[email protected]>
>>>
>>> Prepare for 2 HWs that sharing pgtable in different power-domains.
>>>
>>> When there are 2 M4U HWs, it may has problem in the flush_range in
>>> which
>>> we get the pm_status via the m4u dev, BUT that function don't reflect
>>> the
>>> real power-domain status of the HW since there may be other HW also
>>> use
>>> that power-domain.
>>>
>>> The function dma_alloc_attrs help allocate the iommu buffer which
>>> need the corresponding power domain since tlb flush is needed when
>>> preparing iova. BUT this function only is for allocating buffer,
>>> we have no good reason to request the user always call pm_runtime_get
>>> before calling dma_alloc_xxx. Therefore, we add a tlb_flush_all
>>> in the pm_runtime_resume to make sure the tlb always is clean.
>>>
>>> Another solution is always call pm_runtime_get in the
>>> tlb_flush_range.
>>> This will trigger pm runtime resume/backup so often when the iommu
>>> power is not active at some time(means user don't call pm_runtime_get
>>> before calling dma_alloc_xxx), This may cause the performance drop.
>>> thus we don't use this.
>>>
>>> In other case, the iommu's power should always be active via device
>>> link with smi.
>>>
>>> The previous SoC don't have PM except mt8192. the mt8192 IOMMU is
>>> display's
>>> power-domain which nearly always is enabled. thus no need fix tags
>>> here.
>>> Prepare for mt8195.
>>
>> In this patchset, this message should be not proper. I think you could
>> add the comment why this patch is needed in mt8173.
>>
>>>
>>> Signed-off-by: Yong Wu <[email protected]>
>>> [imporvie inline doc]
>>> Signed-off-by: Dafna Hirschfeld <[email protected]>
>>> ---
>>> drivers/iommu/mtk_iommu.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>>> index 25b834104790..28dc4b95b6d9 100644
>>> --- a/drivers/iommu/mtk_iommu.c
>>> +++ b/drivers/iommu/mtk_iommu.c
>>> @@ -964,6 +964,13 @@ static int __maybe_unused
>>> mtk_iommu_runtime_resume(struct device *dev)
>>> return ret;
>>> }
>>> + /*
>>> + * Users may allocate dma buffer before they call
>>> pm_runtime_get,
>>> + * in which case it will lack the necessary tlb flush.
>>> + * Thus, make sure to update the tlb after each PM resume.
>>> + */
>>> + mtk_iommu_tlb_flush_all(data);
>>
>> This should not work. since current the *_tlb_flush_all call
>> pm_runtime_get_if_in_use which will always return 0 when it called from
>> this runtime_cb in my test. thus, It won't do the tlb_flush_all
>> actually.
Heh, indeed, my mistake. The encoder works more or less fine even
without the full flush, so I didn't catch that.
>>
>> I guess this also depend on these two patches of mt8195 v3.
>> [PATCH v3 09/33] iommu/mediatek: Remove for_each_m4u in tlb_sync_all
>> [PATCH v3 10/33] iommu/mediatek: Add tlb_lock in tlb_flush_all
I'll add those two
>>
>> like in [10/33], I added a mtk_iommu_tlb_do_flush_all which don't have
>> the pm operation.
Yes, I need to remove the pm_runtime_get_if_in_use call in the 'flush_all' func.
I see there is also a patch for that in the mt8195 v3 series: "[PATCH v3 13/33] iommu/mediatek: Remove the power status checking in tlb flush all"
So I'll send v2, adding all 3 of those patches, but I think adding mtk_iommu_tlb_do_flush_all
in patch 9 and removing it again in patch 13 is confusing, so I'll avoid that.
Thanks,
Dafna
>>
>> This looks has a dependence. Let me know if I can help this.
>
> It did work for me, testing on elm device. I'll check that again.
>
>
>>
>>> +
>>> /*
>>> * Uppon first resume, only enable the clk and return, since
>>> the values of the
>>> * registers are not yet set.
>
On 08.12.21 11:50, Dafna Hirschfeld wrote:
>
>
> On 07.12.21 10:31, Dafna Hirschfeld wrote:
>>
>>
>> On 27.11.21 04:46, Yong Wu wrote:
>>> Hi Dafna,
>>>
>>> Sorry for reply late.
>>>
>>> On Mon, 2021-11-22 at 12:43 +0200, Dafna Hirschfeld wrote:
>>>> From: Yong Wu <[email protected]>
>>>>
>>>> Prepare for 2 HWs that sharing pgtable in different power-domains.
>>>>
>>>> When there are 2 M4U HWs, it may has problem in the flush_range in
>>>> which
>>>> we get the pm_status via the m4u dev, BUT that function don't reflect
>>>> the
>>>> real power-domain status of the HW since there may be other HW also
>>>> use
>>>> that power-domain.
>>>>
>>>> The function dma_alloc_attrs help allocate the iommu buffer which
>>>> need the corresponding power domain since tlb flush is needed when
>>>> preparing iova. BUT this function only is for allocating buffer,
>>>> we have no good reason to request the user always call pm_runtime_get
>>>> before calling dma_alloc_xxx. Therefore, we add a tlb_flush_all
>>>> in the pm_runtime_resume to make sure the tlb always is clean.
>>>>
>>>> Another solution is always call pm_runtime_get in the
>>>> tlb_flush_range.
>>>> This will trigger pm runtime resume/backup so often when the iommu
>>>> power is not active at some time(means user don't call pm_runtime_get
>>>> before calling dma_alloc_xxx), This may cause the performance drop.
>>>> thus we don't use this.
>>>>
>>>> In other case, the iommu's power should always be active via device
>>>> link with smi.
>>>>
>>>> The previous SoC don't have PM except mt8192. the mt8192 IOMMU is
>>>> display's
>>>> power-domain which nearly always is enabled. thus no need fix tags
>>>> here.
>>>> Prepare for mt8195.
>>>
>>> In this patchset, this message should be not proper. I think you could
>>> add the comment why this patch is needed in mt8173.
>>>
>>>>
>>>> Signed-off-by: Yong Wu <[email protected]>
>>>> [imporvie inline doc]
>>>> Signed-off-by: Dafna Hirschfeld <[email protected]>
>>>> ---
>>>> drivers/iommu/mtk_iommu.c | 7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>>>> index 25b834104790..28dc4b95b6d9 100644
>>>> --- a/drivers/iommu/mtk_iommu.c
>>>> +++ b/drivers/iommu/mtk_iommu.c
>>>> @@ -964,6 +964,13 @@ static int __maybe_unused
>>>> mtk_iommu_runtime_resume(struct device *dev)
>>>> return ret;
>>>> }
>>>> + /*
>>>> + * Users may allocate dma buffer before they call
>>>> pm_runtime_get,
>>>> + * in which case it will lack the necessary tlb flush.
>>>> + * Thus, make sure to update the tlb after each PM resume.
>>>> + */
>>>> + mtk_iommu_tlb_flush_all(data);
>>>
>>> This should not work. since current the *_tlb_flush_all call
>>> pm_runtime_get_if_in_use which will always return 0 when it called from
>>> this runtime_cb in my test. thus, It won't do the tlb_flush_all
>>> actually.
>
> He, indeed, my mistake, although the encoder works more or less fine even
> without the full flush so I didn't catch that.
>
>>>
>>> I guess this also depend on these two patches of mt8195 v3.
>>> [PATCH v3 09/33] iommu/mediatek: Remove for_each_m4u in tlb_sync_all
>>> [PATCH v3 10/33] iommu/mediatek: Add tlb_lock in tlb_flush_all
>
> I'll add those two
>
>>>
>>> like in [10/33], I added a mtk_iommu_tlb_do_flush_all which don't have
>>> the pm operation.
>
> yes, I need to remove the pm_runtime_get_if_in_use call in the 'flush_all' func
> I see there is also a patch for that in the mt8195 v3 series "[PATCH v3 13/33] iommu/mediatek: Remove the power status checking in tlb flush all"
>
> So I'll send v2, adding all those 3 patches, but I think adding mtk_iommu_tlb_do_flush_all
> on patch 9 and removing it again on patch 13 is confusing so I'll avoid that.
>
In addition, the call to mtk_iommu_tlb_flush_all from mtk_iommu_runtime_resume should move to the bottom of the function,
after all values are updated.
> Thanks,
> Dafna
>
>
>
>>>
>>> This looks has a dependence. Let me know if I can help this.
>>
>> It did work for me, testing on elm device. I'll check that again.
>>
>>
>>>
>>>> +
>>>> /*
>>>> * Uppon first resume, only enable the clk and return, since
>>>> the values of the
>>>> * registers are not yet set.
>>