2020-08-08 07:26:11

by Tiezhu Yang

Subject: [PATCH] gpu/drm: Remove TTM_PL_FLAG_WC of VRAM to fix writecombine issue for Loongson64

Loongson processors have a write-combining issue that can at times
fail to write back the framebuffer when used with ATI Radeon or AMD
GPUs. Since commit 8a08e50cee66 ("drm: Permit video-buffers writecombine
mapping for MIPS"), there have been errors such as a blurred screen,
GPU lockups, and so on.

Remove the TTM_PL_FLAG_WC flag from VRAM placements on Loongson64 to
fix the write-combining issue, so that ATI Radeon and AMD GPUs work
well there. This has no effect on other platforms.

[ 60.958721] radeon 0000:03:00.0: ring 0 stalled for more than 10079msec
[ 60.965315] radeon 0000:03:00.0: GPU lockup (current fence id 0x0000000000000112 last fence id 0x000000000000011d on ring 0)
[ 60.976525] radeon 0000:03:00.0: ring 3 stalled for more than 10086msec
[ 60.983156] radeon 0000:03:00.0: GPU lockup (current fence id 0x0000000000000374 last fence id 0x00000000000003a8 on ring 3)

Signed-off-by: Tiezhu Yang <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 7 +++++--
drivers/gpu/drm/radeon/radeon_object.c | 20 ++++++++++++++------
2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 5ac7b55..9f785f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -136,8 +136,11 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)

 		places[c].fpfn = 0;
 		places[c].lpfn = 0;
-		places[c].flags = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED |
-			TTM_PL_FLAG_VRAM;
+		if (IS_ENABLED(CONFIG_MACH_LOONGSON64))
+			places[c].flags = TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM;
+		else
+			places[c].flags = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED |
+				TTM_PL_FLAG_VRAM;

 		if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
 			places[c].lpfn = visible_pfn;
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index f3dee01..c6cede6 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -112,15 +112,23 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 		    rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
 			rbo->placements[c].fpfn =
 				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
-			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
-						     TTM_PL_FLAG_UNCACHED |
-						     TTM_PL_FLAG_VRAM;
+			if (IS_ENABLED(CONFIG_MACH_LOONGSON64))
+				rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
+							     TTM_PL_FLAG_VRAM;
+			else
+				rbo->placements[c++].flags = TTM_PL_FLAG_WC |
+							     TTM_PL_FLAG_UNCACHED |
+							     TTM_PL_FLAG_VRAM;
 		}

 		rbo->placements[c].fpfn = 0;
-		rbo->placements[c++].flags = TTM_PL_FLAG_WC |
-					     TTM_PL_FLAG_UNCACHED |
-					     TTM_PL_FLAG_VRAM;
+		if (IS_ENABLED(CONFIG_MACH_LOONGSON64))
+			rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
+						     TTM_PL_FLAG_VRAM;
+		else
+			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
+						     TTM_PL_FLAG_UNCACHED |
+						     TTM_PL_FLAG_VRAM;
 	}

 	if (domain & RADEON_GEM_DOMAIN_GTT) {
--
2.1.0
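The patch above repeats the same `IS_ENABLED(CONFIG_MACH_LOONGSON64)` choice at three placement sites, and the only thing that choice changes is whether the WC bit is set. As a minimal sketch of what those sites compute, the selection could be folded into one hypothetical helper that a driver would call as `vram_placement_flags(!IS_ENABLED(CONFIG_MACH_LOONGSON64))`. The `PL_FLAG_*` macros are stand-ins for the TTM flags named in the patch, and their bit values are invented for illustration, not TTM's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TTM_PL_FLAG_VRAM, TTM_PL_FLAG_UNCACHED and
 * TTM_PL_FLAG_WC; the bit positions are illustrative only. */
#define PL_FLAG_VRAM     (UINT32_C(1) << 0)
#define PL_FLAG_UNCACHED (UINT32_C(1) << 1)
#define PL_FLAG_WC       (UINT32_C(1) << 2)

static_assert((PL_FLAG_WC & (PL_FLAG_UNCACHED | PL_FLAG_VRAM)) == 0,
	      "placement flag bits must not overlap");

/*
 * Hypothetical helper: return the caching flags for a VRAM placement.
 * UNCACHED is always allowed; WC is added only when the caller permits
 * write-combined CPU mappings, which is the one thing the patch varies.
 */
static uint32_t vram_placement_flags(bool allow_wc)
{
	uint32_t flags = PL_FLAG_UNCACHED | PL_FLAG_VRAM;

	if (allow_wc)
		flags |= PL_FLAG_WC;
	return flags;
}
```

With a single helper, each placement site differs only in the boolean it passes, which makes it obvious that the patch touches nothing but the WC bit.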


2020-08-08 13:43:58

by Thomas Bogendoerfer

Subject: Re: [PATCH] gpu/drm: Remove TTM_PL_FLAG_WC of VRAM to fix writecombine issue for Loongson64

On Sat, Aug 08, 2020 at 03:25:02PM +0800, Tiezhu Yang wrote:
> Loongson processors have a write-combining issue that can at times
> fail to write back the framebuffer when used with ATI Radeon or AMD
> GPUs. Since commit 8a08e50cee66 ("drm: Permit video-buffers writecombine
> mapping for MIPS"), there have been errors such as a blurred screen,
> GPU lockups, and so on.
>
> Remove the TTM_PL_FLAG_WC flag from VRAM placements on Loongson64 to
> fix the write-combining issue, so that ATI Radeon and AMD GPUs work
> well there. This has no effect on other platforms.

Well, it's not my call to take or reject this patch, but I already
indicated it might be better to disable write-combining on the CPU
detection side (or do you have other devices where write-combining
works?). Something like the diff below will disable it for all
Loongson64 CPUs. If you then find out where it works and where it
doesn't, you can even narrow it down to just the affected CPUs.

Thomas.


diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index def1659fe262..cdd87009e931 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -2043,7 +2043,6 @@ static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
 			set_isa(c, MIPS_CPU_ISA_M64R2);
 			break;
 		}
-		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
 		c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_EXT |
 				MIPS_ASE_LOONGSON_EXT2);
 		break;
@@ -2073,7 +2072,6 @@ static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
 		 * register, we correct it here.
 		 */
 		c->options |= MIPS_CPU_FTLB | MIPS_CPU_TLBINV | MIPS_CPU_LDPTE;
-		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
 		c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM |
 			MIPS_ASE_LOONGSON_EXT | MIPS_ASE_LOONGSON_EXT2);
 		c->ases &= ~MIPS_ASE_VZ; /* VZ of Loongson-3A2000/3000 is incomplete */
@@ -2084,7 +2082,6 @@ static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
 		set_elf_platform(cpu, "loongson3a");
 		set_isa(c, MIPS_CPU_ISA_M64R2);
 		decode_cpucfg(c);
-		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
 		break;
 	default:
 		panic("Unknown Loongson Processor ID!");
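The diff above works because, once the Loongson-specific assignments are gone, `c->writecombine` keeps whatever default the generic probe code set (the diff implies an uncached default). The follow-up idea of re-enabling the accelerated attribute only on CPUs found to be unaffected can be sketched as an allow-list consulted at probe time. Everything here is hypothetical: `struct cpu_info`, the `CPU_MODEL_*` identifiers and the allow-list contents are placeholders, not kernel code or real Loongson PRIDs.

```c
#include <assert.h>
#include <stddef.h>

/* Cache attributes a WC mapping may use (names modeled on the MIPS
 * _CACHE_UNCACHED / _CACHE_UNCACHED_ACCELERATED macros). */
enum cache_attr {
	CACHE_UNCACHED,
	CACHE_UNCACHED_ACCELERATED,
};

/* Hypothetical CPU identifiers standing in for PRID values. */
enum cpu_model {
	CPU_MODEL_A,
	CPU_MODEL_B,
};

struct cpu_info {
	enum cpu_model model;
	enum cache_attr writecombine; /* attribute used for WC mappings */
};

/* Placeholder allow-list: CPUs on which uncached-accelerated mappings
 * have been verified to work. Which real CPUs belong here is exactly
 * what remains to be found out. */
static const enum cpu_model wc_known_good[] = { CPU_MODEL_B };

static void probe_writecombine(struct cpu_info *c)
{
	assert(c != NULL);

	/* Safe default: WC mappings degrade to plain uncached. */
	c->writecombine = CACHE_UNCACHED;

	for (size_t i = 0; i < sizeof(wc_known_good) / sizeof(wc_known_good[0]); i++) {
		if (wc_known_good[i] == c->model) {
			c->writecombine = CACHE_UNCACHED_ACCELERATED;
			break;
		}
	}
}
```

Keeping the decision in CPU probing means every driver that asks for a write-combined mapping gets the same answer, without per-driver `#ifdef`s.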

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-08-08 13:54:19

by Jiaxun Yang

Subject: Re: [PATCH] gpu/drm: Remove TTM_PL_FLAG_WC of VRAM to fix writecombine issue for Loongson64



On 2020/8/8 9:41 PM, Thomas Bogendoerfer wrote:
> On Sat, Aug 08, 2020 at 03:25:02PM +0800, Tiezhu Yang wrote:
>> Loongson processors have a write-combining issue that can at times
>> fail to write back the framebuffer when used with ATI Radeon or AMD
>> GPUs. Since commit 8a08e50cee66 ("drm: Permit video-buffers writecombine
>> mapping for MIPS"), there have been errors such as a blurred screen,
>> GPU lockups, and so on.
>>
>> Remove the TTM_PL_FLAG_WC flag from VRAM placements on Loongson64 to
>> fix the write-combining issue, so that ATI Radeon and AMD GPUs work
>> well there. This has no effect on other platforms.
> Well, it's not my call to take or reject this patch, but I already
> indicated it might be better to disable write-combining on the CPU
> detection side (or do you have other devices where write-combining
> works?). Something like the diff below will disable it for all
> Loongson64 CPUs. If you then find out where it works and where it
> doesn't, you can even narrow it down to just the affected CPUs.
Hi Tiezhu, Thomas,

Yes, write-combining works well on the LS7A's internal GPU,
and it even works well with some AMD GPUs (in my case, an RX550).

Tiezhu, is it possible to investigate the issue more deeply at Loongson?
Perhaps we just need to add a barrier to maintain data coherency,
or disable write-combining for the AMD GPU's command buffer and leave
the texture/frame buffers WC-accelerated.
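To illustrate why a missing barrier could produce symptoms like the ring stalls in the original report, here is a toy model in plain C (not kernel code, and not a claim about how Loongson's write-combining hardware actually behaves). Stores to a WC mapping sit in a small combining buffer until it is drained, while an uncached doorbell write reaches the device immediately. `wc_drain()` plays roughly the role of the kernel's `wmb()`; all names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 16 /* size of the modeled device memory */
#define WC_SLOTS   4 /* capacity of the modeled combining buffer */
#define CMD_LEN    3 /* words in one modeled command */
#define DOORBELL  15 /* word the "GPU" watches to start work */

struct wc_model {
	uint32_t device_mem[MEM_WORDS]; /* what the device can observe */
	size_t pending_idx[WC_SLOTS];   /* buffered (not yet visible) stores */
	uint32_t pending_val[WC_SLOTS];
	size_t npending;
};

/* Push all buffered stores out to device memory (the barrier). */
static void wc_drain(struct wc_model *m)
{
	for (size_t i = 0; i < m->npending; i++)
		m->device_mem[m->pending_idx[i]] = m->pending_val[i];
	m->npending = 0;
}

/* A store through the WC mapping: buffered, combined with an earlier
 * store to the same word, and only forced out when the buffer is full. */
static void wc_store(struct wc_model *m, size_t idx, uint32_t val)
{
	assert(idx < MEM_WORDS);
	for (size_t i = 0; i < m->npending; i++) {
		if (m->pending_idx[i] == idx) {
			m->pending_val[i] = val;
			return;
		}
	}
	if (m->npending == WC_SLOTS)
		wc_drain(m);
	m->pending_idx[m->npending] = idx;
	m->pending_val[m->npending] = val;
	m->npending++;
}

/* Write a command through WC, then ring an uncached doorbell. Returns
 * true if the complete command is visible when the doorbell rings. */
static bool submit(struct wc_model *m, bool use_barrier)
{
	static const uint32_t cmd[CMD_LEN] = { 0x11u, 0x22u, 0x33u };

	for (size_t i = 0; i < CMD_LEN; i++)
		wc_store(m, i, cmd[i]);
	if (use_barrier)
		wc_drain(m);
	m->device_mem[DOORBELL] = 1; /* uncached: bypasses the buffer */

	for (size_t i = 0; i < CMD_LEN; i++)
		if (m->device_mem[i] != cmd[i])
			return false;
	return true;
}
```

In the model, a submission without the barrier leaves the command words stuck in the buffer when the doorbell fires, so the "GPU" would read stale data. Whether the real failure has this shape is what the suggested investigation would need to establish.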

Thanks.

- Jiaxun

2020-08-09 12:16:43

by Christian König

Subject: Re: [PATCH] gpu/drm: Remove TTM_PL_FLAG_WC of VRAM to fix writecombine issue for Loongson64

On 08.08.20 at 15:50, Jiaxun Yang wrote:
>
>
> On 2020/8/8 9:41 PM, Thomas Bogendoerfer wrote:
>> On Sat, Aug 08, 2020 at 03:25:02PM +0800, Tiezhu Yang wrote:
>>> Loongson processors have a write-combining issue that can at times
>>> fail to write back the framebuffer when used with ATI Radeon or AMD
>>> GPUs. Since commit 8a08e50cee66 ("drm: Permit video-buffers writecombine
>>> mapping for MIPS"), there have been errors such as a blurred screen,
>>> GPU lockups, and so on.
>>>
>>> Remove the TTM_PL_FLAG_WC flag from VRAM placements on Loongson64 to
>>> fix the write-combining issue, so that ATI Radeon and AMD GPUs work
>>> well there. This has no effect on other platforms.
>> Well, it's not my call to take or reject this patch, but I already
>> indicated it might be better to disable write-combining on the CPU
>> detection side (or do you have other devices where write-combining
>> works?). Something like the diff below will disable it for all
>> Loongson64 CPUs. If you then find out where it works and where it
>> doesn't, you can even narrow it down to just the affected CPUs.
> Hi Tiezhu, Thomas,
>
> Yes, write-combining works well on the LS7A's internal GPU,
> and it even works well with some AMD GPUs (in my case, an RX550).

In this case the patch is a clear NAK since you haven't root caused the
issue and are just working around it in a very questionable manner.

>
> Tiezhu, is it possible to investigate the issue more deeply at Loongson?
> Perhaps we just need to add a barrier to maintain data coherency,
> or disable write-combining for the AMD GPU's command buffer and leave
> the texture/frame buffers WC-accelerated.

Have you moved any buffer to VRAM and forgotten to add an HDP flush/invalidate?

The acceleration is not much of a problem, but if WC doesn't work in
general you need to disable it for the whole CPU and not for individual
drivers.

Regards,
Christian.

>
> Thanks.
>
> - Jiaxun

2020-08-10 01:00:22

by Tiezhu Yang

Subject: Re: [PATCH] gpu/drm: Remove TTM_PL_FLAG_WC of VRAM to fix writecombine issue for Loongson64

On 08/09/2020 08:13 PM, Christian König wrote:
> On 08.08.20 at 15:50, Jiaxun Yang wrote:
>>
>>
>> On 2020/8/8 9:41 PM, Thomas Bogendoerfer wrote:
>>> On Sat, Aug 08, 2020 at 03:25:02PM +0800, Tiezhu Yang wrote:
>>>> Loongson processors have a write-combining issue that can at times
>>>> fail to write back the framebuffer when used with ATI Radeon or AMD
>>>> GPUs. Since commit 8a08e50cee66 ("drm: Permit video-buffers writecombine
>>>> mapping for MIPS"), there have been errors such as a blurred screen,
>>>> GPU lockups, and so on.
>>>>
>>>> Remove the TTM_PL_FLAG_WC flag from VRAM placements on Loongson64 to
>>>> fix the write-combining issue, so that ATI Radeon and AMD GPUs work
>>>> well there. This has no effect on other platforms.
>>> Well, it's not my call to take or reject this patch, but I already
>>> indicated it might be better to disable write-combining on the CPU
>>> detection side (or do you have other devices where write-combining
>>> works?). Something like the diff below will disable it for all
>>> Loongson64 CPUs. If you then find out where it works and where it
>>> doesn't, you can even narrow it down to just the affected CPUs.
>> Hi Tiezhu, Thomas,
>>
>> Yes, write-combining works well on the LS7A's internal GPU,
>> and it even works well with some AMD GPUs (in my case, an RX550).
>
> In this case the patch is a clear NAK since you haven't root caused
> the issue and are just working around it in a very questionable manner.
>
>>
>> Tiezhu, is it possible to investigate the issue more deeply at Loongson?
>> Perhaps we just need to add a barrier to maintain data coherency,
>> or disable write-combining for the AMD GPU's command buffer and leave
>> the texture/frame buffers WC-accelerated.
>
> Have you moved any buffer to VRAM and forgotten to add an HDP
> flush/invalidate?
>
> The acceleration is not much of a problem, but if WC doesn't work in
> general you need to disable it for the whole CPU and not for
> individual drivers.

Hi Thomas, Jiaxun and Christian,

Thank you very much for your suggestions.

Actually, this patch is only a temporary workaround to make things work;
it is not a proper, final solution.

I understand your points. It will take some time to find the root cause.

Thanks,
Tiezhu

>
> Regards,
> Christian.
>
>>
>> Thanks.
>>
>> - Jiaxun

2020-08-10 11:00:04

by Michel Dänzer

Subject: Re: [PATCH] gpu/drm: Remove TTM_PL_FLAG_WC of VRAM to fix writecombine issue for Loongson64

On 2020-08-09 2:13 p.m., Christian König wrote:
> On 08.08.20 at 15:50, Jiaxun Yang wrote:
>> On 2020/8/8 9:41 PM, Thomas Bogendoerfer wrote:
>>> On Sat, Aug 08, 2020 at 03:25:02PM +0800, Tiezhu Yang wrote:
>>>> Loongson processors have a write-combining issue that can at times
>>>> fail to write back the framebuffer when used with ATI Radeon or AMD
>>>> GPUs. Since commit 8a08e50cee66 ("drm: Permit video-buffers writecombine
>>>> mapping for MIPS"), there have been errors such as a blurred screen,
>>>> GPU lockups, and so on.
>>>>
>>>> Remove the TTM_PL_FLAG_WC flag from VRAM placements on Loongson64 to
>>>> fix the write-combining issue, so that ATI Radeon and AMD GPUs work
>>>> well there. This has no effect on other platforms.
>>> Well, it's not my call to take or reject this patch, but I already
>>> indicated it might be better to disable write-combining on the CPU
>>> detection side (or do you have other devices where write-combining
>>> works?). Something like the diff below will disable it for all
>>> Loongson64 CPUs. If you then find out where it works and where it
>>> doesn't, you can even narrow it down to just the affected CPUs.
>> Hi Tiezhu, Thomas,
>>
>> Yes, write-combining works well on the LS7A's internal GPU,
>> and it even works well with some AMD GPUs (in my case, an RX550).
>
> In this case the patch is a clear NAK since you haven't root caused the
> issue and are just working around it in a very questionable manner.

To be fair though, amdgpu & radeon are already disabling write-combining
for system memory pages in 32-bit x86 kernels for similar reasons.


--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer

2020-08-10 11:24:06

by Christian König

Subject: Re: [PATCH] gpu/drm: Remove TTM_PL_FLAG_WC of VRAM to fix writecombine issue for Loongson64

On 10.08.20 at 12:50, Michel Dänzer wrote:
> On 2020-08-09 2:13 p.m., Christian König wrote:
>> On 08.08.20 at 15:50, Jiaxun Yang wrote:
>>> On 2020/8/8 9:41 PM, Thomas Bogendoerfer wrote:
>>>> On Sat, Aug 08, 2020 at 03:25:02PM +0800, Tiezhu Yang wrote:
>>>>> Loongson processors have a write-combining issue that can at times
>>>>> fail to write back the framebuffer when used with ATI Radeon or AMD
>>>>> GPUs. Since commit 8a08e50cee66 ("drm: Permit video-buffers writecombine
>>>>> mapping for MIPS"), there have been errors such as a blurred screen,
>>>>> GPU lockups, and so on.
>>>>>
>>>>> Remove the TTM_PL_FLAG_WC flag from VRAM placements on Loongson64 to
>>>>> fix the write-combining issue, so that ATI Radeon and AMD GPUs work
>>>>> well there. This has no effect on other platforms.
>>>> Well, it's not my call to take or reject this patch, but I already
>>>> indicated it might be better to disable write-combining on the CPU
>>>> detection side (or do you have other devices where write-combining
>>>> works?). Something like the diff below will disable it for all
>>>> Loongson64 CPUs. If you then find out where it works and where it
>>>> doesn't, you can even narrow it down to just the affected CPUs.
>>> Hi Tiezhu, Thomas,
>>>
>>> Yes, write-combining works well on the LS7A's internal GPU,
>>> and it even works well with some AMD GPUs (in my case, an RX550).
>> In this case the patch is a clear NAK since you haven't root caused the
>> issue and are just working around it in a very questionable manner.
> To be fair though, amdgpu & radeon are already disabling write-combining
> for system memory pages in 32-bit x86 kernels for similar reasons.

Yeah, well that is USWC for system memory. But this is about WC for the
VRAM BAR.

When we don't understand or don't correctly implement something on the
platform for USWC, that is annoying, but not a serious issue.

But when the hardware doesn't correctly implement WC for PCIe BARs, then
this is a violation of the PCIe spec and a more serious issue for the
whole platform.

We can work around that by disabling WC for PCIe BARs on the whole
platform, or behind specific bridges, or in various other ways, but
patching each individual driver so that it works is not really the right
approach.
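A platform-level workaround of the kind described above, deciding once whether WC is allowed for PCIe BARs (for example behind particular bridges) instead of patching each driver, could look roughly like the following sketch. Drivers keep requesting WC; a common helper downgrades the request where the platform is known to be broken. The quirk table, the IDs in it and the function names are all hypothetical; this is not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum map_mode { MAP_UNCACHED, MAP_WC };

/* Hypothetical quirk entry: a PCIe bridge (by vendor/device ID) behind
 * which write-combined BAR mappings must not be used. */
struct wc_bar_quirk {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder entry; 0xffff is not a valid PCI vendor ID. */
static const struct wc_bar_quirk no_wc_bridges[] = {
	{ 0xffff, 0x0001 },
};

static bool platform_allows_bar_wc(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(no_wc_bridges) / sizeof(no_wc_bridges[0]); i++) {
		if (no_wc_bridges[i].vendor == vendor &&
		    no_wc_bridges[i].device == device)
			return false;
	}
	return true;
}

/* Common mapping policy: honour a driver's WC request unless the bridge
 * above the device is on the quirk list, in which case fall back to
 * uncached. Drivers themselves never need a per-platform #ifdef. */
static enum map_mode bar_map_mode(enum map_mode requested,
				  uint16_t bridge_vendor, uint16_t bridge_device)
{
	assert(requested == MAP_UNCACHED || requested == MAP_WC);
	if (requested == MAP_WC &&
	    !platform_allows_bar_wc(bridge_vendor, bridge_device))
		return MAP_UNCACHED;
	return requested;
}
```

Because the downgrade happens in one shared place, fixing or narrowing the quirk later changes behaviour for every GPU driver at once.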

Cheers,
Christian.