2018-06-06 14:19 GMT+02:00 Christian König <[email protected]>:
> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>
>> 2018-06-06 13:33 GMT+02:00 Christian König <[email protected]>:
>>>
>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>
>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <[email protected]>:
>>>>>>
>>>>>> 2018-04-11 6:00 GMT+02:00 Gabriel C <[email protected]>:
>>>>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>>>>> <[email protected]>:
>>>>>>>
>>>>>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>>>
>>>>> ...
>>>>>>
>>>>>> I can help testing code for 4.17/++ if you wish but that is
>>>>>> *different*
>>>>>> storry.
>>>>>>
>>>>> Quick tested an 4.16.0-11490-gb284d4d5a678 , amdgpu and radeon driver
>>>>> are broken now in this one.
>>>>>
>>>>> radeon tells:
>>>>>
>>>>> ...
>>>>>
>>>>> [ 6.337838] [drm] PCIE GART of 2048M enabled (table at
>>>>> 0x00000000001D6000).
>>>>> [ 6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>> [ 6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>
>>>>> ...
>>>>>
>>>> I have the same Issue now on final 4.17.
>>>
>>>
>>> Actually Michel came up with a fix for the performance regression which
>>> is
>>> now backported to older kernels as well.
>>>
>>> So the original issue of this mail thread should be fixed by now.
>>
>> Ok , will test as soon I get the GPU to work :))
>>
>>>> Also I played with BIOS options also which does not fix anything but
>>>> changes the error message.
>>>>
>>>> IOMMU && SR-IOV disabled the error changes to this :
>>>>
>>>> [ 7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0
>>>> test failed (scratch(0x850C)=0xCAFEDEAD)
>>>> [ 7.092059] radeon 0000:21:00.0: disabling GPU acceleration
>>>>
>>>>
>>>> While I could workaround SWIOTLB bugs in 4.15 and 4.16 , 4.17 seems to
>>>> kill the GPU with no way
>>>> for me to make it work ( at least I could not find any workaround by now
>>>> )
>>>
>>>
>>> That actually sounds like something completely different. Can you provide
>>> a
>>> full dmesg of radeon and/or amdgpu?
>>
>> Sure here from boot with IOMMU/SR-IOV ON/OFF in BIOS :
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>
>> Also nothing else changed in that setup just testing kernel 4.17.
>
>
> That has nothing TODO with the driver nor the original bug you reported. The
> problem is that SME is active and that is currently not supported at all
> with a that hardware.
OK, so are we now playing kernel and AMD hardware roulette on each release?
SME was set up like this with kernel 4.16.x here and everything worked.
Also, if you no longer support SME at all on that hardware, even though it
worked before, please add proper error handling and proper dmesg messages
letting the user know, for example:
radeon: xxxx: SME not supported on that hardware anymore, please
disable SME...
radeon: xxxx: Update your GPU < or whatever >
How hard would that be?
No one but developers can guess from these error messages why their
hardware suddenly stopped working after just updating the kernel.
>
> Try to disable SME either in the BIOS or on the kernel command line.
Yes, that works, but it is not the point.
You really can't just break users' setups like this.
On 2018-06-06 03:33 PM, Gabriel C wrote:
> 2018-06-06 14:19 GMT+02:00 Christian König <[email protected]>:
>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>> 2018-06-06 13:33 GMT+02:00 Christian König <[email protected]>:
>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <[email protected]>:
>>>>>>
>>>>>> [ 6.337838] [drm] PCIE GART of 2048M enabled (table at
>>>>>> 0x00000000001D6000).
>>>>>> [ 6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>>> [ 6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>>
>>>>>> ...
>>>>>>
>>>>> I have the same Issue now on final 4.17.
Please file a bug report, and ideally bisect which commit(s) introduced
the issue(s).
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>
>>> Also nothing else changed in that setup just testing kernel 4.17.
>>
>>
>> That has nothing TODO with the driver nor the original bug you reported. The
>> problem is that SME is active and that is currently not supported at all
>> with a that hardware.
>
> Ok .. so are we playing now kernel an AMD Hardware roulette on each release ?
>
> SME was like this in kernel 4.16.x here and all worked.
If that is true, again please bisect which commit broke it.
All the reports I've seen before this indicated that at least amdgpu has
never worked with SME (which BTW doesn't mean it's never going to work
or that we don't want to support it, just that as far as we know it's
currently not working).
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
On 06.06.2018 at 15:33, Gabriel C wrote:
> 2018-06-06 14:19 GMT+02:00 Christian König <[email protected]>:
>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>> [SNIP]
>>>
>>> That has nothing TODO with the driver nor the original bug you reported. The
>>> problem is that SME is active and that is currently not supported at all
>>> with a that hardware.
> Ok .. so are we playing now kernel an AMD Hardware roulette on each release ?
>
> SME was like this in kernel 4.16.x here and all worked.
>
> Also if you don't support SME at all now on that Hardware while worked before
> please add proper error handling and proper dmesg messages
> letting the user know.
>
> radeon: xxxx : SME not supported on that Hardware anymore , please
> disable SME...
> radeon: xxxx: Update your GPU < or whatever >
>
> How hard would be that ?
Yes, to be precise, it isn't the job of the GFX driver to care about
such things.
It is a well-known and documented limitation of SME that it is, in
general, mostly incompatible with GFX (or compute) hardware, and it
actually doesn't matter which hardware or driver you use.
In other words, as soon as you use GFX (or compute), SME gets disabled
transparently.
The problem is that this happens only on the DMA slow path, which we just
disabled because of the performance problems.
I am going to propose reverting that change, or at least only applying it
when SME is disabled.
Regards,
Christian.
On 06.06.2018 at 16:12, Michel Dänzer wrote:
> On 2018-06-06 03:33 PM, Gabriel C wrote:
>> 2018-06-06 14:19 GMT+02:00 Christian König <[email protected]>:
>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>> 2018-06-06 13:33 GMT+02:00 Christian König <[email protected]>:
>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <[email protected]>:
>>>>>>>
>>>>>>> [ 6.337838] [drm] PCIE GART of 2048M enabled (table at
>>>>>>> 0x00000000001D6000).
>>>>>>> [ 6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>>>> [ 6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>> I have the same Issue now on final 4.17.
>
> Please file a bug report, and ideally bisect which commit(s)
> introduced the issue(s).
>
>
>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>
>>>>
>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>
>>>>
>>>> Also nothing else changed in that setup just testing kernel 4.17.
>>>
>>>
>>> That has nothing TODO with the driver nor the original bug you
>>> reported. The
>>> problem is that SME is active and that is currently not supported at
>>> all
>>> with a that hardware.
>>
>> Ok .. so are we playing now kernel an AMD Hardware roulette on each
>> release ?
>>
>> SME was like this in kernel 4.16.x here and all worked.
>
> If that is true, again please bisect which commit broke it.
>
> All the reports I've seen before this indicated that at least amdgpu
> has never worked with SME (which BTW doesn't mean it's never going to
> work or that we don't want to support it, just that as far as we know
> it's currently not working).
At least in theory it should work when we use the coherent DMA allocator.
If that really worked before, then the most likely commit that broke
this is:
commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
Author: Chunming Zhou <[email protected]>
Date: Fri Feb 9 10:44:09 2018 +0800
drm/amdgpu: only enable swiotlb alloc when need v2
get the max io mapping address of system memory to see if it is over
our card accessing range.
v2: move checking later
Signed-off-by: Chunming Zhou <[email protected]>
Reviewed-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Currently looking into how we could somehow improve this detection.
Regards,
Christian.
On 2018-06-06 04:44 PM, Christian König wrote:
> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>> 2018-06-06 14:19 GMT+02:00 Christian König <[email protected]>:
>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <[email protected]>:
>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>
>>>>>
>>>>> Also nothing else changed in that setup just testing kernel 4.17.
>>>>
>>>>
>>>> That has nothing TODO with the driver nor the original bug you
>>>> reported. The
>>>> problem is that SME is active and that is currently not supported at
>>>> all
>>>> with a that hardware.
>>>
>>> Ok .. so are we playing now kernel an AMD Hardware roulette on each
>>> release ?
>>>
>>> SME was like this in kernel 4.16.x here and all worked.
>>
>> If that is true, again please bisect which commit broke it.
>>
>> All the reports I've seen before this indicated that at least amdgpu
>> has never worked with SME (which BTW doesn't mean it's never going to
>> work or that we don't want to support it, just that as far as we know
>> it's currently not working).
>
> At least in theory it should work when we use the coherent DMA allocator.
>
> When that really worked before, so the most likely commit which broke
> this is:
>
> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
> Author: Chunming Zhou <[email protected]>
> Date: Fri Feb 9 10:44:09 2018 +0800
>
> drm/amdgpu: only enable swiotlb alloc when need v2
>
> get the max io mapping address of system memory to see if it is over
> our card accessing range.
> v2: move checking later
>
> Signed-off-by: Chunming Zhou <[email protected]>
> Reviewed-by: Monk Liu <[email protected]>
> Reviewed-by: Christian König <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>
>
> Currently looking into how we could somehow improve this detection.
I guess this could fit for Gabriel, but e.g.
https://bugs.freedesktop.org/104437 says amdgpu was already broken with
SME in 4.15, if not 4.14 (I suspect there was simply no SME support
earlier).
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
2018-06-06 17:03 GMT+02:00 Michel Dänzer <[email protected]>:
> On 2018-06-06 04:44 PM, Christian König wrote:
>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>> 2018-06-06 14:19 GMT+02:00 Christian König <[email protected]>:
>>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <[email protected]>:
>>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>>
>>>>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>>
>>>>>>
>>>>>> Also nothing else changed in that setup just testing kernel 4.17.
>>>>>
>>>>>
>>>>> That has nothing TODO with the driver nor the original bug you
>>>>> reported. The
>>>>> problem is that SME is active and that is currently not supported at
>>>>> all
>>>>> with a that hardware.
>>>>
>>>> Ok .. so are we playing now kernel an AMD Hardware roulette on each
>>>> release ?
>>>>
>>>> SME was like this in kernel 4.16.x here and all worked.
>>>
>>> If that is true, again please bisect which commit broke it.
>>>
>>> All the reports I've seen before this indicated that at least amdgpu
>>> has never worked with SME (which BTW doesn't mean it's never going to
>>> work or that we don't want to support it, just that as far as we know
>>> it's currently not working).
>>
>> At least in theory it should work when we use the coherent DMA allocator.
>>
>> When that really worked before, so the most likely commit which broke
>> this is:
>>
>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>> Author: Chunming Zhou <[email protected]>
>> Date: Fri Feb 9 10:44:09 2018 +0800
>>
>> drm/amdgpu: only enable swiotlb alloc when need v2
>>
>> get the max io mapping address of system memory to see if it is over
>> our card accessing range.
>> v2: move checking later
>>
>> Signed-off-by: Chunming Zhou <[email protected]>
>> Reviewed-by: Monk Liu <[email protected]>
>> Reviewed-by: Christian König <[email protected]>
>> Signed-off-by: Alex Deucher <[email protected]>
>>
>> Currently looking into how we could somehow improve this detection.
>
> I guess this could fit for Gabriel, but e.g.
> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
> earlier).
I got strange performance issues with 4.15 and 4.16, but SME was ON
on that setup (even before it hit mainline) and it never broke the GPU like this.
Here is a 4.16.13 boot dmesg which shows no such issue:
http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
2018-06-06 16:44 GMT+02:00 Christian König <[email protected]>:
> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>
>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>
>>> 2018-06-06 14:19 GMT+02:00 Christian König <[email protected]>:
>>>>
>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>
>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <[email protected]>:
>>>>>>
>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>>>
>>>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <[email protected]>:
>>>>>>>>
>>>>>>>>
>>>>>>>> [ 6.337838] [drm] PCIE GART of 2048M enabled (table at
>>>>>>>> 0x00000000001D6000).
>>>>>>>> [ 6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>>>>> [ 6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>> I have the same Issue now on final 4.17.
>>
>>
>> Please file a bug report, and ideally bisect which commit(s) introduced
>> the issue(s).
>>
>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>
>>>>> Also nothing else changed in that setup just testing kernel 4.17.
>>>>
>>>>
>>>>
>>>> That has nothing TODO with the driver nor the original bug you reported.
>>>> The
>>>> problem is that SME is active and that is currently not supported at all
>>>> with a that hardware.
>>>
>>>
>>> Ok .. so are we playing now kernel an AMD Hardware roulette on each
>>> release ?
>>>
>>> SME was like this in kernel 4.16.x here and all worked.
>>
>>
>> If that is true, again please bisect which commit broke it.
>>
>> All the reports I've seen before this indicated that at least amdgpu has
>> never worked with SME (which BTW doesn't mean it's never going to work or
>> that we don't want to support it, just that as far as we know it's currently
>> not working).
>
>
> At least in theory it should work when we use the coherent DMA allocator.
>
> When that really worked before, so the most likely commit which broke this
> is:
>
> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
> Author: Chunming Zhou <[email protected]>
> Date: Fri Feb 9 10:44:09 2018 +0800
>
> drm/amdgpu: only enable swiotlb alloc when need v2
>
> get the max io mapping address of system memory to see if it is over
> our card accessing range.
> v2: move checking later
>
> Signed-off-by: Chunming Zhou <[email protected]>
> Reviewed-by: Monk Liu <[email protected]>
> Reviewed-by: Christian König <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>
>
> Currently looking into how we could somehow improve this detection.
It is not this one; I've built a kernel with this reverted.
I'll do a bisect tonight or tomorrow.
>
> Regards,
> Christian.
On 06.06.2018 at 17:44, Gabriel C wrote:
> 2018-06-06 17:03 GMT+02:00 Michel Dänzer <[email protected]>:
>> On 2018-06-06 04:44 PM, Christian König wrote:
>>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> [SNIP]
>>> At least in theory it should work when we use the coherent DMA allocator.
>>>
>>> When that really worked before, so the most likely commit which broke
>>> this is:
>>>
>>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>>> Author: Chunming Zhou <[email protected]>
>>> Date: Fri Feb 9 10:44:09 2018 +0800
>>>
>>> drm/amdgpu: only enable swiotlb alloc when need v2
>>>
>>> get the max io mapping address of system memory to see if it is over
>>> our card accessing range.
>>> v2: move checking later
>>>
>>> Signed-off-by: Chunming Zhou <[email protected]>
>>> Reviewed-by: Monk Liu <[email protected]>
>>> Reviewed-by: Christian König <[email protected]>
>>> Signed-off-by: Alex Deucher <[email protected]>
>>>
>>> Currently looking into how we could somehow improve this detection.
>> I guess this could fit for Gabriel, but e.g.
>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
>> earlier).
And what I totally missed is that Gabriel is using radeon and not amdgpu.
So Gabriel you need to revert this one for testing:
commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
Author: Chunming Zhou <[email protected]>
Date: Fri Feb 9 10:44:10 2018 +0800
drm/radeon: only enable swiotlb path when need v2
swiotlb expands our card accessing range, but its path always is slower
than ttm pool allocation.
So add condition to use it.
v2: move a bit later
Signed-off-by: Chunming Zhou <[email protected]>
Reviewed-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Link:
https://patchwork.freedesktop.org/patch/msgid/[email protected]
> I got strange performance issue with 4.15 and 4.16 .. but SME was ON
> on that setup ( even before it hit mainline ) and never broke the GPU like this.
Well that is very interesting, you are the first one who reports that
SME + GFX works in some way. So far we only got negative reports for that.
> There is a 4.16.13 boot dmesg which has no such issue:
>
> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>
> With the setup as is booting 4.16.x works , while 4.17 trows the errors.
Please do the bisect if the patch I've mentioned above doesn't help.
Thanks,
Christian.
2018-06-07 9:07 GMT+02:00 Christian König <[email protected]>:
> Am 06.06.2018 um 17:44 schrieb Gabriel C:
>>
>> 2018-06-06 17:03 GMT+02:00 Michel Dänzer <[email protected]>:
>>>
>>> On 2018-06-06 04:44 PM, Christian König wrote:
>>>>
>>>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>>> [SNIP]
>>>> At least in theory it should work when we use the coherent DMA
>>>> allocator.
>>>>
>>>> When that really worked before, so the most likely commit which broke
>>>> this is:
>>>>
>>>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>>>> Author: Chunming Zhou <[email protected]>
>>>> Date: Fri Feb 9 10:44:09 2018 +0800
>>>>
>>>> drm/amdgpu: only enable swiotlb alloc when need v2
>>>>
>>>> get the max io mapping address of system memory to see if it is
>>>> over
>>>> our card accessing range.
>>>> v2: move checking later
>>>>
>>>> Signed-off-by: Chunming Zhou <[email protected]>
>>>> Reviewed-by: Monk Liu <[email protected]>
>>>> Reviewed-by: Christian König <[email protected]>
>>>> Signed-off-by: Alex Deucher <[email protected]>
>>>>
>>>> Currently looking into how we could somehow improve this detection.
>>>
>>> I guess this could fit for Gabriel, but e.g.
>>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
>>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
>>> earlier).
>
>
> And what I totally missed is that Gabriel is using radeon and not amdgpu.
>
> So Gabriel you need to revert this one for testing:
> commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
> Author: Chunming Zhou <[email protected]>
> Date: Fri Feb 9 10:44:10 2018 +0800
>
> drm/radeon: only enable swiotlb path when need v2
>
> swiotlb expands our card accessing range, but its path always is slower
> than ttm pool allocation.
> So add condition to use it.
> v2: move a bit later
>
> Signed-off-by: Chunming Zhou <[email protected]>
> Reviewed-by: Monk Liu <[email protected]>
> Reviewed-by: Christian König <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>
> Link:
> https://patchwork.freedesktop.org/patch/msgid/[email protected]
>
>> I got strange performance issue with 4.15 and 4.16 .. but SME was ON
>> on that setup ( even before it hit mainline ) and never broke the GPU like
>> this.
>
>
> Well that is very interesting, you are the first one who reports that SME +
> GFX works in some way. So far we only got negative reports for that.
>
>> There is a 4.16.13 boot dmesg which has no such issue:
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>
>> With the setup as is booting 4.16.x works , while 4.17 trows the errors.
>
>
> Please do the bisect if the patch I've mentioned above doesn't help.
Ok done.. bisect points to:
b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
Author: Christoph Hellwig <[email protected]>
Date: Mon Mar 19 11:38:19 2018 +0100
iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
This cleans up the code a lot by removing duplicate logic.
Tested-by: Tom Lendacky <[email protected]>
Tested-by: Joerg Roedel <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Joerg Roedel <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jon Mason <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Muli Ben-Yehuda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
I'll try to revert this once I'm home.
BR
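For readers unfamiliar with the procedure: bisecting amounts to a binary search over the commit history between a known-good and a known-bad kernel. A minimal sketch of the idea follows, written in Python purely for illustration. The commit names are placeholders and the "is it bad?" check is a stand-in for "build, boot, and see whether radeon works"; none of it is taken from the actual bisect run.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # the regression is at mid or earlier
        else:
            good = mid     # the regression is after mid
    return commits[bad]


# Hypothetical history: placeholder names, not real commit IDs.
history = ["v4.16", "c1", "c2", "c3", "c4", "c5", "v4.17"]
broken_since = "c3"  # stand-in for the manual boot test

culprit = first_bad(
    history,
    lambda c: history.index(c) >= history.index(broken_since),
)
print(culprit)
```

Each step halves the remaining range, which is why a bisect between two releases needs only a dozen or so test boots even with thousands of commits in between.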
>> Well that is very interesting, you are the first one who reports that SME +
>> GFX works in some way. So far we only got negative reports for that.
>>
>>> There is a 4.16.13 boot dmesg which has no such issue:
>>>
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>>
>>> With the setup as is booting 4.16.x works , while 4.17 trows the errors.
>>
>>
>> Please do the bisect if the patch I've mentioned above doesn't help.
>
> Ok done.. bisect points to:
>
> b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
> commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
> Author: Christoph Hellwig <[email protected]>
> Date: Mon Mar 19 11:38:19 2018 +0100
>
> iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
>
> This cleans up the code a lot by removing duplicate logic.
>
> Tested-by: Tom Lendacky <[email protected]>
> Tested-by: Joerg Roedel <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Reviewed-by: Thomas Gleixner <[email protected]>
> Acked-by: Joerg Roedel <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Jon Mason <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Muli Ben-Yehuda <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: [email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
>
>
> I'll try to revert this once I'm home.
I can confirm that reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
fixes that issue for me.
The GPU is working fine with SME enabled.
Now with a working GPU :) I can also confirm performance is back to normal
without any other workarounds.
The only app still acting up a bit is Firefox, with just minor frame drops,
but nothing too bad (probably a Firefox bug too).
Chromium/Chrome is fine: even with 10 tabs open, each one playing
a video on YouTube, there are no glitches at all.
The desktop is also fine now; I could not find anything wrong.
BR
Hi Christopher,
On 07.06.2018 at 18:24, Gabriel C wrote:
>> [SNIP]
>> Ok done.. bisect points to:
>>
>> b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
>> commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
>> Author: Christoph Hellwig <[email protected]>
>> Date: Mon Mar 19 11:38:19 2018 +0100
>>
>> iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
>>
>> This cleans up the code a lot by removing duplicate logic.
>>
>> Tested-by: Tom Lendacky <[email protected]>
>> Tested-by: Joerg Roedel <[email protected]>
>> Signed-off-by: Christoph Hellwig <[email protected]>
>> Reviewed-by: Thomas Gleixner <[email protected]>
>> Acked-by: Joerg Roedel <[email protected]>
>> Cc: David Woodhouse <[email protected]>
>> Cc: Joerg Roedel <[email protected]>
>> Cc: Jon Mason <[email protected]>
>> Cc: Konrad Rzeszutek Wilk <[email protected]>
>> Cc: Linus Torvalds <[email protected]>
>> Cc: Muli Ben-Yehuda <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: [email protected]
>> Link: http://lkml.kernel.org/r/[email protected]
>> Signed-off-by: Ingo Molnar <[email protected]>
>>
>>
>> I'll try to revert this once I'm home.
> I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
> fixes that issue for me.
any idea what could cause that? Basically this patch breaks radeon when
SME is enabled.
> The GPU is working fine with SME enabled.
>
> Now with working GPU :) I can also confirm performance is back to normal
> without doing any other workarounds.
>
> The only app still acting up a bit is Firefox , just minor frame drops,
> but nothing to bad. ( probably an Firefox bug too )
>
> crhomium/chrome is fine .. even with 10 tabs open , each one playing
> an video on youtube no glitches at all.
>
> Desktop is also fine now, could not find anything wrong.
Thanks for testing,
Christian.
>
>
> BR
On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote:
> Hi Christopher,
I don't see a Christopher on the Cc list..
On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
> Ok done.. bisect points to:
What is the failure mode you are seeing? Can't find anything in the
mail unfortunately.
Hi Christoph,
On 08.06.2018 at 08:01, Christoph Hellwig wrote:
> On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote:
>> Hi Christopher,
> I don't see a Christopher on the Cc list..
Sorry, auto-uncorrection. I indeed meant you :)
Christian.
On 08.06.2018 at 08:02, Christoph Hellwig wrote:
> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
>> Ok done.. bisect points to:
> What is the failure mode you are seeing? Can't find anything in the
> mail unfortunately.
As far as I have analyzed it, we now get -ENOMEM from dma_alloc_attrs() in
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when the IOMMU is enabled.
I still need to figure out which parameters we use for the
allocation, but I think the size is only 4k or 8k.
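For reference, this matches the "(-12) create WB bo failed" line in the radeon dmesg quoted earlier in the thread: errno 12 is ENOMEM. A quick way to check the mapping, shown in Python only as an illustration:

```python
import errno
import os

code = -12  # the value printed in "(-12) create WB bo failed"

# errno.errorcode maps positive errno numbers to their symbolic names.
name = errno.errorcode[-code]
print(name, "-", os.strerror(-code))
```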
Regards,
Christian.
2018-06-08 8:52 GMT+02:00 Christian König <[email protected]>:
> Am 08.06.2018 um 08:02 schrieb Christoph Hellwig:
>>
>> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
>>>
>>> Ok done.. bisect points to:
>>
>> What is the failure mode you are seeing? Can't find anything in the
>> mail unfortunately.
>
>
> As far as I analyzed it we now get an -ENOMEM from dma_alloc_attrs() in
> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when IOMMU is enabled.
>
> Still need to figure out which parameters we want to use for the allocation,
> but I think it is only 4k or 8k.
If you need me to test something, or to run debug patches
or patches of any sort, just let me know.
>
> Regards,
> Christian.
BR
I think the prime issue is that dma_direct_alloc respects the DMA
mask, which we don't need when actually using the IOMMU. This would
be mostly harmless except for the SEV bit high in the address, which
makes the checks fail.
For now I'd say revert this commit for 4.17/4.18-rc and I'll look into
addressing these issues properly.
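To illustrate the mask check described above, here is a minimal sketch of the arithmetic. It is a toy model and not the kernel code: the position of the encryption bit (47) and the 40-bit device mask are assumptions chosen for the example, and fits_dma_mask() is a hypothetical helper name.

```python
# Illustrative model only; bit position and mask width are assumptions.
ENC_BIT = 1 << 47                  # assumed position of the encryption bit
DEVICE_DMA_MASK = (1 << 40) - 1    # assumed 40-bit device DMA mask


def fits_dma_mask(addr: int, size: int, mask: int) -> bool:
    """Return True if the whole buffer [addr, addr + size) lies under mask."""
    return addr + size - 1 <= mask


phys = 0x100000   # arbitrary low physical address, well within 40 bits
size = 4096       # one page

# Without the encryption bit the buffer passes the check.
plain_ok = fits_dma_mask(phys, size, DEVICE_DMA_MASK)

# With the encryption bit set in the address, the same page appears to
# sit far above the mask, so the check rejects it.
encrypted_ok = fits_dma_mask(phys | ENC_BIT, size, DEVICE_DMA_MASK)

print(plain_ok, encrypted_ok)
```

In this model the page itself is perfectly reachable by the device; only the extra high bit makes the comparison fail, which would surface to the driver as a failed allocation.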
On Mon, Jun 11, 2018 at 12:07 AM Christoph Hellwig <[email protected]> wrote:
>
> For now I'd say revert this commit for 4.17/4.18-rc and I'll look into
> addressing these issues properly.
Ok, reverted in my tree, and marked for stable (for 4.17). Thanks,
Linus