2019-12-15 03:21:19

by Woody Suwalski

Subject: Regression in 5.4 kernel on 32-bit Radeon IBM T40

Regression in 5.4 kernel on 32-bit Radeon IBM T40
triggered by
commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
Author: Christoph Hellwig <[email protected]>
Date:   Thu Aug 15 09:27:00 2019 +0200

Howdy,
The above patch has triggered a display problem on the IBM ThinkPad T40,
where the screen is covered with lots of random short black horizontal
lines, or letters are distorted in X terms.

The culprit seems to be that dma_get_required_mask() is returning
0x3fffffff, which is smaller than the dma_get_mask() value of 0xffffffff.
That results in dma_addressing_limited() == 0 in ttm_bo_device(), and in
using 40-bit DMA instead of 32-bit.
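
The comparison described above can be modelled in user space. This is only a sketch: it assumes the 5.4 helper compares the smaller non-zero value of the device DMA mask and the bus DMA limit against the required mask. The names min_not_zero_u64() and addressing_limited() are made up for illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smaller of two values, ignoring a zero operand (zero means "no limit set"). */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Hypothetical user-space model of the check: the device counts as
 * "limited" only when the mask it can address is smaller than the mask
 * needed to reach all of RAM.
 */
static bool addressing_limited(uint64_t dma_mask, uint64_t bus_dma_limit,
			       uint64_t required_mask)
{
	return min_not_zero_u64(dma_mask, bus_dma_limit) < required_mask;
}
```

With the values from the debug output in this report (mask 0xffffffff, bus limit 0x0, required mask 0x3fffffff) the model returns false, which matches the logged dma_addressing_limited=0x0.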

If I hardcode "1" as the last parameter to ttm_bo_device_init() in place
of a call to dma_addressing_limited(), the problem goes away.

I have added debug lines starting with "wms:" to the start of
radeon_ttm_init() and of radeon_device_init(), printing the interesting
variables.
/....
[    2.091692] Linux agpgart interface v0.103
[    2.092380] agpgart-intel 0000:00:00.0: Intel 855PM Chipset
[    2.107706] agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000
[    2.108111] [drm] radeon kernel modesetting enabled.
[    2.108200] radeon 0000:01:00.0: vgaarb: deactivate vga console
[    2.109365] Console: switching to colour dummy device 80x25
******* radeon_device_init()
[    2.110712] wms: radeon_init flags = 0x90003
[    2.110718] [drm] initializing kernel modesetting (RV200 0x1002:0x4C57 0x1014:0x0530 0x00).
[    2.111220] agpgart-intel 0000:00:00.0: AGP 2.0 bridge
[    2.111233] agpgart-intel 0000:00:00.0: putting AGP V2 device into 1x mode
[    2.111265] radeon 0000:01:00.0: putting AGP V2 device into 1x mode
[    2.111286] radeon 0000:01:00.0: GTT: 256M 0xD0000000 - 0xDFFFFFFF
[    2.111295] radeon 0000:01:00.0: VRAM: 128M 0x00000000E0000000 - 0x00000000E7FFFFFF (32M used)
[    2.111701] [drm] Detected VRAM RAM=128M, BAR=128M
[    2.111704] [drm] RAM width 64bits DDR
******* radeon_ttm_init()
[    2.111706] wms: dma_addressing_limited=0x0
[    2.111709] wms: dma_get_mask=0xffffffff, bus_dma_limit=0x0, dma_get_required_mask=0x3fffffff
[    2.115971] [TTM] Zone  kernel: Available graphics memory: 437028 KiB
[    2.115973] [TTM] Zone highmem: Available graphics memory: 510440 KiB

What should be the proper values of these DMA variables on a 32-bit system?
How should this issue be fixed correctly (patches welcome :-) )? Or is the
platform fubar?

Thanks, Woody



2019-12-15 16:05:40

by Meelis Roos

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

15.12.19 05:17 Woody Suwalski wrote:
> Regression in 5.4 kernel on 32-bit Radeon IBM T40
> triggered by
> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
> Author: Christoph Hellwig <[email protected]>
> Date:   Thu Aug 15 09:27:00 2019 +0200
>
> Howdy,
> The above patch has triggered a display problem on IBM Thinkpad T40, where the screen is covered with a lots of random short black horizontal lines, or distorted letters in X terms.
>
> The culprit seems to be that the dma_get_required_mask() is returning a value 0x3fffffff
> which is smaller than dma_get_mask()0xffffffff.That results in dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bits dma instead of 32-bits.

I have the same problem on 32-bit Dell Latitude D600.

> If I hardcode "1" as the last parameter to ttm_bo_device_init() in place of a call to dma_addressing_limited(),the problem goes away.

Tried this on top of 5.4.0 and it helped here too.

--
Meelis Roos <[email protected]>





2020-01-09 17:14:36

by Christoph Hellwig

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

Hi Woody,

sorry for the late reply, I was away on vacation over the holidays.

On Sat, Dec 14, 2019 at 10:17:15PM -0500, Woody Suwalski wrote:
> Regression in 5.4 kernel on 32-bit Radeon IBM T40
> triggered by
> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
> Author: Christoph Hellwig <[email protected]>
> Date:   Thu Aug 15 09:27:00 2019 +0200
>
> Howdy,
> The above patch has triggered a display problem on IBM Thinkpad T40, where
> the screen is covered with a lots of random short black horizontal lines,
> or distorted letters in X terms.
>
> The culprit seems to be that the dma_get_required_mask() is returning a
> value 0x3fffffff
> which is smaller than dma_get_mask()0xffffffff.That results in
> dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bits dma
> instead of 32-bits.

Which is the intended behavior assuming your system has 1GB of memory.
Does it?
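
To illustrate why 1GB of RAM would give a required mask of 0x3fffffff, here is a small user-space sketch. It assumes RAM is contiguous from physical address 0 with no DMA offset, and that the required mask is the smallest all-ones mask covering the highest physical address. fls64_model() and required_mask_for() are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/* 1-based index of the highest set bit; 0 when no bit is set. */
static int fls64_model(uint64_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Smallest all-ones mask that covers highest_phys_addr.
 * Assumption: memory starts at address 0 and there is no DMA offset.
 */
static uint64_t required_mask_for(uint64_t highest_phys_addr)
{
	int bits = fls64_model(highest_phys_addr);

	if (bits == 0)
		return 0;
	if (bits >= 64)
		return UINT64_MAX;
	return (1ULL << bits) - 1;
}
```

Under these assumptions, the last byte of 1GB of RAM sits at 0x3fffffff, so required_mask_for(0x3fffffff) yields 0x3fffffff, the value seen in the debug output. A 2GB machine would give 0x7fffffff, still below the 32-bit device mask.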

> If I hardcode "1" as the last parameter to ttm_bo_device_init() in place of
> a call to dma_addressing_limited(),the problem goes away.

I'll need some help from the drm / radeon / TTM maintainers on whether
there are any other side effects from not passing the need_dma32 parameter.
Obviously if the device doesn't have more than 32 bits worth of DRAM and
there is no DMA offset, we can't feed unaddressable memory to the device.
Unfortunately I have a very hard time following the TTM pool
implementation to see whether it does anything else in this case.

2020-01-09 19:15:59

by Christian König

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

Hi Christoph,

On 09.01.20 at 15:14, Christoph Hellwig wrote:
> Hi Woody,
>
> sorry for the late reply, I've been off to a vacation over the holidays.
>
> On Sat, Dec 14, 2019 at 10:17:15PM -0500, Woody Suwalski wrote:
>> Regression in 5.4 kernel on 32-bit Radeon IBM T40
>> triggered by
>> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
>> Author: Christoph Hellwig <[email protected]>
>> Date:   Thu Aug 15 09:27:00 2019 +0200
>>
>> Howdy,
>> The above patch has triggered a display problem on IBM Thinkpad T40, where
>> the screen is covered with a lots of random short black horizontal lines,
>> or distorted letters in X terms.
>>
>> The culprit seems to be that the dma_get_required_mask() is returning a
>> value 0x3fffffff
>> which is smaller than dma_get_mask()0xffffffff.That results in
>> dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bits dma
>> instead of 32-bits.
> Which is the intended behavior assuming your system has 1GB of memory.
> Does it?

Assuming the system doesn't have the 1GB split up in some crazy way
across the address space, that should indeed work as intended.

>
>> If I hardcode "1" as the last parameter to ttm_bo_device_init() in place of
>> a call to dma_addressing_limited(),the problem goes away.
> I'll need some help from the drm / radeon / TTM maintainers if there are
> any other side effects from not passing the need_dma32 paramters.
> Obviously if the device doesn't have more than 32-bits worth of dram and
> no DMA offset we can't feed unaddressable memory to the device.
> Unfortunately I have a very hard time following the implementation of
> the TTM pool if it does anything else in this case.

The only other thing which comes to mind is using huge pages. Can you
try a kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled?

Thanks,
Christian.

2020-01-09 22:43:45

by Woody Suwalski

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

Christian König wrote:
> Hi Christoph,
>
> Am 09.01.20 um 15:14 schrieb Christoph Hellwig:
>> Hi Woody,
>>
>> sorry for the late reply, I've been off to a vacation over the holidays.
>>
>> On Sat, Dec 14, 2019 at 10:17:15PM -0500, Woody Suwalski wrote:
>>> Regression in 5.4 kernel on 32-bit Radeon IBM T40
>>> triggered by
>>> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
>>> Author: Christoph Hellwig <[email protected]>
>>> Date:   Thu Aug 15 09:27:00 2019 +0200
>>>
>>> Howdy,
>>> The above patch has triggered a display problem on IBM Thinkpad T40,
>>> where
>>> the screen is covered with a lots of random short black horizontal
>>> lines,
>>> or distorted letters in X terms.
>>>
>>> The culprit seems to be that the dma_get_required_mask() is returning a
>>> value 0x3fffffff
>>> which is smaller than dma_get_mask()0xffffffff.That results in
>>> dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bits dma
>>> instead of 32-bits.
>> Which is the intended behavior assuming your system has 1GB of memory.
>> Does it?
>
> Assuming the system doesn't have the 1GB split up somehow crazy over
> the address space that should indeed work as intended.
>
>>
>>> If I hardcode "1" as the last parameter to ttm_bo_device_init() in
>>> place of
>>> a call to dma_addressing_limited(),the problem goes away.
>> I'll need some help from the drm / radeon / TTM maintainers if there are
>> any other side effects from not passing the need_dma32 paramters.
>> Obviously if the device doesn't have more than 32-bits worth of dram and
>> no DMA offset we can't feed unaddressable memory to the device.
>> Unfortunately I have a very hard time following the implementation of
>> the TTM pool if it does anything else in this case.
>
> The only other thing which comes to mind is using huge pages. Can you
> try a kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled?
>
> Thanks,
> Christian.

Happy New Year :-)

Yes, the box has 1G of RAM, and unfortunately nope, TRANSPARENT_HUGEPAGE
is not on. I am attaching the .config, maybe you can find some insanity
there... Also - for reference - a minimalistic patch fixing symptoms
(but not addressing the root cause  :-( )

I can try to rebuild the kernel with HIGHMEM off, although I am not
optimistic it will change anything. But at least it should simplify the
1G split...

So if you have any other ideas - pls let me know..

Thanks, Woody


Attachments:
radeon_ttm.patch (556.00 B)
config_i386 (130.40 kB)

2020-01-10 02:43:11

by Woody Suwalski

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

Woody Suwalski wrote:
> Christian König wrote:
>> Hi Christoph,
>>
>> Am 09.01.20 um 15:14 schrieb Christoph Hellwig:
>>> Hi Woody,
>>>
>>> sorry for the late reply, I've been off to a vacation over the
>>> holidays.
>>>
>>> On Sat, Dec 14, 2019 at 10:17:15PM -0500, Woody Suwalski wrote:
>>>> Regression in 5.4 kernel on 32-bit Radeon IBM T40
>>>> triggered by
>>>> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
>>>> Author: Christoph Hellwig <[email protected]>
>>>> Date:   Thu Aug 15 09:27:00 2019 +0200
>>>>
>>>> Howdy,
>>>> The above patch has triggered a display problem on IBM Thinkpad
>>>> T40, where
>>>> the screen is covered with a lots of random short black horizontal
>>>> lines,
>>>> or distorted letters in X terms.
>>>>
>>>> The culprit seems to be that the dma_get_required_mask() is
>>>> returning a
>>>> value 0x3fffffff
>>>> which is smaller than dma_get_mask()0xffffffff.That results in
>>>> dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bits dma
>>>> instead of 32-bits.
>>> Which is the intended behavior assuming your system has 1GB of memory.
>>> Does it?
>>
>> Assuming the system doesn't have the 1GB split up somehow crazy over
>> the address space that should indeed work as intended.
>>
>>>
>>>> If I hardcode "1" as the last parameter to ttm_bo_device_init() in
>>>> place of
>>>> a call to dma_addressing_limited(),the problem goes away.
>>> I'll need some help from the drm / radeon / TTM maintainers if there
>>> are
>>> any other side effects from not passing the need_dma32 paramters.
>>> Obviously if the device doesn't have more than 32-bits worth of dram
>>> and
>>> no DMA offset we can't feed unaddressable memory to the device.
>>> Unfortunately I have a very hard time following the implementation of
>>> the TTM pool if it does anything else in this case.
>>
>> The only other thing which comes to mind is using huge pages. Can you
>> try a kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled?
>>
>> Thanks,
>> Christian.
>
> Happy New Year :-)
>
> Yes, the box has 1G of RAM, and unfortunately nope,
> TRANSPARENT_HUGEPAGE is not on. I am attaching the .config, maybe you
> can find some insanity there... Also - for reference - a minimalistic
> patch fixing symptoms (but not addressing the root cause  :-( )
>
> I can try to rebuild the kernel with HIGHMEM off, although I am not
> optimistic it will change anything. But at least it should simplify
> the 1G split...
>
> So if you have any other ideas - pls let me know..
>
> Thanks, Woody
>
Interesting. Rebuilding the kernel with HIGHMEM disabled actually solves
the display problem. The debug lines show exactly the same values for
dma_get_required_mask() and dma_get_mask(), yet now it works OK... So
what has solved it???

Woody

2020-02-22 16:31:57

by Thomas Backlund

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

On 09-01-2020 at 17:12, Christian König wrote:
> Hi Christoph,
>
> Am 09.01.20 um 15:14 schrieb Christoph Hellwig:
>> Hi Woody,
>>
>> sorry for the late reply, I've been off to a vacation over the holidays.
>>
>> On Sat, Dec 14, 2019 at 10:17:15PM -0500, Woody Suwalski wrote:
>>> Regression in 5.4 kernel on 32-bit Radeon IBM T40
>>> triggered by
>>> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
>>> Author: Christoph Hellwig <[email protected]>
>>> Date:   Thu Aug 15 09:27:00 2019 +0200
>>>
>>> Howdy,
>>> The above patch has triggered a display problem on IBM Thinkpad T40,
>>> where
>>> the screen is covered with a lots of random short black horizontal
>>> lines,
>>> or distorted letters in X terms.
>>>
>>> The culprit seems to be that the dma_get_required_mask() is returning a
>>> value 0x3fffffff
>>> which is smaller than dma_get_mask()0xffffffff.That results in
>>> dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bits dma
>>> instead of 32-bits.
>> Which is the intended behavior assuming your system has 1GB of memory.
>> Does it?
>
> Assuming the system doesn't have the 1GB split up somehow crazy over the
> address space that should indeed work as intended.
>
>>
>>> If I hardcode "1" as the last parameter to ttm_bo_device_init() in
>>> place of
>>> a call to dma_addressing_limited(),the problem goes away.
>> I'll need some help from the drm / radeon / TTM maintainers if there are
>> any other side effects from not passing the need_dma32 paramters.
>> Obviously if the device doesn't have more than 32-bits worth of dram and
>> no DMA offset we can't feed unaddressable memory to the device.
>> Unfortunately I have a very hard time following the implementation of
>> the TTM pool if it does anything else in this case.
>
> The only other thing which comes to mind is using huge pages. Can you
> try a kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled?
>


Any progress on this?

We have a bug report in Mageia with this hardware:
Dell Inspiron 5100, 32-bit P4 processor, 2GB of RAM, Radeon Mobility
7500 (RV200) graphics

It gets display issues too, and reverting the offending commit restores
normal behaviour.

The same issue is still there with 5.5 series kernels.

--
Thomas

2020-03-15 01:37:42

by Thomas Schwinge

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter


Attachments:
(No filename) (7.15 kB)

2020-09-16 22:19:27

by Alex Deucher

Subject: Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

On Mon, Feb 24, 2020 at 4:20 AM Thomas Backlund <[email protected]> wrote:
>
> Den 09-01-2020 kl. 17:12, skrev Christian König:
> > Hi Christoph,
> >
> > Am 09.01.20 um 15:14 schrieb Christoph Hellwig:
> >> Hi Woody,
> >>
> >> sorry for the late reply, I've been off to a vacation over the holidays.
> >>
> >> On Sat, Dec 14, 2019 at 10:17:15PM -0500, Woody Suwalski wrote:
> >>> Regression in 5.4 kernel on 32-bit Radeon IBM T40
> >>> triggered by
> >>> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
> >>> Author: Christoph Hellwig <[email protected]>
> >>> Date: Thu Aug 15 09:27:00 2019 +0200
> >>>
> >>> Howdy,
> >>> The above patch has triggered a display problem on IBM Thinkpad T40,
> >>> where
> >>> the screen is covered with a lots of random short black horizontal
> >>> lines,
> >>> or distorted letters in X terms.
> >>>
> >>> The culprit seems to be that the dma_get_required_mask() is returning a
> >>> value 0x3fffffff
> >>> which is smaller than dma_get_mask()0xffffffff.That results in
> >>> dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bits dma
> >>> instead of 32-bits.
> >> Which is the intended behavior assuming your system has 1GB of memory.
> >> Does it?
> >
> > Assuming the system doesn't have the 1GB split up somehow crazy over the
> > address space that should indeed work as intended.
> >
> >>
> >>> If I hardcode "1" as the last parameter to ttm_bo_device_init() in
> >>> place of
> >>> a call to dma_addressing_limited(),the problem goes away.
> >> I'll need some help from the drm / radeon / TTM maintainers if there are
> >> any other side effects from not passing the need_dma32 paramters.
> >> Obviously if the device doesn't have more than 32-bits worth of dram and
> >> no DMA offset we can't feed unaddressable memory to the device.
> >> Unfortunately I have a very hard time following the implementation of
> >> the TTM pool if it does anything else in this case.
> >
> > The only other thing which comes to mind is using huge pages. Can you
> > try a kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled?
> >
>
>
> Any progress on this ?
>
> We have a bugreport in Mageia with the hw:
> Dell Inspiron 5100, 32-bit P4 processor, 2GB of RAM, Radeon Mobility
> 7500 (RV200) graphics
>
> that gets display issues too and reverting the offending commit restores
> normal behaviour.
>
> and the same issue is still there with 5.5 series kernels.

Does disabling HIGHMEM or setting radeon.agpmode=-1 on the kernel
command line in grub fix the issue?

Alex