2023-07-03 13:26:01

by Arnd Bergmann

Subject: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

From: Arnd Bergmann <[email protected]>

On 32-bit architectures comparing a resource against a value larger than
U32_MAX can cause a warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1344:18: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
res->start > 0x100000000ull)
~~~~~~~~~~ ^ ~~~~~~~~~~~~~~

The compiler is right that this cannot happen in this configuration, which
is ok, so just add a cast to shut up the warning.

Fixes: 31b8adab3247e ("drm/amdgpu: require a root bus window above 4GB for BAR resize")
Signed-off-by: Arnd Bergmann <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7f069e1731fee..abd13942aac5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1341,7 +1341,7 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)

 	pci_bus_for_each_resource(root, res, i) {
 		if (res && res->flags & (IORESOURCE_MEM | IORESOURCE_MEM_64) &&
-		    res->start > 0x100000000ull)
+		    (u64)res->start > 0x100000000ull)
 			break;
 	}

--
2.39.2
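
For readers following along outside the kernel tree, the effect of the cast can be sketched in plain C. This is only an illustration under stated assumptions: demo_resource_size_t and DEMO_4G are hypothetical names standing in for a 32-bit resource_size_t and the 4GB boundary, and nothing here is taken from amdgpu itself. Widening the value before the comparison makes the expression 64-bit on both sides, so the constant is no longer out of range for the compared type, even though the result is still always false for 32-bit input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for resource_size_t on a 32-bit configuration. */
typedef uint32_t demo_resource_size_t;

/* The 4GB boundary used in the comparison from the patch. */
#define DEMO_4G 0x100000000ull

/*
 * Widen the start address to 64 bits before comparing. For a 32-bit
 * input this can never be true, but the comparison is now well-formed
 * for either width of the underlying type.
 */
static bool demo_starts_above_4g(demo_resource_size_t start)
{
	return (uint64_t)start > DEMO_4G;
}
```

With a 32-bit demo_resource_size_t the function returns false for every input, which matches the commit message: the condition cannot happen in this configuration.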



2023-07-04 07:28:20

by Christian König

Subject: Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

Am 03.07.23 um 14:35 schrieb Arnd Bergmann:
> From: Arnd Bergmann <[email protected]>
>
> On 32-bit architectures comparing a resource against a value larger than
> U32_MAX can cause a warning:
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1344:18: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
> res->start > 0x100000000ull)
> ~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
>
> The compiler is right that this cannot happen in this configuration, which
> is ok, so just add a cast to shut up the warning.

Well it doesn't make sense to compile that driver on systems with only
32bit phys_addr_t in the first place.

It might be cleaner to just not build the whole driver on such systems
or at least leave out this function.

Regards,
Christian

>
> Fixes: 31b8adab3247e ("drm/amdgpu: require a root bus window above 4GB for BAR resize")
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7f069e1731fee..abd13942aac5d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1341,7 +1341,7 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>
> pci_bus_for_each_resource(root, res, i) {
> if (res && res->flags & (IORESOURCE_MEM | IORESOURCE_MEM_64) &&
> - res->start > 0x100000000ull)
> + (u64)res->start > 0x100000000ull)
> break;
> }
>


2023-07-04 12:36:41

by Arnd Bergmann

Subject: Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

On Tue, Jul 4, 2023, at 08:54, Christian König wrote:
> Am 03.07.23 um 14:35 schrieb Arnd Bergmann:
>> From: Arnd Bergmann <[email protected]>
>>
>> On 32-bit architectures comparing a resource against a value larger than
>> U32_MAX can cause a warning:
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1344:18: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
>> res->start > 0x100000000ull)
>> ~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
>>
>> The compiler is right that this cannot happen in this configuration, which
>> is ok, so just add a cast to shut up the warning.
>
> Well it doesn't make sense to compile that driver on systems with only
> 32bit phys_addr_t in the first place.

Not sure I understand the specific requirement. Do you mean the entire
amdgpu driver requires 64-bit BAR addressing, or just the bits that
resize the BARs?

> It might be cleaner to just not build the whole driver on such systems
> or at least leave out this function.

How about this version? This also addresses the build failure, but
I don't know if this makes any sense:

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1325,6 +1325,9 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
 	u16 cmd;
 	int r;
 
+	if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
+		return 0;
+
 	/* Bypass for VF */
 	if (amdgpu_sriov_vf(adev))
 		return 0;

Arnd
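
The early-return approach above can be approximated in standalone C as follows. This is a hedged sketch, not the driver code: DEMO_PHYS_ADDR_T_64BIT is a hypothetical macro standing in for IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT), and demo_would_resize() is an invented helper that only reports whether a resize would be attempted. The design point is that the guard is an ordinary C condition on a compile-time constant rather than an #ifdef, so the rest of the function is still parsed and type-checked in every configuration. Whether a particular compiler also stays quiet about a comparison in the dead branch is not something this sketch demonstrates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT). */
#ifndef DEMO_PHYS_ADDR_T_64BIT
#define DEMO_PHYS_ADDR_T_64BIT 0
#endif

/*
 * Report whether a BAR resize would be attempted for a root bus window
 * starting at root_window_start. Invented for illustration only.
 */
static bool demo_would_resize(uint64_t root_window_start)
{
	/* Compile-time constant guard: bail out without 64-bit addresses. */
	if (!DEMO_PHYS_ADDR_T_64BIT)
		return false;

	/* Only windows that start above 4GB qualify. */
	return root_window_start > 0x100000000ull;
}
```

Built with the default of 0, the function returns false for every input. Building with -DDEMO_PHYS_ADDR_T_64BIT=1 enables the address check.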

2023-07-04 12:57:23

by Christian König

Subject: Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

Am 04.07.23 um 14:24 schrieb Arnd Bergmann:
> On Tue, Jul 4, 2023, at 08:54, Christian König wrote:
>> Am 03.07.23 um 14:35 schrieb Arnd Bergmann:
>>> From: Arnd Bergmann <[email protected]>
>>>
>>> On 32-bit architectures comparing a resource against a value larger than
>>> U32_MAX can cause a warning:
>>>
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1344:18: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
>>> res->start > 0x100000000ull)
>>> ~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
>>>
>>> The compiler is right that this cannot happen in this configuration, which
>>> is ok, so just add a cast to shut up the warning.
>> Well it doesn't make sense to compile that driver on systems with only
>> 32bit phys_addr_t in the first place.
> Not sure I understand the specific requirement. Do you mean the entire
> amdgpu driver requires 64-bit BAR addressing, or just the bits that
> resize the BARs?

Well a bit of both.

Modern AMD GPUs have 16GiB of local memory (VRAM). Making that
accessible to a CPU which can only handle 32bit addresses by resizing
the BAR is impossible to begin with.

But going a step further even without resizing it is pretty hard to get
that hardware working on such an architecture.

>> It might be cleaner to just not build the whole driver on such systems
>> or at least leave out this function.
> How about this version? This also addresses the build failure, but
> I don't know if this makes any sense:
>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1325,6 +1325,9 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
> u16 cmd;
> int r;
>
> + if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
> + return 0;
> +

Yes, if that suppresses the warning as well then that makes perfect
sense to me.

Regards,
Christian.

> /* Bypass for VF */
> if (amdgpu_sriov_vf(adev))
> return 0;
>
> Arnd


2023-07-04 14:47:03

by Arnd Bergmann

Subject: Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

On Tue, Jul 4, 2023, at 14:33, Christian König wrote:
> Am 04.07.23 um 14:24 schrieb Arnd Bergmann:
>> On Tue, Jul 4, 2023, at 08:54, Christian König wrote:
>>> Am 03.07.23 um 14:35 schrieb Arnd Bergmann:
>> Not sure I understand the specific requirement. Do you mean the entire
>> amdgpu driver requires 64-bit BAR addressing, or just the bits that
>> resize the BARs?
>
> Well a bit of both.
>
> Modern AMD GPUs have 16GiB of local memory (VRAM). Making that
> accessible to a CPU which can only handle 32bit addresses by resizing
> the BAR is impossible to begin with.
>
> But going a step further even without resizing it is pretty hard to get
> that hardware working on such an architecture.

I'd still like to understand this part better, as we have a lot of
arm64 chips with somewhat flawed PCIe implementations, often with
a tiny 64-bit memory space, but otherwise probably capable of
using a GPU.

What exactly do you expect to happen here?

a) Use only part of the VRAM but otherwise work as expected
b) Access all of the VRAM, but at a performance cost for
bank switching?
c) Require kernel changes to make a) or b) work, otherwise
fail to load
d) have no chance of working even with driver changes

>>> It might be cleaner to just not build the whole driver on such systems
>>> or at least leave out this function.
>> How about this version? This also addresses the build failure, but
>> I don't know if this makes any sense:
>>
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1325,6 +1325,9 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>> u16 cmd;
>> int r;
>>
>> + if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
>> + return 0;
>> +
>
> Yes, if that suppresses the warning as well then that makes perfect
> sense to me.

Ok, I'll send that as a v2 then.

Arnd

2023-07-04 15:25:38

by Christian König

Subject: Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

Am 04.07.23 um 16:31 schrieb Arnd Bergmann:
> On Tue, Jul 4, 2023, at 14:33, Christian König wrote:
>> Am 04.07.23 um 14:24 schrieb Arnd Bergmann:
>>> On Tue, Jul 4, 2023, at 08:54, Christian König wrote:
>>>> Am 03.07.23 um 14:35 schrieb Arnd Bergmann:
>>> Not sure I understand the specific requirement. Do you mean the entire
>>> amdgpu driver requires 64-bit BAR addressing, or just the bits that
>>> resize the BARs?
>> Well a bit of both.
>>
>> Modern AMD GPUs have 16GiB of local memory (VRAM). Making that
>> accessible to a CPU which can only handle 32bit addresses by resizing
>> the BAR is impossible to begin with.
>>
>> But going a step further even without resizing it is pretty hard to get
>> that hardware working on such an architecture.
> I'd still like to understand this part better, as we have a lot of
> arm64 chips with somewhat flawed PCIe implementations, often with
> a tiny 64-bit memory space, but otherwise probably capable of
> using a GPU.

Yeah, those are unfortunately very well known to us :(

> What exactly do you expect to happen here?
>
> a) Use only part of the VRAM but otherwise work as expected
> b) Access all of the VRAM, but at a performance cost for
> bank switching?

We have tons of x86 systems where we can't resize the BAR (because of
lack of BIOS setup of the root PCIe windows). So bank switching is still
perfectly supported.

> c) Require kernel changes to make a) or b) work, otherwise
> fail to load
> d) have no chance of working even with driver changes

Yeah, that is usually what happens on those arm64 systems with flawed
PCIe implementations.

The problem is not even BAR resize. We have already had tons of
customers who came to us and complained that amdgpu doesn't load or
crashes the system after a few seconds.

After investigating (which sometimes even includes involving engineers
from ARM) we usually find that those boards don't even remotely comply
with the PCIe specification, both regarding power as well as functional
things like DMA coherency.

Regards,
Christian.

>
>>>> It might be cleaner to just not build the whole driver on such systems
>>>> or at least leave out this function.
>>> How about this version? This also addresses the build failure, but
>>> I don't know if this makes any sense:
>>>
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -1325,6 +1325,9 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>>> u16 cmd;
>>> int r;
>>>
>>> + if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
>>> + return 0;
>>> +
>> Yes, if that suppresses the warning as well then that makes perfect
>> sense to me.
> Ok, I'll send that as a v2 then.
>
> Arnd


2023-07-04 15:43:57

by Arnd Bergmann

Subject: Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

On Tue, Jul 4, 2023, at 16:51, Christian König wrote:
> Am 04.07.23 um 16:31 schrieb Arnd Bergmann:
>> On Tue, Jul 4, 2023, at 14:33, Christian König wrote:
>>>
>>> Modern AMD GPUs have 16GiB of local memory (VRAM). Making that
>>> accessible to a CPU which can only handle 32bit addresses by resizing
>>> the BAR is impossible to begin with.
>>>
>>> But going a step further even without resizing it is pretty hard to get
>>> that hardware working on such an architecture.
>> I'd still like to understand this part better, as we have a lot of
>> arm64 chips with somewhat flawed PCIe implementations, often with
>> a tiny 64-bit memory space, but otherwise probably capable of
>> using a GPU.
>
> Yeah, those are unfortunately very well known to us :(
>
>> What exactly do you expect to happen here?
>>
>> a) Use only part of the VRAM but otherwise work as expected
>> b) Access all of the VRAM, but at a performance cost for
>> bank switching?
>
> We have tons of x86 systems where we can't resize the BAR (because of
> lack of BIOS setup of the root PCIe windows). So bank switching is still
> perfectly supported.

Ok, good.

> After investigating (which sometimes even includes involving engineers
> from ARM) we usually find that those boards don't even remotely comply
> with the PCIe specification, both regarding power as well as functional
> things like DMA coherency.

Makes sense, the power usage is clearly going to make this
impossible on a lot of boards. I would have expected noncoherent
DMA to be a solvable problem, since that generally works with
all drivers that use the dma-mapping interfaces correctly,
but I understand that drivers/gpu/* often does its own thing
here, which may make that harder.

Arnd

2023-07-05 12:34:55

by Christian König

Subject: Re: [PATCH] drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()

Am 04.07.23 um 17:24 schrieb Arnd Bergmann:
> On Tue, Jul 4, 2023, at 16:51, Christian König wrote:
>> Am 04.07.23 um 16:31 schrieb Arnd Bergmann:
>>> On Tue, Jul 4, 2023, at 14:33, Christian König wrote:
>>>> Modern AMD GPUs have 16GiB of local memory (VRAM). Making that
>>>> accessible to a CPU which can only handle 32bit addresses by resizing
>>>> the BAR is impossible to begin with.
>>>>
>>>> But going a step further even without resizing it is pretty hard to get
>>>> that hardware working on such an architecture.
>>> I'd still like to understand this part better, as we have a lot of
>>> arm64 chips with somewhat flawed PCIe implementations, often with
>>> a tiny 64-bit memory space, but otherwise probably capable of
>>> using a GPU.
>> Yeah, those are unfortunately very well known to us :(
>>
>>> What exactly do you expect to happen here?
>>>
>>> a) Use only part of the VRAM but otherwise work as expected
>>> b) Access all of the VRAM, but at a performance cost for
>>> bank switching?
>> We have tons of x86 systems where we can't resize the BAR (because of
>> lack of BIOS setup of the root PCIe windows). So bank switching is still
>> perfectly supported.
> Ok, good.
>
>> After investigating (which sometimes even includes involving engineers
>> from ARM) we usually find that those boards don't even remotely comply
>> with the PCIe specification, both regarding power as well as functional
>> things like DMA coherency.
> Makes sense, the power usage is clearly going to make this
> impossible on a lot of boards. I would have expected noncoherent
> DMA to be a solvable problem, since that generally works with
> all drivers that use the dma-mapping interfaces correctly,
> but I understand that drivers/gpu/* often does its own thing
> here, which may make that harder.

Yeah, I've heard that before. The problem is simply that the dma-mapping
interface can't handle those cases.

User space APIs like Vulkan and some OpenGL extensions make a coherent
memory model between GPU and CPU mandatory.

In other words you have things like ring buffers between code running on
the GPU and code running on the CPU and the kernel is not even involved
in that communication.

This is all based on the PCIe specification, which makes it quite clear
that things like snooping caches are mandatory for a compliant root complex.

There has been success to some degree by making everything uncached, but
then the performance just sucks so badly that you can practically forget
it as well.

Regards,
Christian.

>
> Arnd
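
The kind of ring buffer mentioned in the last message, shared between a producer and a consumer with no kernel involvement, can be sketched roughly as below. This is an illustrative single-producer/single-consumer ring in C11 and not anything from amdgpu, Vulkan or OpenGL: struct demo_ring, demo_ring_push() and demo_ring_pop() are hypothetical names, and both sides here are ordinary CPU code. The point is that the release/acquire ordering on the indices only helps if both parties observe the same memory, which is the coherency assumption described above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_SLOTS 8u /* must be a power of two */

/* Hypothetical shared ring: one producer, one consumer. */
struct demo_ring {
	_Atomic uint32_t head; /* next slot to write, advanced by the producer */
	_Atomic uint32_t tail; /* next slot to read, advanced by the consumer */
	uint32_t slot[DEMO_RING_SLOTS];
};

/* Producer side: returns false if the ring is full. */
static bool demo_ring_push(struct demo_ring *r, uint32_t value)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == DEMO_RING_SLOTS)
		return false;

	r->slot[head % DEMO_RING_SLOTS] = value;
	/* Publish the slot contents before the new head becomes visible. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool demo_ring_pop(struct demo_ring *r, uint32_t *out)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;

	*out = r->slot[tail % DEMO_RING_SLOTS];
	/* Release the slot back to the producer. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

On a platform without coherent access between the two parties, the index update could become visible before the slot contents do (or never, if it sits in a cache), which is why falling back to uncached mappings is the usual workaround, with the performance cost noted above.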