2022-12-25 20:30:15

by Luís Mendes

[permalink] [raw]
Subject: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures

Re-sending with the correct linux-kernel mailing list email address.
Sorry for the inconvenience.

The proposed patch fixes the issue and allows amdgpu to work again on
armhf with a AMD RX 550 card, however it may not be the best solution
for the issue, as detailed below.

include/log2.h defined macros rounddown_pow_of_two(...) and
roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
architectures (tested on armv9 armhf machine) causing
drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
value, thus impeding amdgpu to load properly (no GUI).

One option is to modify rounddown_pow_of_two(...) to detect if the
variable takes 32 bits or less and call __rounddown_pow_of_two_u32(u32
n) or if the variable takes more space than 32 bits, then call
__rounddown_pow_of_two_u64(u64 n). This would imply renaming
__rounddown_pow_of_two(unsigne
d long n) to
__rounddown_pow_of_two_u32(u32 n) and add a new function
__rounddown_pow_of_two_u64(u64 n). This would be the most transparent
solution, however there a few complications, and they are:
- that the mm subsystem will fail to link on armhf with an undefined
reference on __aeabi_uldivmod
- there a few drivers that directly call __rounddown_pow_of_two(...)
- that other drivers and subsystems generate warnings

So this alternate solution was devised which avoids touching existing
code paths, and just updates drm_buddy which seems to be the only
driver that is failing, however I am not sure if this is the proper
way to go. So I would like to get a second opinion on this, by those
who know.

/include/linux/log2.h
/drivers/gpu/drm/drm_buddy.c

Signed-off-by: Luís Mendes <[email protected]>
>8------------------------------------------------------8<
diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
linux-nextLM/drivers/gpu/drm/drm_buddy.c
--- linux-next/drivers/gpu/drm/drm_buddy.c 2022-12-25
16:29:26.000000000 +0000
+++ linux-nextLM/drivers/gpu/drm/drm_buddy.c 2022-12-25
17:04:32.136007116 +0000
@@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
unsigned int order;
u64 root_size;

- root_size = rounddown_pow_of_two(size);
+ root_size = rounddown_pow_of_two_u64(size);
order = ilog2(root_size) - ilog2(chunk_size);

root = drm_block_alloc(mm, NULL, order, offset);
diff -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux/log2.h
--- linux-next/include/linux/log2.h 2022-12-25 16:29:29.000000000 +0000
+++ linux-nextLM/include/linux/log2.h 2022-12-25 17:00:34.319901492 +0000
@@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
}

/**
+ * __roundup_pow_of_two_u64() - round up to nearest power of two
+ * (unsgined 64-bits precision version)
+ * @n: value to round up
+ */
+static inline __attribute__((const))
+u64 __roundup_pow_of_two_u64(u64 n)
+{
+ return 1ULL << fls64(n - 1);
+}
+
+
+/**
* __rounddown_pow_of_two() - round down to nearest power of two
* @n: value to round down
*/
@@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
}

/**
+ * __rounddown_pow_of_two_u64() - round down to nearest power of two
+ * (unsgined 64-bits precision version)
+ * @n: value to round down
+ */
+static inline __attribute__((const))
+u64 __rounddown_pow_of_two_u64(u64 n)
+{
+ return 1ULL << (fls64(n) - 1);
+}
+
+/**
* const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
* @n: parameter
*
@@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
__ilog2_u64(n) \
)

+
/**
* roundup_pow_of_two - round the given value up to nearest power of two
* @n: parameter
@@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
)

/**
+ * roundup_pow_of_two_u64 - round the given value up to nearest power of two
+ * (unsgined 64-bits precision version)
+ * @n: parameter
+ *
+ * round the given value up to the nearest power of two
+ * - the result is undefined when n == 0
+ * - this can be used to initialise global variables from constant data
+ */
+#define roundup_pow_of_two_u64(n) \
+( \
+ __builtin_constant_p(n) ? ( \
+ ((n) == 1) ? 1 : \
+ (1ULL << (ilog2((n) - 1) + 1)) \
+ ) : \
+ __roundup_pow_of_two_u64(n) \
+ )
+
+
+/**
* rounddown_pow_of_two - round the given value down to nearest power of two
* @n: parameter
*
@@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
__rounddown_pow_of_two(n) \
)

+/**
+ * rounddown_pow_of_two_u64 - round the given value down to nearest
power of two
+ * (unsgined 64-bits precision version)
+ * @n: parameter
+ *
+ * round the given value down to the nearest power of two
+ * - the result is undefined when n == 0
+ * - this can be used to initialise global variables from constant data
+ */
+#define rounddown_pow_of_two_u64(n) \
+( \
+ __builtin_constant_p(n) ? ( \
+ (1ULL << ilog2(n))) : \
+ __rounddown_pow_of_two_u64(n) \
+ )
+
static inline __attribute_const__
int __order_base_2(unsigned long n)
{


2023-01-03 09:08:47

by Christian König

[permalink] [raw]
Subject: Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures

Am 25.12.22 um 20:39 schrieb Luís Mendes:
> Re-sending with the correct linux-kernel mailing list email address.
> Sorry for the inconvenience.
>
> The proposed patch fixes the issue and allows amdgpu to work again on
> armhf with a AMD RX 550 card, however it may not be the best solution
> for the issue, as detailed below.
>
> include/log2.h defined macros rounddown_pow_of_two(...) and
> roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
> architectures (tested on armv9 armhf machine) causing
> drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
> value, thus impeding amdgpu to load properly (no GUI).
>
> One option is to modify rounddown_pow_of_two(...) to detect if the
> variable takes 32 bits or less and call __rounddown_pow_of_two_u32(u32
> n) or if the variable takes more space than 32 bits, then call
> __rounddown_pow_of_two_u64(u64 n). This would imply renaming
> __rounddown_pow_of_two(unsigne
> d long n) to
> __rounddown_pow_of_two_u32(u32 n) and add a new function
> __rounddown_pow_of_two_u64(u64 n). This would be the most transparent
> solution, however there a few complications, and they are:
> - that the mm subsystem will fail to link on armhf with an undefined
> reference on __aeabi_uldivmod
> - there a few drivers that directly call __rounddown_pow_of_two(...)
> - that other drivers and subsystems generate warnings
>
> So this alternate solution was devised which avoids touching existing
> code paths, and just updates drm_buddy which seems to be the only
> driver that is failing, however I am not sure if this is the proper
> way to go. So I would like to get a second opinion on this, by those
> who know.
>
> /include/linux/log2.h
> /drivers/gpu/drm/drm_buddy.c
>
> Signed-off-by: Luís Mendes <[email protected]>
>> 8------------------------------------------------------8<
> diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
> linux-nextLM/drivers/gpu/drm/drm_buddy.c
> --- linux-next/drivers/gpu/drm/drm_buddy.c 2022-12-25
> 16:29:26.000000000 +0000
> +++ linux-nextLM/drivers/gpu/drm/drm_buddy.c 2022-12-25
> 17:04:32.136007116 +0000
> @@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
> unsigned int order;
> u64 root_size;
>
> - root_size = rounddown_pow_of_two(size);
> + root_size = rounddown_pow_of_two_u64(size);
> order = ilog2(root_size) - ilog2(chunk_size);

I think this can be handled much easier if keep around the root_order
instead of the root_size in the first place.

Cause ilog2() does the right thing even for non power of two values and
so we just need the order for the offset subtraction below.

Arun can you take a closer look at this?

Regards,
Christian.

>
> root = drm_block_alloc(mm, NULL, order, offset);
> diff -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux/log2.h
> --- linux-next/include/linux/log2.h 2022-12-25 16:29:29.000000000 +0000
> +++ linux-nextLM/include/linux/log2.h 2022-12-25 17:00:34.319901492 +0000
> @@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
> }
>
> /**
> + * __roundup_pow_of_two_u64() - round up to nearest power of two
> + * (unsgined 64-bits precision version)
> + * @n: value to round up
> + */
> +static inline __attribute__((const))
> +u64 __roundup_pow_of_two_u64(u64 n)
> +{
> + return 1ULL << fls64(n - 1);
> +}
> +
> +
> +/**
> * __rounddown_pow_of_two() - round down to nearest power of two
> * @n: value to round down
> */
> @@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
> }
>
> /**
> + * __rounddown_pow_of_two_u64() - round down to nearest power of two
> + * (unsgined 64-bits precision version)
> + * @n: value to round down
> + */
> +static inline __attribute__((const))
> +u64 __rounddown_pow_of_two_u64(u64 n)
> +{
> + return 1ULL << (fls64(n) - 1);
> +}
> +
> +/**
> * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
> * @n: parameter
> *
> @@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
> __ilog2_u64(n) \
> )
>
> +
> /**
> * roundup_pow_of_two - round the given value up to nearest power of two
> * @n: parameter
> @@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
> )
>
> /**
> + * roundup_pow_of_two_u64 - round the given value up to nearest power of two
> + * (unsgined 64-bits precision version)
> + * @n: parameter
> + *
> + * round the given value up to the nearest power of two
> + * - the result is undefined when n == 0
> + * - this can be used to initialise global variables from constant data
> + */
> +#define roundup_pow_of_two_u64(n) \
> +( \
> + __builtin_constant_p(n) ? ( \
> + ((n) == 1) ? 1 : \
> + (1ULL << (ilog2((n) - 1) + 1)) \
> + ) : \
> + __roundup_pow_of_two_u64(n) \
> + )
> +
> +
> +/**
> * rounddown_pow_of_two - round the given value down to nearest power of two
> * @n: parameter
> *
> @@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
> __rounddown_pow_of_two(n) \
> )
>
> +/**
> + * rounddown_pow_of_two_u64 - round the given value down to nearest
> power of two
> + * (unsgined 64-bits precision version)
> + * @n: parameter
> + *
> + * round the given value down to the nearest power of two
> + * - the result is undefined when n == 0
> + * - this can be used to initialise global variables from constant data
> + */
> +#define rounddown_pow_of_two_u64(n) \
> +( \
> + __builtin_constant_p(n) ? ( \
> + (1ULL << ilog2(n))) : \
> + __rounddown_pow_of_two_u64(n) \
> + )
> +
> static inline __attribute_const__
> int __order_base_2(unsigned long n)
> {

2023-03-09 10:13:20

by Luís Mendes

[permalink] [raw]
Subject: Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures

Hi,

Ping? This is actually a regression.
If there is no one available to work this, maybe I can have a look in
my spare time, in accordance with your suggestion.

Regards,
Luís

On Tue, Jan 3, 2023 at 8:44 AM Christian König <[email protected]> wrote:
>
> Am 25.12.22 um 20:39 schrieb Luís Mendes:
> > Re-sending with the correct linux-kernel mailing list email address.
> > Sorry for the inconvenience.
> >
> > The proposed patch fixes the issue and allows amdgpu to work again on
> > armhf with a AMD RX 550 card, however it may not be the best solution
> > for the issue, as detailed below.
> >
> > include/log2.h defined macros rounddown_pow_of_two(...) and
> > roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
> > architectures (tested on armv9 armhf machine) causing
> > drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
> > value, thus impeding amdgpu to load properly (no GUI).
> >
> > One option is to modify rounddown_pow_of_two(...) to detect if the
> > variable takes 32 bits or less and call __rounddown_pow_of_two_u32(u32
> > n) or if the variable takes more space than 32 bits, then call
> > __rounddown_pow_of_two_u64(u64 n). This would imply renaming
> > __rounddown_pow_of_two(unsigne
> > d long n) to
> > __rounddown_pow_of_two_u32(u32 n) and add a new function
> > __rounddown_pow_of_two_u64(u64 n). This would be the most transparent
> > solution, however there a few complications, and they are:
> > - that the mm subsystem will fail to link on armhf with an undefined
> > reference on __aeabi_uldivmod
> > - there a few drivers that directly call __rounddown_pow_of_two(...)
> > - that other drivers and subsystems generate warnings
> >
> > So this alternate solution was devised which avoids touching existing
> > code paths, and just updates drm_buddy which seems to be the only
> > driver that is failing, however I am not sure if this is the proper
> > way to go. So I would like to get a second opinion on this, by those
> > who know.
> >
> > /include/linux/log2.h
> > /drivers/gpu/drm/drm_buddy.c
> >
> > Signed-off-by: Luís Mendes <[email protected]>
> >> 8------------------------------------------------------8<
> > diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
> > linux-nextLM/drivers/gpu/drm/drm_buddy.c
> > --- linux-next/drivers/gpu/drm/drm_buddy.c 2022-12-25
> > 16:29:26.000000000 +0000
> > +++ linux-nextLM/drivers/gpu/drm/drm_buddy.c 2022-12-25
> > 17:04:32.136007116 +0000
> > @@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
> > unsigned int order;
> > u64 root_size;
> >
> > - root_size = rounddown_pow_of_two(size);
> > + root_size = rounddown_pow_of_two_u64(size);
> > order = ilog2(root_size) - ilog2(chunk_size);
>
> I think this can be handled much easier if keep around the root_order
> instead of the root_size in the first place.
>
> Cause ilog2() does the right thing even for non power of two values and
> so we just need the order for the offset subtraction below.
>
> Arun can you take a closer look at this?
>
> Regards,
> Christian.
>
> >
> > root = drm_block_alloc(mm, NULL, order, offset);
> > diff -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux/log2.h
> > --- linux-next/include/linux/log2.h 2022-12-25 16:29:29.000000000 +0000
> > +++ linux-nextLM/include/linux/log2.h 2022-12-25 17:00:34.319901492 +0000
> > @@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
> > }
> >
> > /**
> > + * __roundup_pow_of_two_u64() - round up to nearest power of two
> > + * (unsgined 64-bits precision version)
> > + * @n: value to round up
> > + */
> > +static inline __attribute__((const))
> > +u64 __roundup_pow_of_two_u64(u64 n)
> > +{
> > + return 1ULL << fls64(n - 1);
> > +}
> > +
> > +
> > +/**
> > * __rounddown_pow_of_two() - round down to nearest power of two
> > * @n: value to round down
> > */
> > @@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
> > }
> >
> > /**
> > + * __rounddown_pow_of_two_u64() - round down to nearest power of two
> > + * (unsgined 64-bits precision version)
> > + * @n: value to round down
> > + */
> > +static inline __attribute__((const))
> > +u64 __rounddown_pow_of_two_u64(u64 n)
> > +{
> > + return 1ULL << (fls64(n) - 1);
> > +}
> > +
> > +/**
> > * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
> > * @n: parameter
> > *
> > @@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
> > __ilog2_u64(n) \
> > )
> >
> > +
> > /**
> > * roundup_pow_of_two - round the given value up to nearest power of two
> > * @n: parameter
> > @@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
> > )
> >
> > /**
> > + * roundup_pow_of_two_u64 - round the given value up to nearest power of two
> > + * (unsgined 64-bits precision version)
> > + * @n: parameter
> > + *
> > + * round the given value up to the nearest power of two
> > + * - the result is undefined when n == 0
> > + * - this can be used to initialise global variables from constant data
> > + */
> > +#define roundup_pow_of_two_u64(n) \
> > +( \
> > + __builtin_constant_p(n) ? ( \
> > + ((n) == 1) ? 1 : \
> > + (1ULL << (ilog2((n) - 1) + 1)) \
> > + ) : \
> > + __roundup_pow_of_two_u64(n) \
> > + )
> > +
> > +
> > +/**
> > * rounddown_pow_of_two - round the given value down to nearest power of two
> > * @n: parameter
> > *
> > @@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
> > __rounddown_pow_of_two(n) \
> > )
> >
> > +/**
> > + * rounddown_pow_of_two_u64 - round the given value down to nearest
> > power of two
> > + * (unsgined 64-bits precision version)
> > + * @n: parameter
> > + *
> > + * round the given value down to the nearest power of two
> > + * - the result is undefined when n == 0
> > + * - this can be used to initialise global variables from constant data
> > + */
> > +#define rounddown_pow_of_two_u64(n) \
> > +( \
> > + __builtin_constant_p(n) ? ( \
> > + (1ULL << ilog2(n))) : \
> > + __rounddown_pow_of_two_u64(n) \
> > + )
> > +
> > static inline __attribute_const__
> > int __order_base_2(unsigned long n)
> > {
>

Subject: RE: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures

[AMD Official Use Only - General]

Hi Luis,

Sorry, I missed this one. Give me some time. I will check on it.

Regards,
Arun
-----Original Message-----
From: Luís Mendes <[email protected]>
Sent: Thursday, March 9, 2023 3:43 PM
To: Koenig, Christian <[email protected]>
Cc: [email protected]; amd-gfx list <[email protected]>; Linux Kernel Mailing List <[email protected]>; Paneer Selvam, Arunpravin <[email protected]>
Subject: Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures

Hi,

Ping? This is actually a regression.
If there is no one available to work this, maybe I can have a look in my spare time, in accordance with your suggestion.

Regards,
Luís

On Tue, Jan 3, 2023 at 8:44 AM Christian König <[email protected]> wrote:
>
> Am 25.12.22 um 20:39 schrieb Luís Mendes:
> > Re-sending with the correct linux-kernel mailing list email address.
> > Sorry for the inconvenience.
> >
> > The proposed patch fixes the issue and allows amdgpu to work again
> > on armhf with a AMD RX 550 card, however it may not be the best
> > solution for the issue, as detailed below.
> >
> > include/log2.h defined macros rounddown_pow_of_two(...) and
> > roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
> > architectures (tested on armv9 armhf machine) causing
> > drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
> > value, thus impeding amdgpu to load properly (no GUI).
> >
> > One option is to modify rounddown_pow_of_two(...) to detect if the
> > variable takes 32 bits or less and call
> > __rounddown_pow_of_two_u32(u32
> > n) or if the variable takes more space than 32 bits, then call
> > __rounddown_pow_of_two_u64(u64 n). This would imply renaming
> > __rounddown_pow_of_two(unsigne d long n) to
> > __rounddown_pow_of_two_u32(u32 n) and add a new function
> > __rounddown_pow_of_two_u64(u64 n). This would be the most
> > transparent solution, however there a few complications, and they are:
> > - that the mm subsystem will fail to link on armhf with an undefined
> > reference on __aeabi_uldivmod
> > - there a few drivers that directly call __rounddown_pow_of_two(...)
> > - that other drivers and subsystems generate warnings
> >
> > So this alternate solution was devised which avoids touching
> > existing code paths, and just updates drm_buddy which seems to be
> > the only driver that is failing, however I am not sure if this is
> > the proper way to go. So I would like to get a second opinion on
> > this, by those who know.
> >
> > /include/linux/log2.h
> > /drivers/gpu/drm/drm_buddy.c
> >
> > Signed-off-by: Luís Mendes <[email protected]>
> >> 8------------------------------------------------------8<
> > diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
> > linux-nextLM/drivers/gpu/drm/drm_buddy.c
> > --- linux-next/drivers/gpu/drm/drm_buddy.c 2022-12-25
> > 16:29:26.000000000 +0000
> > +++ linux-nextLM/drivers/gpu/drm/drm_buddy.c 2022-12-25
> > 17:04:32.136007116 +0000
> > @@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
> > unsigned int order;
> > u64 root_size;
> >
> > - root_size = rounddown_pow_of_two(size);
> > + root_size = rounddown_pow_of_two_u64(size);
> > order = ilog2(root_size) - ilog2(chunk_size);
>
> I think this can be handled much easier if keep around the root_order
> instead of the root_size in the first place.
>
> Cause ilog2() does the right thing even for non power of two values
> and so we just need the order for the offset subtraction below.
>
> Arun can you take a closer look at this?
>
> Regards,
> Christian.
>
> >
> > root = drm_block_alloc(mm, NULL, order, offset); diff
> > -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux/log2.h
> > --- linux-next/include/linux/log2.h 2022-12-25 16:29:29.000000000 +0000
> > +++ linux-nextLM/include/linux/log2.h 2022-12-25 17:00:34.319901492 +0000
> > @@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
> > }
> >
> > /**
> > + * __roundup_pow_of_two_u64() - round up to nearest power of two
> > + * (unsgined 64-bits precision version)
> > + * @n: value to round up
> > + */
> > +static inline __attribute__((const))
> > +u64 __roundup_pow_of_two_u64(u64 n) {
> > + return 1ULL << fls64(n - 1);
> > +}
> > +
> > +
> > +/**
> > * __rounddown_pow_of_two() - round down to nearest power of two
> > * @n: value to round down
> > */
> > @@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
> > }
> >
> > /**
> > + * __rounddown_pow_of_two_u64() - round down to nearest power of
> > +two
> > + * (unsgined 64-bits precision version)
> > + * @n: value to round down
> > + */
> > +static inline __attribute__((const))
> > +u64 __rounddown_pow_of_two_u64(u64 n) {
> > + return 1ULL << (fls64(n) - 1);
> > +}
> > +
> > +/**
> > * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
> > * @n: parameter
> > *
> > @@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
> > __ilog2_u64(n) \
> > )
> >
> > +
> > /**
> > * roundup_pow_of_two - round the given value up to nearest power of two
> > * @n: parameter
> > @@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
> > )
> >
> > /**
> > + * roundup_pow_of_two_u64 - round the given value up to nearest
> > +power of two
> > + * (unsgined 64-bits precision version)
> > + * @n: parameter
> > + *
> > + * round the given value up to the nearest power of two
> > + * - the result is undefined when n == 0
> > + * - this can be used to initialise global variables from constant
> > +data */
> > +#define roundup_pow_of_two_u64(n) \
> > +( \
> > + __builtin_constant_p(n) ? ( \
> > + ((n) == 1) ? 1 : \
> > + (1ULL << (ilog2((n) - 1) + 1)) \
> > + ) : \
> > + __roundup_pow_of_two_u64(n) \
> > + )
> > +
> > +
> > +/**
> > * rounddown_pow_of_two - round the given value down to nearest power of two
> > * @n: parameter
> > *
> > @@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
> > __rounddown_pow_of_two(n) \
> > )
> >
> > +/**
> > + * rounddown_pow_of_two_u64 - round the given value down to nearest
> > power of two
> > + * (unsgined 64-bits precision version)
> > + * @n: parameter
> > + *
> > + * round the given value down to the nearest power of two
> > + * - the result is undefined when n == 0
> > + * - this can be used to initialise global variables from constant
> > +data */
> > +#define rounddown_pow_of_two_u64(n) \
> > +( \
> > + __builtin_constant_p(n) ? ( \
> > + (1ULL << ilog2(n))) : \
> > + __rounddown_pow_of_two_u64(n) \
> > + )
> > +
> > static inline __attribute_const__
> > int __order_base_2(unsigned long n)
> > {
>

Subject: Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures



On 3/9/2023 3:42 PM, Luís Mendes wrote:
> Hi,
>
> Ping? This is actually a regression.
> If there is no one available to work this, maybe I can have a look in
> my spare time, in accordance with your suggestion.
>
> Regards,
> Luís
>
> On Tue, Jan 3, 2023 at 8:44 AM Christian König <[email protected]> wrote:
>> Am 25.12.22 um 20:39 schrieb Luís Mendes:
>>> Re-sending with the correct linux-kernel mailing list email address.
>>> Sorry for the inconvenience.
>>>
>>> The proposed patch fixes the issue and allows amdgpu to work again on
>>> armhf with a AMD RX 550 card, however it may not be the best solution
>>> for the issue, as detailed below.
>>>
>>> include/log2.h defined macros rounddown_pow_of_two(...) and
>>> roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
>>> architectures (tested on armv9 armhf machine) causing
>>> drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
>>> value, thus impeding amdgpu to load properly (no GUI).
>>>
>>> One option is to modify rounddown_pow_of_two(...) to detect if the
>>> variable takes 32 bits or less and call __rounddown_pow_of_two_u32(u32
>>> n) or if the variable takes more space than 32 bits, then call
>>> __rounddown_pow_of_two_u64(u64 n). This would imply renaming
>>> __rounddown_pow_of_two(unsigne
>>> d long n) to
>>> __rounddown_pow_of_two_u32(u32 n) and add a new function
>>> __rounddown_pow_of_two_u64(u64 n). This would be the most transparent
>>> solution, however there a few complications, and they are:
>>> - that the mm subsystem will fail to link on armhf with an undefined
>>> reference on __aeabi_uldivmod
>>> - there a few drivers that directly call __rounddown_pow_of_two(...)
>>> - that other drivers and subsystems generate warnings
>>>
>>> So this alternate solution was devised which avoids touching existing
>>> code paths, and just updates drm_buddy which seems to be the only
>>> driver that is failing, however I am not sure if this is the proper
>>> way to go. So I would like to get a second opinion on this, by those
>>> who know.
>>>
>>> /include/linux/log2.h
>>> /drivers/gpu/drm/drm_buddy.c
>>>
>>> Signed-off-by: Luís Mendes <[email protected]>
>>>> 8------------------------------------------------------8<
>>> diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
>>> linux-nextLM/drivers/gpu/drm/drm_buddy.c
>>> --- linux-next/drivers/gpu/drm/drm_buddy.c 2022-12-25
>>> 16:29:26.000000000 +0000
>>> +++ linux-nextLM/drivers/gpu/drm/drm_buddy.c 2022-12-25
>>> 17:04:32.136007116 +0000
>>> @@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
>>> unsigned int order;
>>> u64 root_size;
>>>
>>> - root_size = rounddown_pow_of_two(size);
>>> + root_size = rounddown_pow_of_two_u64(size);
>>> order = ilog2(root_size) - ilog2(chunk_size);
>> I think this can be handled much easier if keep around the root_order
>> instead of the root_size in the first place.
>>
>> Cause ilog2() does the right thing even for non power of two values and
>> so we just need the order for the offset subtraction below.
Could you try with ilog2() and see if you are getting the right value
for size as suggested
by Christian.

Thanks,
Arun
>>
>> Arun can you take a closer look at this?
>>
>> Regards,
>> Christian.
>>
>>> root = drm_block_alloc(mm, NULL, order, offset);
>>> diff -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux/log2.h
>>> --- linux-next/include/linux/log2.h 2022-12-25 16:29:29.000000000 +0000
>>> +++ linux-nextLM/include/linux/log2.h 2022-12-25 17:00:34.319901492 +0000
>>> @@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
>>> }
>>>
>>> /**
>>> + * __roundup_pow_of_two_u64() - round up to nearest power of two
>>> + * (unsgined 64-bits precision version)
>>> + * @n: value to round up
>>> + */
>>> +static inline __attribute__((const))
>>> +u64 __roundup_pow_of_two_u64(u64 n)
>>> +{
>>> + return 1ULL << fls64(n - 1);
>>> +}
>>> +
>>> +
>>> +/**
>>> * __rounddown_pow_of_two() - round down to nearest power of two
>>> * @n: value to round down
>>> */
>>> @@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
>>> }
>>>
>>> /**
>>> + * __rounddown_pow_of_two_u64() - round down to nearest power of two
>>> + * (unsgined 64-bits precision version)
>>> + * @n: value to round down
>>> + */
>>> +static inline __attribute__((const))
>>> +u64 __rounddown_pow_of_two_u64(u64 n)
>>> +{
>>> + return 1ULL << (fls64(n) - 1);
>>> +}
>>> +
>>> +/**
>>> * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
>>> * @n: parameter
>>> *
>>> @@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
>>> __ilog2_u64(n) \
>>> )
>>>
>>> +
>>> /**
>>> * roundup_pow_of_two - round the given value up to nearest power of two
>>> * @n: parameter
>>> @@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
>>> )
>>>
>>> /**
>>> + * roundup_pow_of_two_u64 - round the given value up to nearest power of two
>>> + * (unsgined 64-bits precision version)
>>> + * @n: parameter
>>> + *
>>> + * round the given value up to the nearest power of two
>>> + * - the result is undefined when n == 0
>>> + * - this can be used to initialise global variables from constant data
>>> + */
>>> +#define roundup_pow_of_two_u64(n) \
>>> +( \
>>> + __builtin_constant_p(n) ? ( \
>>> + ((n) == 1) ? 1 : \
>>> + (1ULL << (ilog2((n) - 1) + 1)) \
>>> + ) : \
>>> + __roundup_pow_of_two_u64(n) \
>>> + )
>>> +
>>> +
>>> +/**
>>> * rounddown_pow_of_two - round the given value down to nearest power of two
>>> * @n: parameter
>>> *
>>> @@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
>>> __rounddown_pow_of_two(n) \
>>> )
>>>
>>> +/**
>>> + * rounddown_pow_of_two_u64 - round the given value down to nearest
>>> power of two
>>> + * (unsgined 64-bits precision version)
>>> + * @n: parameter
>>> + *
>>> + * round the given value down to the nearest power of two
>>> + * - the result is undefined when n == 0
>>> + * - this can be used to initialise global variables from constant data
>>> + */
>>> +#define rounddown_pow_of_two_u64(n) \
>>> +( \
>>> + __builtin_constant_p(n) ? ( \
>>> + (1ULL << ilog2(n))) : \
>>> + __rounddown_pow_of_two_u64(n) \
>>> + )
>>> +
>>> static inline __attribute_const__
>>> int __order_base_2(unsigned long n)
>>> {


2023-03-15 12:44:18

by Luís Mendes

[permalink] [raw]
Subject: Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures

I'll give it a try this weekend.

Luís

On Fri, Mar 10, 2023 at 1:15 PM Arunpravin Paneer Selvam
<[email protected]> wrote:
>
>
>
> On 3/9/2023 3:42 PM, Luís Mendes wrote:
> > Hi,
> >
> > Ping? This is actually a regression.
> > If there is no one available to work this, maybe I can have a look in
> > my spare time, in accordance with your suggestion.
> >
> > Regards,
> > Luís
> >
> > On Tue, Jan 3, 2023 at 8:44 AM Christian König <[email protected]> wrote:
> >> Am 25.12.22 um 20:39 schrieb Luís Mendes:
> >>> Re-sending with the correct linux-kernel mailing list email address.
> >>> Sorry for the inconvenience.
> >>>
> >>> The proposed patch fixes the issue and allows amdgpu to work again on
> >>> armhf with a AMD RX 550 card, however it may not be the best solution
> >>> for the issue, as detailed below.
> >>>
> >>> include/log2.h defined macros rounddown_pow_of_two(...) and
> >>> roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
> >>> architectures (tested on armv9 armhf machine) causing
> >>> drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
> >>> value, thus impeding amdgpu to load properly (no GUI).
> >>>
> >>> One option is to modify rounddown_pow_of_two(...) to detect if the
> >>> variable takes 32 bits or less and call __rounddown_pow_of_two_u32(u32
> >>> n) or if the variable takes more space than 32 bits, then call
> >>> __rounddown_pow_of_two_u64(u64 n). This would imply renaming
> >>> __rounddown_pow_of_two(unsigne
> >>> d long n) to
> >>> __rounddown_pow_of_two_u32(u32 n) and add a new function
> >>> __rounddown_pow_of_two_u64(u64 n). This would be the most transparent
> >>> solution, however there a few complications, and they are:
> >>> - that the mm subsystem will fail to link on armhf with an undefined
> >>> reference on __aeabi_uldivmod
> >>> - there a few drivers that directly call __rounddown_pow_of_two(...)
> >>> - that other drivers and subsystems generate warnings
> >>>
> >>> So this alternate solution was devised which avoids touching existing
> >>> code paths, and just updates drm_buddy which seems to be the only
> >>> driver that is failing, however I am not sure if this is the proper
> >>> way to go. So I would like to get a second opinion on this, by those
> >>> who know.
> >>>
> >>> /include/linux/log2.h
> >>> /drivers/gpu/drm/drm_buddy.c
> >>>
> >>> Signed-off-by: Luís Mendes <[email protected]>
> >>>> 8------------------------------------------------------8<
> >>> diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
> >>> linux-nextLM/drivers/gpu/drm/drm_buddy.c
> >>> --- linux-next/drivers/gpu/drm/drm_buddy.c 2022-12-25
> >>> 16:29:26.000000000 +0000
> >>> +++ linux-nextLM/drivers/gpu/drm/drm_buddy.c 2022-12-25
> >>> 17:04:32.136007116 +0000
> >>> @@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
> >>> unsigned int order;
> >>> u64 root_size;
> >>>
> >>> - root_size = rounddown_pow_of_two(size);
> >>> + root_size = rounddown_pow_of_two_u64(size);
> >>> order = ilog2(root_size) - ilog2(chunk_size);
> >> I think this can be handled much easier if keep around the root_order
> >> instead of the root_size in the first place.
> >>
> >> Cause ilog2() does the right thing even for non power of two values and
> >> so we just need the order for the offset subtraction below.
> Could you try with ilog2() and see if you are getting the right value
> for size as suggested
> by Christian.
>
> Thanks,
> Arun
> >>
> >> Arun can you take a closer look at this?
> >>
> >> Regards,
> >> Christian.
> >>
> >>> root = drm_block_alloc(mm, NULL, order, offset);
> >>> diff -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux/log2.h
> >>> --- linux-next/include/linux/log2.h 2022-12-25 16:29:29.000000000 +0000
> >>> +++ linux-nextLM/include/linux/log2.h 2022-12-25 17:00:34.319901492 +0000
> >>> @@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
> >>> }
> >>>
> >>> /**
> >>> + * __roundup_pow_of_two_u64() - round up to nearest power of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: value to round up
> >>> + */
> >>> +static inline __attribute__((const))
> >>> +u64 __roundup_pow_of_two_u64(u64 n)
> >>> +{
> >>> + return 1ULL << fls64(n - 1);
> >>> +}
> >>> +
> >>> +
> >>> +/**
> >>> * __rounddown_pow_of_two() - round down to nearest power of two
> >>> * @n: value to round down
> >>> */
> >>> @@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
> >>> }
> >>>
> >>> /**
> >>> + * __rounddown_pow_of_two_u64() - round down to nearest power of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: value to round down
> >>> + */
> >>> +static inline __attribute__((const))
> >>> +u64 __rounddown_pow_of_two_u64(u64 n)
> >>> +{
> >>> + return 1ULL << (fls64(n) - 1);
> >>> +}
> >>> +
> >>> +/**
> >>> * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
> >>> * @n: parameter
> >>> *
> >>> @@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
> >>> __ilog2_u64(n) \
> >>> )
> >>>
> >>> +
> >>> /**
> >>> * roundup_pow_of_two - round the given value up to nearest power of two
> >>> * @n: parameter
> >>> @@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
> >>> )
> >>>
> >>> /**
> >>> + * roundup_pow_of_two_u64 - round the given value up to nearest power of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: parameter
> >>> + *
> >>> + * round the given value up to the nearest power of two
> >>> + * - the result is undefined when n == 0
> >>> + * - this can be used to initialise global variables from constant data
> >>> + */
> >>> +#define roundup_pow_of_two_u64(n) \
> >>> +( \
> >>> + __builtin_constant_p(n) ? ( \
> >>> + ((n) == 1) ? 1 : \
> >>> + (1ULL << (ilog2((n) - 1) + 1)) \
> >>> + ) : \
> >>> + __roundup_pow_of_two_u64(n) \
> >>> + )
> >>> +
> >>> +
> >>> +/**
> >>> * rounddown_pow_of_two - round the given value down to nearest power of two
> >>> * @n: parameter
> >>> *
> >>> @@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
> >>> __rounddown_pow_of_two(n) \
> >>> )
> >>>
> >>> +/**
> >>> + * rounddown_pow_of_two_u64 - round the given value down to nearest
> >>> power of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: parameter
> >>> + *
> >>> + * round the given value down to the nearest power of two
> >>> + * - the result is undefined when n == 0
> >>> + * - this can be used to initialise global variables from constant data
> >>> + */
> >>> +#define rounddown_pow_of_two_u64(n) \
> >>> +( \
> >>> + __builtin_constant_p(n) ? ( \
> >>> + (1ULL << ilog2(n))) : \
> >>> + __rounddown_pow_of_two_u64(n) \
> >>> + )
> >>> +
> >>> static inline __attribute_const__
> >>> int __order_base_2(unsigned long n)
> >>> {
>