MIME-Version: 1.0
References: <CAEzXK1oghXAB_KpKpm=-CviDQbNaH0qfgYTSSjZgvvyj4U78AA@mail.gmail.com>
 <5e192700-c54a-04cf-a223-281af7af0457@amd.com> <CAEzXK1qPjYxDiRVTy8McXbagC8JKgtu+C0E+-VP3NHZvtYdZ+g@mail.gmail.com>
 <ce410155-1052-cd0a-60bf-2807e6376ddd@amd.com>
In-Reply-To: <ce410155-1052-cd0a-60bf-2807e6376ddd@amd.com>
From:   =?UTF-8?B?THXDrXMgTWVuZGVz?= <luis.p.mendes@gmail.com>
Date:   Wed, 15 Mar 2023 12:42:26 +0000
Message-ID: <CAEzXK1qOPCDevS2KgpySZnrwSLeX20fFZBv-i+O-xTWwV8-i9w@mail.gmail.com>
Subject: Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures
To:     Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Cc:     =?UTF-8?Q?Christian_K=C3=B6nig?= <christian.koenig@amd.com>,
        akpm@linux-foundation.org,
        amd-gfx list <amd-gfx@lists.freedesktop.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

I'll give it a try this weekend.

Lu=C3=ADs

On Fri, Mar 10, 2023 at 1:15=E2=80=AFPM Arunpravin Paneer Selvam
<arunpravin.paneerselvam@amd.com> wrote:
>
>
>
> On 3/9/2023 3:42 PM, Lu=C3=ADs Mendes wrote:
> > Hi,
> >
> > Ping? This is actually a regression.
> > If there is no one available to work this, maybe I can have a look in
> > my spare time, in accordance with your suggestion.
> >
> > Regards,
> > Lu=C3=ADs
> >
> > On Tue, Jan 3, 2023 at 8:44=E2=80=AFAM Christian K=C3=B6nig <christian.=
koenig@amd.com> wrote:
> >> Am 25.12.22 um 20:39 schrieb Lu=C3=ADs Mendes:
> >>> Re-sending with the correct  linux-kernel mailing list email address.
> >>> Sorry for the inconvenience.
> >>>
> >>> The proposed patch fixes the issue and allows amdgpu to work again on
> >>> armhf with a AMD RX 550 card, however it may not be the best solution
> >>> for the issue, as detailed below.
> >>>
> >>> include/log2.h defined macros rounddown_pow_of_two(...) and
> >>> roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
> >>> architectures (tested on armv9 armhf machine) causing
> >>> drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
> >>> value, thus impeding amdgpu to load properly (no GUI).
> >>>
> >>> One option is to modify rounddown_pow_of_two(...) to detect if the
> >>> variable takes 32 bits or less and call __rounddown_pow_of_two_u32(u3=
2
> >>> n) or if the variable takes more space than 32 bits, then call
> >>> __rounddown_pow_of_two_u64(u64 n). This would imply renaming
> >>> __rounddown_pow_of_two(unsigne
> >>> d long n) to
> >>> __rounddown_pow_of_two_u32(u32 n) and add a new function
> >>> __rounddown_pow_of_two_u64(u64 n). This would be the most transparent
> >>> solution, however there a few complications, and they are:
> >>> - that the mm subsystem will fail to link on armhf with an undefined
> >>> reference on __aeabi_uldivmod
> >>> - there a few drivers that directly call __rounddown_pow_of_two(...)
> >>> - that other drivers and subsystems generate warnings
> >>>
> >>> So this alternate solution was devised which avoids touching existing
> >>> code paths, and just updates drm_buddy which seems to be the only
> >>> driver that is failing, however I am not sure if this is the proper
> >>> way to go. So I would like to get a second opinion on this, by those
> >>> who know.
> >>>
> >>> /include/linux/log2.h
> >>> /drivers/gpu/drm/drm_buddy.c
> >>>
> >>> Signed-off-by: Lu=C3=ADs Mendes <luis.p.mendes@gmail.com>
> >>>> 8------------------------------------------------------8<
> >>> diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
> >>> linux-nextLM/drivers/gpu/drm/drm_buddy.c
> >>> --- linux-next/drivers/gpu/drm/drm_buddy.c    2022-12-25
> >>> 16:29:26.000000000 +0000
> >>> +++ linux-nextLM/drivers/gpu/drm/drm_buddy.c    2022-12-25
> >>> 17:04:32.136007116 +0000
> >>> @@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
> >>>            unsigned int order;
> >>>            u64 root_size;
> >>>
> >>> -        root_size =3D rounddown_pow_of_two(size);
> >>> +        root_size =3D rounddown_pow_of_two_u64(size);
> >>>            order =3D ilog2(root_size) - ilog2(chunk_size);
> >> I think this can be handled much easier if keep around the root_order
> >> instead of the root_size in the first place.
> >>
> >> Cause ilog2() does the right thing even for non power of two values an=
d
> >> so we just need the order for the offset subtraction below.
> Could you try with ilog2() and see if you are getting the right value
> for size as suggested
> by Christian.
>
> Thanks,
> Arun
> >>
> >> Arun can you take a closer look at this?
> >>
> >> Regards,
> >> Christian.
> >>
> >>>            root =3D drm_block_alloc(mm, NULL, order, offset);
> >>> diff -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux=
/log2.h
> >>> --- linux-next/include/linux/log2.h    2022-12-25 16:29:29.000000000 =
+0000
> >>> +++ linux-nextLM/include/linux/log2.h    2022-12-25 17:00:34.31990149=
2 +0000
> >>> @@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
> >>>    }
> >>>
> >>>    /**
> >>> + * __roundup_pow_of_two_u64() - round up to nearest power of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: value to round up
> >>> + */
> >>> +static inline __attribute__((const))
> >>> +u64 __roundup_pow_of_two_u64(u64 n)
> >>> +{
> >>> +    return 1ULL << fls64(n - 1);
> >>> +}
> >>> +
> >>> +
> >>> +/**
> >>>     * __rounddown_pow_of_two() - round down to nearest power of two
> >>>     * @n: value to round down
> >>>     */
> >>> @@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
> >>>    }
> >>>
> >>>    /**
> >>> + * __rounddown_pow_of_two_u64() - round down to nearest power of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: value to round down
> >>> + */
> >>> +static inline __attribute__((const))
> >>> +u64 __rounddown_pow_of_two_u64(u64 n)
> >>> +{
> >>> +    return 1ULL << (fls64(n) - 1);
> >>> +}
> >>> +
> >>> +/**
> >>>     * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigne=
d value
> >>>     * @n: parameter
> >>>     *
> >>> @@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
> >>>        __ilog2_u64(n)            \
> >>>     )
> >>>
> >>> +
> >>>    /**
> >>>     * roundup_pow_of_two - round the given value up to nearest power =
of two
> >>>     * @n: parameter
> >>> @@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
> >>>     )
> >>>
> >>>    /**
> >>> + * roundup_pow_of_two_u64 - round the given value up to nearest powe=
r of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: parameter
> >>> + *
> >>> + * round the given value up to the nearest power of two
> >>> + * - the result is undefined when n =3D=3D 0
> >>> + * - this can be used to initialise global variables from constant d=
ata
> >>> + */
> >>> +#define roundup_pow_of_two_u64(n)            \
> >>> +(                        \
> >>> +    __builtin_constant_p(n) ? (        \
> >>> +        ((n) =3D=3D 1) ? 1 :        \
> >>> +        (1ULL << (ilog2((n) - 1) + 1))    \
> >>> +                   ) :        \
> >>> +    __roundup_pow_of_two_u64(n)            \
> >>> + )
> >>> +
> >>> +
> >>> +/**
> >>>     * rounddown_pow_of_two - round the given value down to nearest po=
wer of two
> >>>     * @n: parameter
> >>>     *
> >>> @@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
> >>>        __rounddown_pow_of_two(n)        \
> >>>     )
> >>>
> >>> +/**
> >>> + * rounddown_pow_of_two_u64 - round the given value down to nearest
> >>> power of two
> >>> + * (unsgined 64-bits precision version)
> >>> + * @n: parameter
> >>> + *
> >>> + * round the given value down to the nearest power of two
> >>> + * - the result is undefined when n =3D=3D 0
> >>> + * - this can be used to initialise global variables from constant d=
ata
> >>> + */
> >>> +#define rounddown_pow_of_two_u64(n)            \
> >>> +(                        \
> >>> +    __builtin_constant_p(n) ? (        \
> >>> +        (1ULL << ilog2(n))) :        \
> >>> +    __rounddown_pow_of_two_u64(n)        \
> >>> + )
> >>> +
> >>>    static inline __attribute_const__
> >>>    int __order_base_2(unsigned long n)
> >>>    {
>