2014-12-22 11:11:39

by Oded Gabbay

[permalink] [raw]
Subject: [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel

amdkfd driver can be compiled only in 64-bit kernel. Therefore, there is no
point in trying to initialize amdkfd in 32-bit kernel.

In addition, in case of specific configuration of 32-bit kernel, no modules and
random kernel base, the symbol_request function doesn't work as expected - It
doesn't return NULL if the symbol doesn't exists. That makes the kernel panic.
Therefore, the as amdkfd doesn't compile in 32-bit kernel, the best way is just
to return false immediately.

Signed-off-by: Oded Gabbay <[email protected]>
---
drivers/gpu/drm/radeon/radeon_kfd.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index 242fd8b..cb77e5c 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -101,6 +101,7 @@ static const struct kgd2kfd_calls *kgd2kfd;

bool radeon_kfd_init(void)
{
+#ifdef CONFIG_X86_64
bool (*kgd2kfd_init_p)(unsigned, const struct kfd2kgd_calls*,
const struct kgd2kfd_calls**);

@@ -117,6 +118,9 @@ bool radeon_kfd_init(void)
}

return true;
+#else
+ return false;
+#endif
}

void radeon_kfd_fini(void)
--
1.9.1


2014-12-22 16:58:56

by Alex Deucher

[permalink] [raw]
Subject: Re: [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel

On Mon, Dec 22, 2014 at 6:11 AM, Oded Gabbay <[email protected]> wrote:
> amdkfd driver can be compiled only in 64-bit kernel. Therefore, there is no
> point in trying to initialize amdkfd in 32-bit kernel.
>
> In addition, in case of specific configuration of 32-bit kernel, no modules and
> random kernel base, the symbol_request function doesn't work as expected - It
> doesn't return NULL if the symbol doesn't exists. That makes the kernel panic.
> Therefore, the as amdkfd doesn't compile in 32-bit kernel, the best way is just
> to return false immediately.
>
> Signed-off-by: Oded Gabbay <[email protected]>

Reviewed-by: Alex Deucher <[email protected]>

> ---
> drivers/gpu/drm/radeon/radeon_kfd.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
> index 242fd8b..cb77e5c 100644
> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
> @@ -101,6 +101,7 @@ static const struct kgd2kfd_calls *kgd2kfd;
>
> bool radeon_kfd_init(void)
> {
> +#ifdef CONFIG_X86_64
> bool (*kgd2kfd_init_p)(unsigned, const struct kfd2kgd_calls*,
> const struct kgd2kfd_calls**);
>
> @@ -117,6 +118,9 @@ bool radeon_kfd_init(void)
> }
>
> return true;
> +#else
> + return false;
> +#endif
> }
>
> void radeon_kfd_fini(void)
> --
> 1.9.1
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

2014-12-22 18:50:00

by Andi Kleen

[permalink] [raw]
Subject: Re: [LKP] [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel

On Mon, Dec 22, 2014 at 11:58:43AM -0500, Alex Deucher wrote:
> On Mon, Dec 22, 2014 at 6:11 AM, Oded Gabbay <[email protected]> wrote:
> > amdkfd driver can be compiled only in 64-bit kernel. Therefore, there is no
> > point in trying to initialize amdkfd in 32-bit kernel.
> >
> > In addition, in case of specific configuration of 32-bit kernel, no modules and
> > random kernel base, the symbol_request function doesn't work as expected - It
> > doesn't return NULL if the symbol doesn't exists. That makes the kernel panic.
> > Therefore, the as amdkfd doesn't compile in 32-bit kernel, the best way is just
> > to return false immediately.
> >
> > Signed-off-by: Oded Gabbay <[email protected]>
>
> Reviewed-by: Alex Deucher <[email protected]>

Sorry but the patch is just bogus. X-bit only code is usually
a very bad sign for the code. This is not windows programing after all.

Even if you wanted to do a 64bit only driver -- which
you probably don't -- the standard way would be to exclude
it in Kconfig.

Please root-cause why symbol_request doesn't work on 32bit
and fix it properly.

+rusty.

-Andi

2014-12-22 19:00:43

by Andi Kleen

[permalink] [raw]
Subject: Re: [LKP] [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel

On Mon, Dec 22, 2014 at 10:49:40AM -0800, Andi Kleen wrote:
> On Mon, Dec 22, 2014 at 11:58:43AM -0500, Alex Deucher wrote:
> > On Mon, Dec 22, 2014 at 6:11 AM, Oded Gabbay <[email protected]> wrote:
> > > amdkfd driver can be compiled only in 64-bit kernel. Therefore, there is no
> > > point in trying to initialize amdkfd in 32-bit kernel.
> > >
> > > In addition, in case of specific configuration of 32-bit kernel, no modules and
> > > random kernel base, the symbol_request function doesn't work as expected - It
> > > doesn't return NULL if the symbol doesn't exists. That makes the kernel panic.
> > > Therefore, the as amdkfd doesn't compile in 32-bit kernel, the best way is just
> > > to return false immediately.
> > >
> > > Signed-off-by: Oded Gabbay <[email protected]>
> >
> > Reviewed-by: Alex Deucher <[email protected]>
>
> Sorry but the patch is just bogus. X-bit only code is usually
> a very bad sign for the code. This is not windows programing after all.
>
> Even if you wanted to do a 64bit only driver -- which
> you probably don't -- the standard way would be to exclude
> it in Kconfig.
>
> Please root-cause why symbol_request doesn't work on 32bit
> and fix it properly.
>
> +rusty.

And also with correct email.

-Andi

2014-12-22 19:19:15

by Oded Gabbay

[permalink] [raw]
Subject: Re: [LKP] [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel



On 12/22/2014 09:00 PM, Andi Kleen wrote:
> On Mon, Dec 22, 2014 at 10:49:40AM -0800, Andi Kleen wrote:
>> On Mon, Dec 22, 2014 at 11:58:43AM -0500, Alex Deucher wrote:
>>> On Mon, Dec 22, 2014 at 6:11 AM, Oded Gabbay <[email protected]> wrote:
>>>> amdkfd driver can be compiled only in 64-bit kernel. Therefore, there is no
>>>> point in trying to initialize amdkfd in 32-bit kernel.
>>>>
>>>> In addition, in case of specific configuration of 32-bit kernel, no modules and
>>>> random kernel base, the symbol_request function doesn't work as expected - It
>>>> doesn't return NULL if the symbol doesn't exists. That makes the kernel panic.
>>>> Therefore, the as amdkfd doesn't compile in 32-bit kernel, the best way is just
>>>> to return false immediately.
>>>>
>>>> Signed-off-by: Oded Gabbay <[email protected]>
>>>
>>> Reviewed-by: Alex Deucher <[email protected]>
>>
>> Sorry but the patch is just bogus. X-bit only code is usually
>> a very bad sign for the code. This is not windows programing after all.
Hi Andi,

Strange, I have never programmed for Windows in my life (except maybe in a
few courses during my degree) :)
>>
>> Even if you wanted to do a 64bit only driver -- which
>> you probably don't -- the standard way would be to exclude
>> it in Kconfig.
So amdkfd actually *only* supports 64bit user processes, because AMD's HSA
stack on Linux supports *only* 64bit user processes. So, yes, I definitely
want to do a 64bit only driver.
If you look at kfd_open(), it fails the open of /dev/kfd if the process is
32bit.
In addition, in Kconfig of amdkfd, it is written:
"depends on DRM_RADEON && AMD_IOMMU_V2 && X86_64"

The problem here is that there is code in radeon, which is a driver that can
compile in 32bit, which tries to load amdkfd. I didn't see a point in trying
to load a driver which can't be compiled in 32bit.

>>
>> Please root-cause why symbol_request doesn't work on 32bit
>> and fix it properly.
I didn't say it doesn't always work.
The actual thing that doesn't work is the define symbol_get and only in a
specific case of 32bit kernel AND CONFIG_MODULES is unset AND
CONFIG_RANDOMIZE_BASE is set.
The define in that case is:
#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })

Why it doesn't work (doesn't return NULL when symbol doesn't exists) ?
I don't know, probably because of some elf/makefile/c language magic. I'm
not that big of an expert on those issues, and I wanted to provide a fix for
this problem during the -rc stages. If someone can help me solving the root
cause, I would be more than happy.

Oded
>>
>> +rusty.
>
> And also with correct email.
>
> -Andi
>

2014-12-23 23:10:07

by Rusty Russell

[permalink] [raw]
Subject: Re: [LKP] [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel

Oded Gabbay <[email protected]> writes:
> I didn't say it doesn't always work.
> The actual thing that doesn't work is the define symbol_get and only in a
> specific case of 32bit kernel AND CONFIG_MODULES is unset AND
> CONFIG_RANDOMIZE_BASE is set.
> The define in that case is:
> #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })
>
> Why it doesn't work (doesn't return NULL when symbol doesn't exists) ?

Hmm, I'd guess CONFIG_RANDOMIZE_BASE is relocating NULL symbols...

No, I can't reproduce this. Please send your .config privately.

Here's my test case:

diff --git a/init/main.c b/init/main.c
index 61b993767db5..a3ee1ec97ec3 100644
--- a/init/main.c
+++ b/init/main.c
@@ -683,6 +683,12 @@ asmlinkage __visible void __init start_kernel(void)

ftrace_init();

+ {
+ extern void nonexistent_fn(void);
+ printk("symbol_get(nonexistent_fn) = %p\n",
+ symbol_get(nonexistent_fn));
+ }
+
/* Do the rest non-__init'ed, we're now alive */
rest_init();
}

Thanks,
Rusty.

2014-12-24 09:23:20

by Oded Gabbay

[permalink] [raw]
Subject: Re: [LKP] [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel



On 12/24/2014 01:01 AM, Rusty Russell wrote:
> Oded Gabbay <[email protected]> writes:
>> I didn't say it doesn't always work.
>> The actual thing that doesn't work is the define symbol_get and only in a
>> specific case of 32bit kernel AND CONFIG_MODULES is unset AND
>> CONFIG_RANDOMIZE_BASE is set.
>> The define in that case is:
>> #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })
>>
>> Why it doesn't work (doesn't return NULL when symbol doesn't exists) ?
>
> Hmm, I'd guess CONFIG_RANDOMIZE_BASE is relocating NULL symbols...
>
> No, I can't reproduce this. Please send your .config privately.
>
> Here's my test case:
>
> diff --git a/init/main.c b/init/main.c
> index 61b993767db5..a3ee1ec97ec3 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -683,6 +683,12 @@ asmlinkage __visible void __init start_kernel(void)
>
> ftrace_init();
>
> + {
> + extern void nonexistent_fn(void);
> + printk("symbol_get(nonexistent_fn) = %p\n",
> + symbol_get(nonexistent_fn));
> + }
> +
> /* Do the rest non-__init'ed, we're now alive */
> rest_init();
> }
>
> Thanks,
> Rusty.
>
Hi Rusty,

Attached is the bad config file. (config-bad)
I have narrowed the changes you need to do to the config file in order to
reproduce this bug.
The base assumption is a 32-bit kernel and without modules support. Rest of the
config file is pretty standard, IMO.
Then, its not enough to enable CONFIG_RANDOMIZE_BASE like I wrote in my original
post. You need also to unset CONFIG_HIBERNATION.

If you do only one of the changes above, it isn't sufficient to trigger the problem.

I also attached a config-good file, which is almost the same as config-bad,
except the changes I mentioned above that reproduces the problem.

To quickly reproduce it, I run the following command:
qemu-system-i386 -enable-kvm -kernel arch/x86/boot/bzImage -initrd
~/tmp/test32.cpio.gz

The kernel is 3.19-rc1 from Linus' tree. No additional patches on top of that.

I appreciate your help.

Oded


Attachments:
config-bad (82.38 kB)
config-good (82.45 kB)
Download all attachments

2014-12-25 12:31:49

by Christian König

[permalink] [raw]
Subject: Re: [LKP] [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel

Am 22.12.2014 um 20:18 schrieb Oded Gabbay:
>
> On 12/22/2014 09:00 PM, Andi Kleen wrote:
>> On Mon, Dec 22, 2014 at 10:49:40AM -0800, Andi Kleen wrote:
>>> On Mon, Dec 22, 2014 at 11:58:43AM -0500, Alex Deucher wrote:
>>>> On Mon, Dec 22, 2014 at 6:11 AM, Oded Gabbay <[email protected]> wrote:
>>>>> amdkfd driver can be compiled only in 64-bit kernel. Therefore, there is no
>>>>> point in trying to initialize amdkfd in 32-bit kernel.
>>>>>
>>>>> In addition, in case of specific configuration of 32-bit kernel, no modules and
>>>>> random kernel base, the symbol_request function doesn't work as expected - It
>>>>> doesn't return NULL if the symbol doesn't exists. That makes the kernel panic.
>>>>> Therefore, the as amdkfd doesn't compile in 32-bit kernel, the best way is just
>>>>> to return false immediately.
>>>>>
>>>>> Signed-off-by: Oded Gabbay <[email protected]>
>>>> Reviewed-by: Alex Deucher <[email protected]>
>>> Sorry but the patch is just bogus. X-bit only code is usually
>>> a very bad sign for the code. This is not windows programing after all.
> Hi Andi,
>
> Strange, I have never programmed for Windows in my life (except maybe in a
> few courses during my degree) :)
>>> Even if you wanted to do a 64bit only driver -- which
>>> you probably don't -- the standard way would be to exclude
>>> it in Kconfig.
> So amdkfd actually *only* supports 64bit user processes, because AMD's HSA
> stack on Linux supports *only* 64bit user processes. So, yes, I definitely
> want to do a 64bit only driver.
> If you look at kfd_open(), it fails the open of /dev/kfd if the process is
> 32bit.
> In addition, in Kconfig of amdkfd, it is written:
> "depends on DRM_RADEON && AMD_IOMMU_V2 && X86_64"
>
> The problem here is that there is code in radeon, which is a driver that can
> compile in 32bit, which tries to load amdkfd. I didn't see a point in trying
> to load a driver which can't be compiled in 32bit.

Well in this case couldn't we make the code in radeon depend on whether
or not the KFD driver is compiled in or not instead of checking the
system architecture?

Regards,
Christian.

>
>>> Please root-cause why symbol_request doesn't work on 32bit
>>> and fix it properly.
> I didn't say it doesn't always work.
> The actual thing that doesn't work is the define symbol_get and only in a
> specific case of 32bit kernel AND CONFIG_MODULES is unset AND
> CONFIG_RANDOMIZE_BASE is set.
> The define in that case is:
> #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })
>
> Why it doesn't work (doesn't return NULL when symbol doesn't exists) ?
> I don't know, probably because of some elf/makefile/c language magic. I'm
> not that big of an expert on those issues, and I wanted to provide a fix for
> this problem during the -rc stages. If someone can help me solving the root
> cause, I would be more than happy.
>
> Oded
>>> +rusty.
>> And also with correct email.
>>
>> -Andi
>>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

2014-12-28 09:05:49

by Oded Gabbay

[permalink] [raw]
Subject: Re: [LKP] [PATCH] drm/radeon: Try to init amdkfd only if 64 bit kernel



On 12/25/2014 02:31 PM, Christian König wrote:
> Am 22.12.2014 um 20:18 schrieb Oded Gabbay:
>>
>> On 12/22/2014 09:00 PM, Andi Kleen wrote:
>>> On Mon, Dec 22, 2014 at 10:49:40AM -0800, Andi Kleen wrote:
>>>> On Mon, Dec 22, 2014 at 11:58:43AM -0500, Alex Deucher wrote:
>>>>> On Mon, Dec 22, 2014 at 6:11 AM, Oded Gabbay <[email protected]> wrote:
>>>>>> amdkfd driver can be compiled only in 64-bit kernel. Therefore, there is no
>>>>>> point in trying to initialize amdkfd in 32-bit kernel.
>>>>>>
>>>>>> In addition, in case of specific configuration of 32-bit kernel, no
>>>>>> modules and
>>>>>> random kernel base, the symbol_request function doesn't work as expected - It
>>>>>> doesn't return NULL if the symbol doesn't exists. That makes the kernel
>>>>>> panic.
>>>>>> Therefore, the as amdkfd doesn't compile in 32-bit kernel, the best way is
>>>>>> just
>>>>>> to return false immediately.
>>>>>>
>>>>>> Signed-off-by: Oded Gabbay <[email protected]>
>>>>> Reviewed-by: Alex Deucher <[email protected]>
>>>> Sorry but the patch is just bogus. X-bit only code is usually
>>>> a very bad sign for the code. This is not windows programing after all.
>> Hi Andi,
>>
>> Strange, I have never programmed for Windows in my life (except maybe in a
>> few courses during my degree) :)
>>>> Even if you wanted to do a 64bit only driver -- which
>>>> you probably don't -- the standard way would be to exclude
>>>> it in Kconfig.
>> So amdkfd actually *only* supports 64bit user processes, because AMD's HSA
>> stack on Linux supports *only* 64bit user processes. So, yes, I definitely
>> want to do a 64bit only driver.
>> If you look at kfd_open(), it fails the open of /dev/kfd if the process is
>> 32bit.
>> In addition, in Kconfig of amdkfd, it is written:
>> "depends on DRM_RADEON && AMD_IOMMU_V2 && X86_64"
>>
>> The problem here is that there is code in radeon, which is a driver that can
>> compile in 32bit, which tries to load amdkfd. I didn't see a point in trying
>> to load a driver which can't be compiled in 32bit.
>
> Well in this case couldn't we make the code in radeon depend on whether or not
> the KFD driver is compiled in or not instead of checking the system architecture?
>
> Regards,
> Christian.
>
If we are going down that path, we need something like:

bool radeon_kfd_init(void)
{
#if defined(CONFIG_HSA_AMD_MODULE)
current code (symbol request and call to symbol)
#elif defined(CONFIG_HSA_AMD)
direct call to kgd2kfd_init
#else
return false;
#endif
}

Now, the original concept of the symbol_request call was to prevent writing
something like the above pseudo-code, but because symbol_request is not
currently working in all cases, I think that this is a good band-aid as any.

Oded



>>
>>>> Please root-cause why symbol_request doesn't work on 32bit
>>>> and fix it properly.
>> I didn't say it doesn't always work.
>> The actual thing that doesn't work is the define symbol_get and only in a
>> specific case of 32bit kernel AND CONFIG_MODULES is unset AND
>> CONFIG_RANDOMIZE_BASE is set.
>> The define in that case is:
>> #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })
>>
>> Why it doesn't work (doesn't return NULL when symbol doesn't exists) ?
>> I don't know, probably because of some elf/makefile/c language magic. I'm
>> not that big of an expert on those issues, and I wanted to provide a fix for
>> this problem during the -rc stages. If someone can help me solving the root
>> cause, I would be more than happy.
>>
>> Oded
>>>> +rusty.
>>> And also with correct email.
>>>
>>> -Andi
>>>
>> _______________________________________________
>> dri-devel mailing list
>> [email protected]
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>