2020-08-29 11:29:25

by Qu Wenruo

[permalink] [raw]
Subject: [PATCH] module: Add more error message for failed kernel module loading

When kernel module loading failed, user space only get one of the
following error messages:
- -ENOEXEC
This is the most confusing one. From corrupted ELF header to bad
WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
all returns this error number.

- -EPERM
This is for blacklisted modules. But mod doesn't do extra explain
on this error either.

- -ENOMEM
The only error which needs no explain.

This means, if a user got "Exec format error" from modprobe, it provides
no meaningful way for the user to debug, and will take extra time
communicating to get extra info.

So this patch will add extra error messages for -ENOEXEC and -EPERM
errors, allowing user to do better debugging and reporting.

Signed-off-by: Qu Wenruo <[email protected]>
---
kernel/module.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 1c5cff34d9f2..9f748c6eeb48 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
int i;

for (i = 0; i < hdr->e_shnum; i++) {
- if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
+ if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
+ pr_err(
+ "Module %s section %d has invalid WRITE|EXEC flags\n",
+ mod->name, i);
return -ENOEXEC;
+ }
}

return 0;
@@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
char *after_dashes;

err = elf_header_check(info);
- if (err)
+ if (err) {
+ pr_err("Module has invalid ELF header\n");
goto free_copy;
+ }

err = setup_load_info(info, flags);
if (err)
@@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,

if (blacklisted(info->name)) {
err = -EPERM;
+ pr_err("Module %s is blacklisted\n", info->name);
goto free_copy;
}

--
2.27.0


2020-09-01 18:53:56

by Lucas De Marchi

[permalink] [raw]
Subject: Re: [PATCH] module: Add more error message for failed kernel module loading

On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo <[email protected]> wrote:
>
> When kernel module loading failed, user space only get one of the
> following error messages:
> - -ENOEXEC
> This is the most confusing one. From corrupted ELF header to bad
> WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
> all returns this error number.
>
> - -EPERM
> This is for blacklisted modules. But mod doesn't do extra explain
> on this error either.
>
> - -ENOMEM
> The only error which needs no explain.
>
> This means, if a user got "Exec format error" from modprobe, it provides
> no meaningful way for the user to debug, and will take extra time
> communicating to get extra info.
>
> So this patch will add extra error messages for -ENOEXEC and -EPERM
> errors, allowing user to do better debugging and reporting.
>
> Signed-off-by: Qu Wenruo <[email protected]>
> ---
> kernel/module.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index 1c5cff34d9f2..9f748c6eeb48 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
> int i;
>
> for (i = 0; i < hdr->e_shnum; i++) {
> - if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
> + if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
> + pr_err(
> + "Module %s section %d has invalid WRITE|EXEC flags\n",
> + mod->name, i);
> return -ENOEXEC;
> + }
> }
>
> return 0;
> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
> char *after_dashes;
>
> err = elf_header_check(info);
> - if (err)
> + if (err) {
> + pr_err("Module has invalid ELF header\n");
> goto free_copy;
> + }
>
> err = setup_load_info(info, flags);
> if (err)
> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
>
> if (blacklisted(info->name)) {
> err = -EPERM;
> + pr_err("Module %s is blacklisted\n", info->name);

I wonder why would anyone actually add the blacklist to the command
line like this and have no
way to revert that back. This was introduced in
be7de5f91fdc modules: Add kernel parameter to blacklist modules
as a way to overcome broken initrd generation afaics. Either kernel
command line (using modprobe.blacklist)
or /etc/modprobe.d options are honoured by libkmod and allow a
sufficiently privileged user to bypass it.

+Rusty, +Prarit: is there anything this module parameter is covering
that I'm missing?

For the changes here,

Reviewed-by: Lucas De Marchi <[email protected]>

thanks
Lucas De Marchi

> goto free_copy;
> }
>
> --
> 2.27.0
>

2020-09-01 19:57:49

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH] module: Add more error message for failed kernel module loading



On 9/1/20 2:50 PM, Lucas De Marchi wrote:
> On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo <[email protected]> wrote:
>>
>> When kernel module loading failed, user space only get one of the
>> following error messages:
>> - -ENOEXEC
>> This is the most confusing one. From corrupted ELF header to bad
>> WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
>> all returns this error number.
>>
>> - -EPERM
>> This is for blacklisted modules. But mod doesn't do extra explain
>> on this error either.
>>
>> - -ENOMEM
>> The only error which needs no explain.
>>
>> This means, if a user got "Exec format error" from modprobe, it provides
>> no meaningful way for the user to debug, and will take extra time
>> communicating to get extra info.
>>
>> So this patch will add extra error messages for -ENOEXEC and -EPERM
>> errors, allowing user to do better debugging and reporting.
>>
>> Signed-off-by: Qu Wenruo <[email protected]>
>> ---
>> kernel/module.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 1c5cff34d9f2..9f748c6eeb48 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
>> int i;
>>
>> for (i = 0; i < hdr->e_shnum; i++) {
>> - if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
>> + if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
>> + pr_err(
>> + "Module %s section %d has invalid WRITE|EXEC flags\n",
>> + mod->name, i);
>> return -ENOEXEC;
>> + }
>> }
>>
>> return 0;
>> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
>> char *after_dashes;
>>
>> err = elf_header_check(info);
>> - if (err)
>> + if (err) {
>> + pr_err("Module has invalid ELF header\n");
>> goto free_copy;
>> + }
>>
>> err = setup_load_info(info, flags);
>> if (err)
>> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>
>> if (blacklisted(info->name)) {
>> err = -EPERM;
>> + pr_err("Module %s is blacklisted\n", info->name);
>
> I wonder why would anyone actually add the blacklist to the command
> line like this and have no
> way to revert that back. This was introduced in

Debug. Debug. Debug. ;) The parameter was added to debug broken installations
and kernel boots.

> be7de5f91fdc modules: Add kernel parameter to blacklist modules
> as a way to overcome broken initrd generation afaics.

Not the generation of the initramfs, but blocking a module loaded during the
boot. The installation may have failed and there's no easy way to then debug
what module was responsible for the failure.

Either kernel
> command line (using modprobe.blacklist)
> or /etc/modprobe.d options are honoured by libkmod and allow a
> sufficiently privileged user to bypass it.
>

Both of those options only work if the filesystem is mounted IIRC. It could be
the case that modprobe.blacklist now works earlier in the boot, however, I've
never used it because module_blacklist is the biggest and best hammer that I can
use to get through a broken installation or boot.

In any case you're incorrectly assuming that the system has a filesystem on it.
That's not necessarily the case as I'll explain below.

> +Rusty, +Prarit: is there anything this module parameter is covering
> that I'm missing?

This is the situation I have repeatedly come across : A system at a remote site
will not boot any flavor of Linux. Since the system would not install the only
way to debug was to provide install images to workaround a module load failure.
As you can imagine that is a time consuming process as well as a bad end user
experience.

I got so sick of it that I wrote the code above (well similar to it anyway --
thanks for the review Randy ;)) to add the module_blacklist parameter to make
debugging an uninstallable "bricked" system easier.

It's come in handy in some unexpected ways. We've had situations where modules
have loaded and corrupted memory and blacklisting them revealed that the modules
were responsible for the memory corruption.

P.

>
> For the changes here,
>
> Reviewed-by: Lucas De Marchi <[email protected]>
>
> thanks
> Lucas De Marchi
>
>> goto free_copy;
>> }
>>
>> --
>> 2.27.0
>>
>

2020-09-01 20:20:31

by Lucas De Marchi

[permalink] [raw]
Subject: Re: [PATCH] module: Add more error message for failed kernel module loading

On Tue, Sep 1, 2020 at 12:56 PM Prarit Bhargava <[email protected]> wrote:
>
>
>
> On 9/1/20 2:50 PM, Lucas De Marchi wrote:
> > On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo <[email protected]> wrote:
> >>
> >> When kernel module loading failed, user space only get one of the
> >> following error messages:
> >> - -ENOEXEC
> >> This is the most confusing one. From corrupted ELF header to bad
> >> WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
> >> all returns this error number.
> >>
> >> - -EPERM
> >> This is for blacklisted modules. But mod doesn't do extra explain
> >> on this error either.
> >>
> >> - -ENOMEM
> >> The only error which needs no explain.
> >>
> >> This means, if a user got "Exec format error" from modprobe, it provides
> >> no meaningful way for the user to debug, and will take extra time
> >> communicating to get extra info.
> >>
> >> So this patch will add extra error messages for -ENOEXEC and -EPERM
> >> errors, allowing user to do better debugging and reporting.
> >>
> >> Signed-off-by: Qu Wenruo <[email protected]>
> >> ---
> >> kernel/module.c | 11 +++++++++--
> >> 1 file changed, 9 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/module.c b/kernel/module.c
> >> index 1c5cff34d9f2..9f748c6eeb48 100644
> >> --- a/kernel/module.c
> >> +++ b/kernel/module.c
> >> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
> >> int i;
> >>
> >> for (i = 0; i < hdr->e_shnum; i++) {
> >> - if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
> >> + if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
> >> + pr_err(
> >> + "Module %s section %d has invalid WRITE|EXEC flags\n",
> >> + mod->name, i);
> >> return -ENOEXEC;
> >> + }
> >> }
> >>
> >> return 0;
> >> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
> >> char *after_dashes;
> >>
> >> err = elf_header_check(info);
> >> - if (err)
> >> + if (err) {
> >> + pr_err("Module has invalid ELF header\n");
> >> goto free_copy;
> >> + }
> >>
> >> err = setup_load_info(info, flags);
> >> if (err)
> >> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> >>
> >> if (blacklisted(info->name)) {
> >> err = -EPERM;
> >> + pr_err("Module %s is blacklisted\n", info->name);
> >
> > I wonder why would anyone actually add the blacklist to the command
> > line like this and have no
> > way to revert that back. This was introduced in
>
> Debug. Debug. Debug. ;) The parameter was added to debug broken installations
> and kernel boots.
>
> > be7de5f91fdc modules: Add kernel parameter to blacklist modules
> > as a way to overcome broken initrd generation afaics.
>
> Not the generation of the initramfs, but blocking a module loaded during the
> boot. The installation may have failed and there's no easy way to then debug
> what module was responsible for the failure.

if you are using initrd, then *inside* the initrd you should have the
/etc/modprobe.d/* file
you want. I said "broken initrd generation" because the tool should
put the file there, and
apparently for you it isn't.

Even if you don't have the file, you could use modprobe.blacklist= and
let the blacklist happen
in the userspace library rather than in the kernel. Module loading is
not like firmware loading
that happens without help from userspace.

>
> Either kernel
> > command line (using modprobe.blacklist)
> > or /etc/modprobe.d options are honoured by libkmod and allow a
> > sufficiently privileged user to bypass it.
> >
>
> Both of those options only work if the filesystem is mounted IIRC. It could be
> the case that modprobe.blacklist now works earlier in the boot, however, I've
> never used it because module_blacklist is the biggest and best hammer that I can
> use to get through a broken installation or boot.
>
> In any case you're incorrectly assuming that the system has a filesystem on it.
> That's not necessarily the case as I'll explain below.
>
> > +Rusty, +Prarit: is there anything this module parameter is covering
> > that I'm missing?
>
> This is the situation I have repeatedly come across : A system at a remote site
> will not boot any flavor of Linux. Since the system would not install the only
> way to debug was to provide install images to workaround a module load failure.
> As you can imagine that is a time consuming process as well as a bad end user
> experience.
>
> I got so sick of it that I wrote the code above (well similar to it anyway --
> thanks for the review Randy ;)) to add the module_blacklist parameter to make
> debugging an uninstallable "bricked" system easier.
>
> It's come in handy in some unexpected ways. We've had situations where modules
> have loaded and corrupted memory and blacklisting them revealed that the modules
> were responsible for the memory corruption.

ok... but this seems a reimplementation of modprobe.blacklist= option
in the kernel command line,
but in kernel space, with no way to remove it after the kernel is booted.

Lucas De Marchi

>
> P.
>
> >
> > For the changes here,
> >
> > Reviewed-by: Lucas De Marchi <[email protected]>
> >
> > thanks
> > Lucas De Marchi
> >
> >> goto free_copy;
> >> }
> >>
> >> --
> >> 2.27.0
> >>
> >
>

2020-09-02 00:10:08

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH] module: Add more error message for failed kernel module loading



On 9/1/20 4:17 PM, Lucas De Marchi wrote:
> On Tue, Sep 1, 2020 at 12:56 PM Prarit Bhargava <[email protected]> wrote:
>>
>>
>>
>> On 9/1/20 2:50 PM, Lucas De Marchi wrote:
>>> On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo <[email protected]> wrote:
>>>>
>>>> When kernel module loading failed, user space only get one of the
>>>> following error messages:
>>>> - -ENOEXEC
>>>> This is the most confusing one. From corrupted ELF header to bad
>>>> WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
>>>> all returns this error number.
>>>>
>>>> - -EPERM
>>>> This is for blacklisted modules. But mod doesn't do extra explain
>>>> on this error either.
>>>>
>>>> - -ENOMEM
>>>> The only error which needs no explain.
>>>>
>>>> This means, if a user got "Exec format error" from modprobe, it provides
>>>> no meaningful way for the user to debug, and will take extra time
>>>> communicating to get extra info.
>>>>
>>>> So this patch will add extra error messages for -ENOEXEC and -EPERM
>>>> errors, allowing user to do better debugging and reporting.
>>>>
>>>> Signed-off-by: Qu Wenruo <[email protected]>
>>>> ---
>>>> kernel/module.c | 11 +++++++++--
>>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/module.c b/kernel/module.c
>>>> index 1c5cff34d9f2..9f748c6eeb48 100644
>>>> --- a/kernel/module.c
>>>> +++ b/kernel/module.c
>>>> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
>>>> int i;
>>>>
>>>> for (i = 0; i < hdr->e_shnum; i++) {
>>>> - if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
>>>> + if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
>>>> + pr_err(
>>>> + "Module %s section %d has invalid WRITE|EXEC flags\n",
>>>> + mod->name, i);
>>>> return -ENOEXEC;
>>>> + }
>>>> }
>>>>
>>>> return 0;
>>>> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>>> char *after_dashes;
>>>>
>>>> err = elf_header_check(info);
>>>> - if (err)
>>>> + if (err) {
>>>> + pr_err("Module has invalid ELF header\n");
>>>> goto free_copy;
>>>> + }
>>>>
>>>> err = setup_load_info(info, flags);
>>>> if (err)
>>>> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>>>
>>>> if (blacklisted(info->name)) {
>>>> err = -EPERM;
>>>> + pr_err("Module %s is blacklisted\n", info->name);
>>>
>>> I wonder why would anyone actually add the blacklist to the command
>>> line like this and have no
>>> way to revert that back. This was introduced in
>>
>> Debug. Debug. Debug. ;) The parameter was added to debug broken installations
>> and kernel boots.
>>
>>> be7de5f91fdc modules: Add kernel parameter to blacklist modules
>>> as a way to overcome broken initrd generation afaics.
>>
>> Not the generation of the initramfs, but blocking a module loaded during the
>> boot. The installation may have failed and there's no easy way to then debug
>> what module was responsible for the failure.
>
> if you are using initrd, then *inside* the initrd you should have the
> /etc/modprobe.d/* file
> you want. I said "broken initrd generation" because the tool should
> put the file there, and
> apparently for you it isn't.
>
> Even if you don't have the file, you could use modprobe.blacklist= and
> let the blacklist happen
> in the userspace library rather than in the kernel. Module loading is
> not like firmware loading
> that happens without help from userspace.
>
>>
>> Either kernel
>>> command line (using modprobe.blacklist)
>>> or /etc/modprobe.d options are honoured by libkmod and allow a
>>> sufficiently privileged user to bypass it.
>>>
>>
>> Both of those options only work if the filesystem is mounted IIRC. It could be
>> the case that modprobe.blacklist now works earlier in the boot, however, I've
>> never used it because module_blacklist is the biggest and best hammer that I can
>> use to get through a broken installation or boot.
>>
>> In any case you're incorrectly assuming that the system has a filesystem on it.
>> That's not necessarily the case as I'll explain below.
>>
>>> +Rusty, +Prarit: is there anything this module parameter is covering
>>> that I'm missing?
>>
>> This is the situation I have repeatedly come across : A system at a remote site
>> will not boot any flavor of Linux. Since the system would not install the only
>> way to debug was to provide install images to workaround a module load failure.
>> As you can imagine that is a time consuming process as well as a bad end user
>> experience.
>>
>> I got so sick of it that I wrote the code above (well similar to it anyway --
>> thanks for the review Randy ;)) to add the module_blacklist parameter to make
>> debugging an uninstallable "bricked" system easier.
>>
>> It's come in handy in some unexpected ways. We've had situations where modules
>> have loaded and corrupted memory and blacklisting them revealed that the modules
>> were responsible for the memory corruption.
>
> ok... but this seems a reimplementation of modprobe.blacklist= option
> in the kernel command line,
> but in kernel space, with no way to remove it after the kernel is booted.
>

That's *EXACTLY* what I want. I do not want modprobe (because of some
misconfiguration) to load the module I've explicitly blacklisted.

P.

> Lucas De Marchi
>
>>
>> P.
>>
>>>
>>> For the changes here,
>>>
>>> Reviewed-by: Lucas De Marchi <[email protected]>
>>>
>>> thanks
>>> Lucas De Marchi
>>>
>>>> goto free_copy;
>>>> }
>>>>
>>>> --
>>>> 2.27.0
>>>>
>>>
>>
>