2022-11-09 02:51:17

by Connor O'Brien

[permalink] [raw]
Subject: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches

Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
mismatches are expected and module loading should proceed
anyway. Logging with pr_warn() on every one of these "benign"
mismatches creates unnecessary noise when many such modules are
loaded. Instead, limit logging to the case where a BTF mismatch
actually prevents a module form loading.

Signed-off-by: Connor O'Brien <[email protected]>
---
kernel/bpf/btf.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5579ff3a5b54..406370487413 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
}
btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
if (IS_ERR(btf)) {
- pr_warn("failed to validate module [%s] BTF: %ld\n",
- mod->name, PTR_ERR(btf));
kfree(btf_mod);
- if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH))
+ if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
+ pr_warn("failed to validate module [%s] BTF: %ld\n",
+ mod->name, PTR_ERR(btf));
err = PTR_ERR(btf);
+ }
goto out;
}
err = btf_alloc_id(btf);
--
2.38.1.431.g37b22c650d-goog



2022-11-09 16:55:43

by Yonghong Song

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches



On 11/8/22 6:41 PM, Connor O'Brien wrote:
> Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
> mismatches are expected and module loading should proceed
> anyway. Logging with pr_warn() on every one of these "benign"
> mismatches creates unnecessary noise when many such modules are
> loaded. Instead, limit logging to the case where a BTF mismatch
> actually prevents a module form loading.
>
> Signed-off-by: Connor O'Brien <[email protected]>
> ---
> kernel/bpf/btf.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 5579ff3a5b54..406370487413 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> }
> btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
> if (IS_ERR(btf)) {
> - pr_warn("failed to validate module [%s] BTF: %ld\n",
> - mod->name, PTR_ERR(btf));

I think such warning still useful even with
CONFIG_MODULE_ALLOW_BTF_MISMATCH.
Can we use pr_warn_ratelimited instead of pr_warn in the above to
avoid excessive warnings?

> kfree(btf_mod);
> - if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH))
> + if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
> + pr_warn("failed to validate module [%s] BTF: %ld\n",
> + mod->name, PTR_ERR(btf));
> err = PTR_ERR(btf);
> + }
> goto out;
> }
> err = btf_alloc_id(btf);

2022-12-08 02:40:05

by Connor O'Brien

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches

On Wed, Nov 9, 2022 at 8:45 AM Yonghong Song <[email protected]> wrote:
>
>
>
> On 11/8/22 6:41 PM, Connor O'Brien wrote:
> > Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
> > mismatches are expected and module loading should proceed
> > anyway. Logging with pr_warn() on every one of these "benign"
> > mismatches creates unnecessary noise when many such modules are
> > loaded. Instead, limit logging to the case where a BTF mismatch
> > actually prevents a module form loading.
> >
> > Signed-off-by: Connor O'Brien <[email protected]>
> > ---
> > kernel/bpf/btf.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 5579ff3a5b54..406370487413 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > }
> > btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
> > if (IS_ERR(btf)) {
> > - pr_warn("failed to validate module [%s] BTF: %ld\n",
> > - mod->name, PTR_ERR(btf));
>
> I think such warning still useful even with
> CONFIG_MODULE_ALLOW_BTF_MISMATCH.
> Can we use pr_warn_ratelimited instead of pr_warn in the above to
> avoid excessive warnings?

I gave this a try on a Pixel 6 but I'm not sure it quite addresses the
issue. The amount of logging doesn't seem to decrease much, I think
because the interval between loading one mismatched module and the
next can be greater than the default rate limit. To my mind, the issue
is the total volume of these messages more so than their rate.

For context, Android devices using the GKI may load hundreds of
separately-built modules, and BTF mismatches are possible for any/all
of these. It was pointed out to me that btf_verifier_log_type can also
print several more lines per mismatched module. ~5 lines of logging
for each mismatched module can start to add up, in terms of both
overhead and the noise added to the kernel logs.

This is more subjective but I think the warnings also read as though
this is a more serious failure that might prevent affected modules
from working correctly; anecdotally, I've gotten multiple questions
about them asking if something is broken. This can be a red herring
for anyone unfamiliar with BTF who is reading the logs to debug
unrelated issues. In the CONFIG_MODULE_ALLOW_BTF_MISMATCH=y case the
flood of warnings seems out of proportion to the actual result
(modules still load successfully, just without debug info) especially
since the user has explicitly enabled a config saying they expect
mismatches.

If there needs to be some logging in the "mismatch allowed" case,
could an acceptable middle ground be to use pr_warn_once to send a
single message reporting that mismatches were detected & module BTF
debug info might be unavailable? Alternatively, if we could opt out of
module BTF loading then that would also avoid this issue, but that's
already been proposed before ([1], [2]) so I thought working with the
existing config option might be preferred.

[1] https://lore.kernel.org/bpf/[email protected]/
[2] https://lore.kernel.org/bpf/[email protected]/

2022-12-08 18:25:42

by Yonghong Song

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches



On 12/7/22 6:19 PM, Connor O'Brien wrote:
> On Wed, Nov 9, 2022 at 8:45 AM Yonghong Song <[email protected]> wrote:
>>
>>
>>
>> On 11/8/22 6:41 PM, Connor O'Brien wrote:
>>> Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
>>> mismatches are expected and module loading should proceed
>>> anyway. Logging with pr_warn() on every one of these "benign"
>>> mismatches creates unnecessary noise when many such modules are
>>> loaded. Instead, limit logging to the case where a BTF mismatch
>>> actually prevents a module form loading.
>>>
>>> Signed-off-by: Connor O'Brien <[email protected]>
>>> ---
>>> kernel/bpf/btf.c | 7 ++++---
>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>> index 5579ff3a5b54..406370487413 100644
>>> --- a/kernel/bpf/btf.c
>>> +++ b/kernel/bpf/btf.c
>>> @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>>> }
>>> btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
>>> if (IS_ERR(btf)) {
>>> - pr_warn("failed to validate module [%s] BTF: %ld\n",
>>> - mod->name, PTR_ERR(btf));
>>
>> I think such warning still useful even with
>> CONFIG_MODULE_ALLOW_BTF_MISMATCH.
>> Can we use pr_warn_ratelimited instead of pr_warn in the above to
>> avoid excessive warnings?
>
> I gave this a try on a Pixel 6 but I'm not sure it quite addresses the
> issue. The amount of logging doesn't seem to decrease much, I think
> because the interval between loading one mismatched module and the
> next can be greater than the default rate limit. To my mind, the issue
> is the total volume of these messages more so than their rate.
>
> For context, Android devices using the GKI may load hundreds of
> separately-built modules, and BTF mismatches are possible for any/all
> of these. It was pointed out to me that btf_verifier_log_type can also
> print several more lines per mismatched module. ~5 lines of logging
> for each mismatched module can start to add up, in terms of both
> overhead and the noise added to the kernel logs.
>
> This is more subjective but I think the warnings also read as though
> this is a more serious failure that might prevent affected modules
> from working correctly; anecdotally, I've gotten multiple questions
> about them asking if something is broken. This can be a red herring
> for anyone unfamiliar with BTF who is reading the logs to debug
> unrelated issues. In the CONFIG_MODULE_ALLOW_BTF_MISMATCH=y case the
> flood of warnings seems out of proportion to the actual result
> (modules still load successfully, just without debug info) especially
> since the user has explicitly enabled a config saying they expect
> mismatches.
>
> If there needs to be some logging in the "mismatch allowed" case,
> could an acceptable middle ground be to use pr_warn_once to send a

So it looks like pr_warn_ratelimited still produces a lot of warning.
In this case, I guess pr_warn_once should be okay.

> single message reporting that mismatches were detected & module BTF
> debug info might be unavailable? Alternatively, if we could opt out of
> module BTF loading then that would also avoid this issue, but that's
> already been proposed before ([1], [2]) so I thought working with the
> existing config option might be preferred.
>
> [1] https://lore.kernel.org/bpf/[email protected]/
> [2] https://lore.kernel.org/bpf/[email protected]/

2022-12-09 00:12:32

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches

On Thu, Dec 8, 2022 at 10:01 AM Yonghong Song <[email protected]> wrote:
>
>
>
> On 12/7/22 6:19 PM, Connor O'Brien wrote:
> > On Wed, Nov 9, 2022 at 8:45 AM Yonghong Song <[email protected]> wrote:
> >>
> >>
> >>
> >> On 11/8/22 6:41 PM, Connor O'Brien wrote:
> >>> Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
> >>> mismatches are expected and module loading should proceed
> >>> anyway. Logging with pr_warn() on every one of these "benign"
> >>> mismatches creates unnecessary noise when many such modules are
> >>> loaded. Instead, limit logging to the case where a BTF mismatch
> >>> actually prevents a module form loading.
> >>>
> >>> Signed-off-by: Connor O'Brien <[email protected]>
> >>> ---
> >>> kernel/bpf/btf.c | 7 ++++---
> >>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>> index 5579ff3a5b54..406370487413 100644
> >>> --- a/kernel/bpf/btf.c
> >>> +++ b/kernel/bpf/btf.c
> >>> @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> >>> }
> >>> btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
> >>> if (IS_ERR(btf)) {
> >>> - pr_warn("failed to validate module [%s] BTF: %ld\n",
> >>> - mod->name, PTR_ERR(btf));
> >>
> >> I think such warning still useful even with
> >> CONFIG_MODULE_ALLOW_BTF_MISMATCH.
> >> Can we use pr_warn_ratelimited instead of pr_warn in the above to
> >> avoid excessive warnings?
> >
> > I gave this a try on a Pixel 6 but I'm not sure it quite addresses the
> > issue. The amount of logging doesn't seem to decrease much, I think
> > because the interval between loading one mismatched module and the
> > next can be greater than the default rate limit. To my mind, the issue
> > is the total volume of these messages more so than their rate.
> >
> > For context, Android devices using the GKI may load hundreds of
> > separately-built modules, and BTF mismatches are possible for any/all
> > of these. It was pointed out to me that btf_verifier_log_type can also
> > print several more lines per mismatched module. ~5 lines of logging
> > for each mismatched module can start to add up, in terms of both
> > overhead and the noise added to the kernel logs.
> >
> > This is more subjective but I think the warnings also read as though
> > this is a more serious failure that might prevent affected modules
> > from working correctly; anecdotally, I've gotten multiple questions
> > about them asking if something is broken. This can be a red herring
> > for anyone unfamiliar with BTF who is reading the logs to debug
> > unrelated issues. In the CONFIG_MODULE_ALLOW_BTF_MISMATCH=y case the
> > flood of warnings seems out of proportion to the actual result
> > (modules still load successfully, just without debug info) especially
> > since the user has explicitly enabled a config saying they expect
> > mismatches.
> >
> > If there needs to be some logging in the "mismatch allowed" case,
> > could an acceptable middle ground be to use pr_warn_once to send a
>
> So it looks like pr_warn_ratelimited still produces a lot of warning.
> In this case, I guess pr_warn_once should be okay.

Maybe pr_warn_once generic "some kernel module BTF mismatched". And
also warn per-module message with details if
CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set?

>
> > single message reporting that mismatches were detected & module BTF
> > debug info might be unavailable? Alternatively, if we could opt out of
> > module BTF loading then that would also avoid this issue, but that's
> > already been proposed before ([1], [2]) so I thought working with the
> > existing config option might be preferred.
> >
> > [1] https://lore.kernel.org/bpf/[email protected]/
> > [2] https://lore.kernel.org/bpf/[email protected]/

2022-12-09 01:11:42

by Connor O'Brien

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches

On Thu, Dec 8, 2022 at 4:03 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Thu, Dec 8, 2022 at 10:01 AM Yonghong Song <[email protected]> wrote:
> >
> >
> >
> > On 12/7/22 6:19 PM, Connor O'Brien wrote:
> > > On Wed, Nov 9, 2022 at 8:45 AM Yonghong Song <[email protected]> wrote:
> > >>
> > >>
> > >>
> > >> On 11/8/22 6:41 PM, Connor O'Brien wrote:
> > >>> Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
> > >>> mismatches are expected and module loading should proceed
> > >>> anyway. Logging with pr_warn() on every one of these "benign"
> > >>> mismatches creates unnecessary noise when many such modules are
> > >>> loaded. Instead, limit logging to the case where a BTF mismatch
> > >>> actually prevents a module form loading.
> > >>>
> > >>> Signed-off-by: Connor O'Brien <[email protected]>
> > >>> ---
> > >>> kernel/bpf/btf.c | 7 ++++---
> > >>> 1 file changed, 4 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > >>> index 5579ff3a5b54..406370487413 100644
> > >>> --- a/kernel/bpf/btf.c
> > >>> +++ b/kernel/bpf/btf.c
> > >>> @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > >>> }
> > >>> btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
> > >>> if (IS_ERR(btf)) {
> > >>> - pr_warn("failed to validate module [%s] BTF: %ld\n",
> > >>> - mod->name, PTR_ERR(btf));
> > >>
> > >> I think such warning still useful even with
> > >> CONFIG_MODULE_ALLOW_BTF_MISMATCH.
> > >> Can we use pr_warn_ratelimited instead of pr_warn in the above to
> > >> avoid excessive warnings?
> > >
> > > I gave this a try on a Pixel 6 but I'm not sure it quite addresses the
> > > issue. The amount of logging doesn't seem to decrease much, I think
> > > because the interval between loading one mismatched module and the
> > > next can be greater than the default rate limit. To my mind, the issue
> > > is the total volume of these messages more so than their rate.
> > >
> > > For context, Android devices using the GKI may load hundreds of
> > > separately-built modules, and BTF mismatches are possible for any/all
> > > of these. It was pointed out to me that btf_verifier_log_type can also
> > > print several more lines per mismatched module. ~5 lines of logging
> > > for each mismatched module can start to add up, in terms of both
> > > overhead and the noise added to the kernel logs.
> > >
> > > This is more subjective but I think the warnings also read as though
> > > this is a more serious failure that might prevent affected modules
> > > from working correctly; anecdotally, I've gotten multiple questions
> > > about them asking if something is broken. This can be a red herring
> > > for anyone unfamiliar with BTF who is reading the logs to debug
> > > unrelated issues. In the CONFIG_MODULE_ALLOW_BTF_MISMATCH=y case the
> > > flood of warnings seems out of proportion to the actual result
> > > (modules still load successfully, just without debug info) especially
> > > since the user has explicitly enabled a config saying they expect
> > > mismatches.
> > >
> > > If there needs to be some logging in the "mismatch allowed" case,
> > > could an acceptable middle ground be to use pr_warn_once to send a
> >
> > So it looks like pr_warn_ratelimited still produces a lot of warning.
> > In this case, I guess pr_warn_once should be okay.
>
> Maybe pr_warn_once generic "some kernel module BTF mismatched". And
> also warn per-module message with details if
> CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set?
>

Yeah, I figured the logging behavior should remain unchanged when
CONFIG_MODULE_ALLOW_BTF_MISMATCH is unset, the existing verbose
logging makes sense when every mismatch stops a module from loading.

Then for the pr_warn_once case something like "Kernel module BTF
mismatch detected, BTF debug info may be unavailable for some modules"



> >
> > > single message reporting that mismatches were detected & module BTF
> > > debug info might be unavailable? Alternatively, if we could opt out of
> > > module BTF loading then that would also avoid this issue, but that's
> > > already been proposed before ([1], [2]) so I thought working with the
> > > existing config option might be preferred.
> > >
> > > [1] https://lore.kernel.org/bpf/[email protected]/
> > > [2] https://lore.kernel.org/bpf/[email protected]/

2022-12-09 01:51:08

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: btf: don't log ignored BTF mismatches

On Thu, Dec 8, 2022 at 4:59 PM Connor O'Brien <[email protected]> wrote:
>
> On Thu, Dec 8, 2022 at 4:03 PM Andrii Nakryiko
> <[email protected]> wrote:
> >
> > On Thu, Dec 8, 2022 at 10:01 AM Yonghong Song <[email protected]> wrote:
> > >
> > >
> > >
> > > On 12/7/22 6:19 PM, Connor O'Brien wrote:
> > > > On Wed, Nov 9, 2022 at 8:45 AM Yonghong Song <[email protected]> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 11/8/22 6:41 PM, Connor O'Brien wrote:
> > > >>> Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF
> > > >>> mismatches are expected and module loading should proceed
> > > >>> anyway. Logging with pr_warn() on every one of these "benign"
> > > >>> mismatches creates unnecessary noise when many such modules are
> > > >>> loaded. Instead, limit logging to the case where a BTF mismatch
> > > >>> actually prevents a module form loading.
> > > >>>
> > > >>> Signed-off-by: Connor O'Brien <[email protected]>
> > > >>> ---
> > > >>> kernel/bpf/btf.c | 7 ++++---
> > > >>> 1 file changed, 4 insertions(+), 3 deletions(-)
> > > >>>
> > > >>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > >>> index 5579ff3a5b54..406370487413 100644
> > > >>> --- a/kernel/bpf/btf.c
> > > >>> +++ b/kernel/bpf/btf.c
> > > >>> @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > > >>> }
> > > >>> btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
> > > >>> if (IS_ERR(btf)) {
> > > >>> - pr_warn("failed to validate module [%s] BTF: %ld\n",
> > > >>> - mod->name, PTR_ERR(btf));
> > > >>
> > > >> I think such warning still useful even with
> > > >> CONFIG_MODULE_ALLOW_BTF_MISMATCH.
> > > >> Can we use pr_warn_ratelimited instead of pr_warn in the above to
> > > >> avoid excessive warnings?
> > > >
> > > > I gave this a try on a Pixel 6 but I'm not sure it quite addresses the
> > > > issue. The amount of logging doesn't seem to decrease much, I think
> > > > because the interval between loading one mismatched module and the
> > > > next can be greater than the default rate limit. To my mind, the issue
> > > > is the total volume of these messages more so than their rate.
> > > >
> > > > For context, Android devices using the GKI may load hundreds of
> > > > separately-built modules, and BTF mismatches are possible for any/all
> > > > of these. It was pointed out to me that btf_verifier_log_type can also
> > > > print several more lines per mismatched module. ~5 lines of logging
> > > > for each mismatched module can start to add up, in terms of both
> > > > overhead and the noise added to the kernel logs.
> > > >
> > > > This is more subjective but I think the warnings also read as though
> > > > this is a more serious failure that might prevent affected modules
> > > > from working correctly; anecdotally, I've gotten multiple questions
> > > > about them asking if something is broken. This can be a red herring
> > > > for anyone unfamiliar with BTF who is reading the logs to debug
> > > > unrelated issues. In the CONFIG_MODULE_ALLOW_BTF_MISMATCH=y case the
> > > > flood of warnings seems out of proportion to the actual result
> > > > (modules still load successfully, just without debug info) especially
> > > > since the user has explicitly enabled a config saying they expect
> > > > mismatches.
> > > >
> > > > If there needs to be some logging in the "mismatch allowed" case,
> > > > could an acceptable middle ground be to use pr_warn_once to send a
> > >
> > > So it looks like pr_warn_ratelimited still produces a lot of warning.
> > > In this case, I guess pr_warn_once should be okay.
> >
> > Maybe pr_warn_once generic "some kernel module BTF mismatched". And
> > also warn per-module message with details if
> > CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set?
> >
>
> Yeah, I figured the logging behavior should remain unchanged when
> CONFIG_MODULE_ALLOW_BTF_MISMATCH is unset, the existing verbose
> logging makes sense when every mismatch stops a module from loading.
>
> Then for the pr_warn_once case something like "Kernel module BTF
> mismatch detected, BTF debug info may be unavailable for some modules"

yep, makes sense to me

>
>
>
> > >
> > > > single message reporting that mismatches were detected & module BTF
> > > > debug info might be unavailable? Alternatively, if we could opt out of
> > > > module BTF loading then that would also avoid this issue, but that's
> > > > already been proposed before ([1], [2]) so I thought working with the
> > > > existing config option might be preferred.
> > > >
> > > > [1] https://lore.kernel.org/bpf/[email protected]/
> > > > [2] https://lore.kernel.org/bpf/[email protected]/