LinuxLists.cc - Re: WARN_ON

2020-12-03 09:25:49

Subject: Re: WARN_ON_ONCE

On Thu, Dec 3, 2020 at 10:19 AM Dmitry Vyukov <[email protected]> wrote:
>
> On Thu, Dec 3, 2020 at 10:10 AM Alexey Kardashevskiy <[email protected]> wrote:
> >
> > Hi!
> >
> > Syzkaller triggered WARN_ON_ONCE at
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/tracepoint.c?h=v5.10-rc6#n266
> >
> >
> > ===
> > static int tracepoint_add_func(struct tracepoint *tp,
> > struct tracepoint_func *func, int prio)
> > {
> > struct tracepoint_func *old, *tp_funcs;
> > int ret;
> >
> > if (tp->regfunc && !static_key_enabled(&tp->key)) {
> > ret = tp->regfunc();
> > if (ret < 0)
> > return ret;
> > }
> >
> > tp_funcs = rcu_dereference_protected(tp->funcs,
> > lockdep_is_held(&tracepoints_mutex));
> > old = func_add(&tp_funcs, func, prio);
> > if (IS_ERR(old)) {
> > WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
> > return PTR_ERR(old);
> > }
> >
> > ===
> >
> > What is the common approach here? Syzkaller reacts on this as if it was
> > a bug but WARN_ON_ONCE here seems intentional. Do we still push for
> > removing such warnings?
>
> +LKML

+LKML for real

> Hi Alexey,
>
> Yes, see the guidelines here:
> https://elixir.bootlin.com/linux/v5.10-rc6/source/include/asm-generic/bug.h#L67
>
> Without a criteria for kernel but/not a kernel bug no kernel testing
> is possible.
>
> But this may be a real bug as well. The code seems to assume that
> ENOMEM is the only possible error here, which is not the case in
> reality.
>
>
> > Another example is:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/tracepoint.h?h=v5.10-rc6#n313
> >
> > My VMs crash on dereferencing it_func_ptr which is easily fixable by:
> >
> > @@ -307,9 +307,11 @@ static inline struct tracepoint
> > *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> > \
> > it_func_ptr = \
> >
> > rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
> > + if (it_func_ptr) \
> > do { \
> > it_func = (it_func_ptr)->func; \
> > __data = (it_func_ptr)->data; \
> >
> >
> > But - this only happens when OOM killer starts killing syzkaller
> > processes (I do not give it much memory so it is quite artificial
> > environment). Do we push these?
> >
> > Are there guidelines of some sort? Thanks,
> >
> >
> > --
> > Alexey
> >
> > --
> > You received this message because you are subscribed to the Google Groups "syzkaller" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller/87f443cf-26c0-6302-edee-556045bca18a%40ozlabs.ru.

2020-12-04 01:30:12

by Michael Ellerman

[permalink] [raw]

Subject: Re: WARN_ON_ONCE

Dmitry Vyukov <[email protected]> writes:
> On Thu, Dec 3, 2020 at 10:19 AM Dmitry Vyukov <[email protected]> wrote:
>> On Thu, Dec 3, 2020 at 10:10 AM Alexey Kardashevskiy <[email protected]> wrote:
>> >
>> > Hi!
>> >
>> > Syzkaller triggered WARN_ON_ONCE at
>> >
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/tracepoint.c?h=v5.10-rc6#n266
>> >
>> >
>> > ===
>> > static int tracepoint_add_func(struct tracepoint *tp,
>> > struct tracepoint_func *func, int prio)
>> > {
>> > struct tracepoint_func *old, *tp_funcs;
>> > int ret;
>> >
>> > if (tp->regfunc && !static_key_enabled(&tp->key)) {
>> > ret = tp->regfunc();
>> > if (ret < 0)
>> > return ret;
>> > }
>> >
>> > tp_funcs = rcu_dereference_protected(tp->funcs,
>> > lockdep_is_held(&tracepoints_mutex));
>> > old = func_add(&tp_funcs, func, prio);
>> > if (IS_ERR(old)) {
>> > WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
>> > return PTR_ERR(old);
>> > }
>> >
>> > ===
>> >
>> > What is the common approach here? Syzkaller reacts on this as if it was
>> > a bug but WARN_ON_ONCE here seems intentional. Do we still push for
>> > removing such warnings?

AFAICS it is a bug if that fires.

See the commit that added it:
d66a270be331 ("tracepoint: Do not warn on ENOMEM")

Which says:
Tracepoint should only warn when a kernel API user does not respect the
required preconditions (e.g. same tracepoint enabled twice, or called
to remove a tracepoint that does not exist).

Silence warning in out-of-memory conditions, given that the error is
returned to the caller.

So if you're seeing it then you've someone caused it to return something
other than ENOMEM, and that is a bug.

cheers

2020-12-04 02:35:14

by Alexey Kardashevskiy

[permalink] [raw]

Subject: Re: WARN_ON_ONCE

On 04/12/2020 12:25, Michael Ellerman wrote:
> Dmitry Vyukov <[email protected]> writes:
>> On Thu, Dec 3, 2020 at 10:19 AM Dmitry Vyukov <[email protected]> wrote:
>>> On Thu, Dec 3, 2020 at 10:10 AM Alexey Kardashevskiy <[email protected]> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Syzkaller triggered WARN_ON_ONCE at
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/tracepoint.c?h=v5.10-rc6#n266
>>>>
>>>>
>>>> ===
>>>> static int tracepoint_add_func(struct tracepoint *tp,
>>>> struct tracepoint_func *func, int prio)
>>>> {
>>>> struct tracepoint_func *old, *tp_funcs;
>>>> int ret;
>>>>
>>>> if (tp->regfunc && !static_key_enabled(&tp->key)) {
>>>> ret = tp->regfunc();
>>>> if (ret < 0)
>>>> return ret;
>>>> }
>>>>
>>>> tp_funcs = rcu_dereference_protected(tp->funcs,
>>>> lockdep_is_held(&tracepoints_mutex));
>>>> old = func_add(&tp_funcs, func, prio);
>>>> if (IS_ERR(old)) {
>>>> WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
>>>> return PTR_ERR(old);
>>>> }
>>>>
>>>> ===
>>>>
>>>> What is the common approach here? Syzkaller reacts on this as if it was
>>>> a bug but WARN_ON_ONCE here seems intentional. Do we still push for
>>>> removing such warnings?
>
> AFAICS it is a bug if that fires.
>
> See the commit that added it:
> d66a270be331 ("tracepoint: Do not warn on ENOMEM")
>
> Which says:
> Tracepoint should only warn when a kernel API user does not respect the
> required preconditions (e.g. same tracepoint enabled twice,

This says that the userspace can trigger the warning if it does not use
the API right.

> or called
> to remove a tracepoint that does not exist).
>
> Silence warning in out-of-memory conditions, given that the error is
> returned to the caller.
>
>
> So if you're seeing it then you've someone caused it to return something
> other than ENOMEM, and that is a bug.

This is an userspace bug which registers the same thing twice, the
kernel returns a correct error. The question is should it warn by
WARN_ON or pr_err(). The comment in bug.h suggests pr_err() is the right
way, is not it?

--
Alexey

2020-12-05 14:02:53

by Michael Ellerman

[permalink] [raw]

Subject: Re: WARN_ON_ONCE

Alexey Kardashevskiy <[email protected]> writes:
> On 04/12/2020 12:25, Michael Ellerman wrote:
>> Dmitry Vyukov <[email protected]> writes:
>>> On Thu, Dec 3, 2020 at 10:19 AM Dmitry Vyukov <[email protected]> wrote:
>>>> On Thu, Dec 3, 2020 at 10:10 AM Alexey Kardashevskiy <[email protected]> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Syzkaller triggered WARN_ON_ONCE at
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/tracepoint.c?h=v5.10-rc6#n266
>>>>>
>>>>>
>>>>> ===
>>>>> static int tracepoint_add_func(struct tracepoint *tp,
>>>>> struct tracepoint_func *func, int prio)
>>>>> {
>>>>> struct tracepoint_func *old, *tp_funcs;
>>>>> int ret;
>>>>>
>>>>> if (tp->regfunc && !static_key_enabled(&tp->key)) {
>>>>> ret = tp->regfunc();
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> }
>>>>>
>>>>> tp_funcs = rcu_dereference_protected(tp->funcs,
>>>>> lockdep_is_held(&tracepoints_mutex));
>>>>> old = func_add(&tp_funcs, func, prio);
>>>>> if (IS_ERR(old)) {
>>>>> WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
>>>>> return PTR_ERR(old);
>>>>> }
>>>>>
>>>>> ===
>>>>>
>>>>> What is the common approach here? Syzkaller reacts on this as if it was
>>>>> a bug but WARN_ON_ONCE here seems intentional. Do we still push for
>>>>> removing such warnings?
>>
>> AFAICS it is a bug if that fires.
>>
>> See the commit that added it:
>> d66a270be331 ("tracepoint: Do not warn on ENOMEM")
>>
>> Which says:
>> Tracepoint should only warn when a kernel API user does not respect the
>> required preconditions (e.g. same tracepoint enabled twice,
>
> This says that the userspace can trigger the warning if it does not use
> the API right.

No I don't think it says that.

It's saying that it should be a WARN if a *kernel* user of the
tracepoint API violates the API. The implication is that this condition
should never happen if the kernel is using the tracepoint API correctly,
and so if we hit this condition it indicates a bug in the kernel that
should be fixed.

>> or called
>> to remove a tracepoint that does not exist).
>>
>> Silence warning in out-of-memory conditions, given that the error is
>> returned to the caller.
>>
>>
>> So if you're seeing it then you've someone caused it to return something
>> other than ENOMEM, and that is a bug.
>
> This is an userspace bug which registers the same thing twice, the
> kernel returns a correct error. The question is should it warn by
> WARN_ON or pr_err(). The comment in bug.h suggests pr_err() is the right
> way, is not it?

Userspace must not be able to trigger a WARN.

What is the path into that code from userspace?

Either something on that path should be checking that it's not violating
the API and triggering the WARN, or if that's not possible/easy then the
WARN should be removed.

cheers

2020-12-06 12:18:27

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: WARN_ON_ONCE

On Sat, Dec 5, 2020 at 1:05 PM Michael Ellerman <[email protected]> wrote:
>
> Alexey Kardashevskiy <[email protected]> writes:
> > On 04/12/2020 12:25, Michael Ellerman wrote:
> >> Dmitry Vyukov <[email protected]> writes:
> >>> On Thu, Dec 3, 2020 at 10:19 AM Dmitry Vyukov <[email protected]> wrote:
> >>>> On Thu, Dec 3, 2020 at 10:10 AM Alexey Kardashevskiy <[email protected]> wrote:
> >>>>>
> >>>>> Hi!
> >>>>>
> >>>>> Syzkaller triggered WARN_ON_ONCE at
> >>>>>
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/tracepoint.c?h=v5.10-rc6#n266
> >>>>>
> >>>>>
> >>>>> ===
> >>>>> static int tracepoint_add_func(struct tracepoint *tp,
> >>>>> struct tracepoint_func *func, int prio)
> >>>>> {
> >>>>> struct tracepoint_func *old, *tp_funcs;
> >>>>> int ret;
> >>>>>
> >>>>> if (tp->regfunc && !static_key_enabled(&tp->key)) {
> >>>>> ret = tp->regfunc();
> >>>>> if (ret < 0)
> >>>>> return ret;
> >>>>> }
> >>>>>
> >>>>> tp_funcs = rcu_dereference_protected(tp->funcs,
> >>>>> lockdep_is_held(&tracepoints_mutex));
> >>>>> old = func_add(&tp_funcs, func, prio);
> >>>>> if (IS_ERR(old)) {
> >>>>> WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
> >>>>> return PTR_ERR(old);
> >>>>> }
> >>>>>
> >>>>> ===
> >>>>>
> >>>>> What is the common approach here? Syzkaller reacts on this as if it was
> >>>>> a bug but WARN_ON_ONCE here seems intentional. Do we still push for
> >>>>> removing such warnings?
> >>
> >> AFAICS it is a bug if that fires.
> >>
> >> See the commit that added it:
> >> d66a270be331 ("tracepoint: Do not warn on ENOMEM")
> >>
> >> Which says:
> >> Tracepoint should only warn when a kernel API user does not respect the
> >> required preconditions (e.g. same tracepoint enabled twice,
> >
> > This says that the userspace can trigger the warning if it does not use
> > the API right.
>
> No I don't think it says that.
>
> It's saying that it should be a WARN if a *kernel* user of the
> tracepoint API violates the API. The implication is that this condition
> should never happen if the kernel is using the tracepoint API correctly,
> and so if we hit this condition it indicates a bug in the kernel that
> should be fixed.
>
> >> or called
> >> to remove a tracepoint that does not exist).
> >>
> >> Silence warning in out-of-memory conditions, given that the error is
> >> returned to the caller.
> >>
> >>
> >> So if you're seeing it then you've someone caused it to return something
> >> other than ENOMEM, and that is a bug.
> >
> > This is an userspace bug which registers the same thing twice, the
> > kernel returns a correct error. The question is should it warn by
> > WARN_ON or pr_err(). The comment in bug.h suggests pr_err() is the right
> > way, is not it?
>
> Userspace must not be able to trigger a WARN.
>
> What is the path into that code from userspace?

There are lots of info on this WARNING in the syzbot report:
https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153
https://lore.kernel.org/lkml/[email protected]/

There are lots of sample stacks and reproducers, also happens on 4.14 and 4.19.

> Either something on that path should be checking that it's not violating
> the API and triggering the WARN, or if that's not possible/easy then the
> WARN should be removed.
>
> cheers