2021-06-26 14:00:23

by Tetsuo Handa

[permalink] [raw]
Subject: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

syzbot is hitting WARN_ON_ONCE() at tracepoint_add_func() [1], but
func_add() returning -EEXIST and func_remove() returning -ENOENT are
not kernel bugs that can justify crashing the system.

Commit d66a270be3310d7a ("tracepoint: Do not warn on ENOMEM") says that
tracepoint should only warn when a kernel API user does not respect the
required preconditions (e.g. same tracepoint enabled twice, or called
to remove a tracepoint that does not exist). But WARN*() must be used to
denote kernel bugs and not to print simple warnings. If someone wants to
print warnings, pr_warn() etc. should be used instead.

Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153 [1]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Tested-by: syzbot <[email protected]>
---
kernel/tracepoint.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 9f478d29b926..3cfa37a3d05c 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -287,10 +287,8 @@ static int tracepoint_add_func(struct tracepoint *tp,
tp_funcs = rcu_dereference_protected(tp->funcs,
lockdep_is_held(&tracepoints_mutex));
old = func_add(&tp_funcs, func, prio);
- if (IS_ERR(old)) {
- WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
+ if (IS_ERR(old))
return PTR_ERR(old);
- }

/*
* rcu_assign_pointer has as smp_store_release() which makes sure
@@ -320,7 +318,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
tp_funcs = rcu_dereference_protected(tp->funcs,
lockdep_is_held(&tracepoints_mutex));
old = func_remove(&tp_funcs, func);
- if (WARN_ON_ONCE(IS_ERR(old)))
+ if (IS_ERR(old))
return PTR_ERR(old);

if (tp_funcs == old)
--
2.18.4


2021-06-26 14:22:01

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On Sat, 26 Jun 2021 22:58:45 +0900
Tetsuo Handa <[email protected]> wrote:

> syzbot is hitting WARN_ON_ONCE() at tracepoint_add_func() [1], but
> func_add() returning -EEXIST and func_remove() returning -ENOENT are
> not kernel bugs that can justify crashing the system.

There should be no path that registers a tracepoint twice. That's a bug
in the kernel. Looking at the link below, I see the backtrace:

Call Trace:
tracepoint_probe_register_prio kernel/tracepoint.c:369 [inline]
tracepoint_probe_register+0x9c/0xe0 kernel/tracepoint.c:389
__bpf_probe_register kernel/trace/bpf_trace.c:2154 [inline]
bpf_probe_register+0x15a/0x1c0 kernel/trace/bpf_trace.c:2159
bpf_raw_tracepoint_open+0x34a/0x720 kernel/bpf/syscall.c:2878
__do_sys_bpf+0x2586/0x4f40 kernel/bpf/syscall.c:4435
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47

So BPF is allowing the user to register the same tracepoint more than
once? That looks to be a bug in the BPF code where it shouldn't be
allowing user space to register the same tracepoint multiple times.

If we take the patch and just error out, that is probably not what the
BPF user wants.

-- Steve



>
> Commit d66a270be3310d7a ("tracepoint: Do not warn on ENOMEM") says that
> tracepoint should only warn when a kernel API user does not respect the
> required preconditions (e.g. same tracepoint enabled twice, or called
> to remove a tracepoint that does not exist). But WARN*() must be used to
> denote kernel bugs and not to print simple warnings. If someone wants to
> print warnings, pr_warn() etc. should be used instead.
>
> Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153 [1]
> Reported-by: syzbot <[email protected]>
> Signed-off-by: Tetsuo Handa <[email protected]>
> Tested-by: syzbot <[email protected]>
> ---
> kernel/tracepoint.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 9f478d29b926..3cfa37a3d05c 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -287,10 +287,8 @@ static int tracepoint_add_func(struct tracepoint *tp,
> tp_funcs = rcu_dereference_protected(tp->funcs,
> lockdep_is_held(&tracepoints_mutex));
> old = func_add(&tp_funcs, func, prio);
> - if (IS_ERR(old)) {
> - WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
> + if (IS_ERR(old))
> return PTR_ERR(old);
> - }
>
> /*
> * rcu_assign_pointer has as smp_store_release() which makes sure
> @@ -320,7 +318,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> tp_funcs = rcu_dereference_protected(tp->funcs,
> lockdep_is_held(&tracepoints_mutex));
> old = func_remove(&tp_funcs, func);
> - if (WARN_ON_ONCE(IS_ERR(old)))
> + if (IS_ERR(old))
> return PTR_ERR(old);
>
> if (tp_funcs == old)

2021-06-26 15:17:45

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On 2021/06/26 23:18, Steven Rostedt wrote:
> On Sat, 26 Jun 2021 22:58:45 +0900
> Tetsuo Handa <[email protected]> wrote:
>
>> syzbot is hitting WARN_ON_ONCE() at tracepoint_add_func() [1], but
>> func_add() returning -EEXIST and func_remove() returning -ENOENT are
>> not kernel bugs that can justify crashing the system.
>
> There should be no path that registers a tracepoint twice. That's a bug
> in the kernel. Looking at the link below, I see the backtrace:
>
> Call Trace:
> tracepoint_probe_register_prio kernel/tracepoint.c:369 [inline]
> tracepoint_probe_register+0x9c/0xe0 kernel/tracepoint.c:389
> __bpf_probe_register kernel/trace/bpf_trace.c:2154 [inline]
> bpf_probe_register+0x15a/0x1c0 kernel/trace/bpf_trace.c:2159
> bpf_raw_tracepoint_open+0x34a/0x720 kernel/bpf/syscall.c:2878
> __do_sys_bpf+0x2586/0x4f40 kernel/bpf/syscall.c:4435
> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
>
> So BPF is allowing the user to register the same tracepoint more than
> once? That looks to be a bug in the BPF code where it shouldn't be
> allowing user space to register the same tracepoint multiple times.

I didn't catch your question.

(1) func_add() can reject an attempt to add same tracepoint multiple times
by returning -EINVAL to the caller.
(2) But tracepoint_add_func() (the caller of func_add()) is calling WARN_ON_ONCE()
if func_add() returned -EINVAL.
(3) And tracepoint_add_func() is triggerable via request from userspace.
(4) tracepoint_probe_register_prio() serializes tracepoint_add_func() call
triggered by concurrent request from userspace using tracepoints_mutex mutex.
(5) But tracepoint_add_func() does not check whether same tracepoint multiple
is already registered before calling func_add().
(6) As a result, tracepoint_add_func() receives -EINVAL from func_add(), and
calls WARN_ON_ONCE() and the system crashes due to panic_on_warn == 1.

Why this is a bug in the BPF code? The BPF code is not allowing userspace to
register the same tracepoint multiple times. I think that tracepoint_add_func()
is stupid enough to crash the kernel instead of rejecting when an attempt to
register the same tracepoint multiple times is made.

2021-06-26 15:18:59

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On 2021/06/27 0:13, Tetsuo Handa wrote:
> On 2021/06/26 23:18, Steven Rostedt wrote:
>> On Sat, 26 Jun 2021 22:58:45 +0900
>> Tetsuo Handa <[email protected]> wrote:
>>
>>> syzbot is hitting WARN_ON_ONCE() at tracepoint_add_func() [1], but
>>> func_add() returning -EEXIST and func_remove() returning -ENOENT are
>>> not kernel bugs that can justify crashing the system.
>>
>> There should be no path that registers a tracepoint twice. That's a bug
>> in the kernel. Looking at the link below, I see the backtrace:
>>
>> Call Trace:
>> tracepoint_probe_register_prio kernel/tracepoint.c:369 [inline]
>> tracepoint_probe_register+0x9c/0xe0 kernel/tracepoint.c:389
>> __bpf_probe_register kernel/trace/bpf_trace.c:2154 [inline]
>> bpf_probe_register+0x15a/0x1c0 kernel/trace/bpf_trace.c:2159
>> bpf_raw_tracepoint_open+0x34a/0x720 kernel/bpf/syscall.c:2878
>> __do_sys_bpf+0x2586/0x4f40 kernel/bpf/syscall.c:4435
>> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
>>
>> So BPF is allowing the user to register the same tracepoint more than
>> once? That looks to be a bug in the BPF code where it shouldn't be
>> allowing user space to register the same tracepoint multiple times.
>
> I didn't catch your question.
>
> (1) func_add() can reject an attempt to add same tracepoint multiple times
> by returning -EINVAL to the caller.

Sorry, s/EINVAL/EEXIST/g on (1) (2) (6).

> (2) But tracepoint_add_func() (the caller of func_add()) is calling WARN_ON_ONCE()
> if func_add() returned -EINVAL.
> (3) And tracepoint_add_func() is triggerable via request from userspace.
> (4) tracepoint_probe_register_prio() serializes tracepoint_add_func() call
> triggered by concurrent request from userspace using tracepoints_mutex mutex.
> (5) But tracepoint_add_func() does not check whether same tracepoint multiple
> is already registered before calling func_add().
> (6) As a result, tracepoint_add_func() receives -EINVAL from func_add(), and
> calls WARN_ON_ONCE() and the system crashes due to panic_on_warn == 1.
>
> Why this is a bug in the BPF code? The BPF code is not allowing userspace to
> register the same tracepoint multiple times. I think that tracepoint_add_func()
> is stupid enough to crash the kernel instead of rejecting when an attempt to
> register the same tracepoint multiple times is made.
>

2021-06-26 15:45:09

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On Sun, 27 Jun 2021 00:13:17 +0900
Tetsuo Handa <[email protected]> wrote:

> On 2021/06/26 23:18, Steven Rostedt wrote:
> > On Sat, 26 Jun 2021 22:58:45 +0900
> > Tetsuo Handa <[email protected]> wrote:
> >
> >> syzbot is hitting WARN_ON_ONCE() at tracepoint_add_func() [1], but
> >> func_add() returning -EEXIST and func_remove() returning -ENOENT are
> >> not kernel bugs that can justify crashing the system.
> >
> > There should be no path that registers a tracepoint twice. That's a bug
> > in the kernel. Looking at the link below, I see the backtrace:
> >
> > Call Trace:
> > tracepoint_probe_register_prio kernel/tracepoint.c:369 [inline]
> > tracepoint_probe_register+0x9c/0xe0 kernel/tracepoint.c:389
> > __bpf_probe_register kernel/trace/bpf_trace.c:2154 [inline]
> > bpf_probe_register+0x15a/0x1c0 kernel/trace/bpf_trace.c:2159
> > bpf_raw_tracepoint_open+0x34a/0x720 kernel/bpf/syscall.c:2878
> > __do_sys_bpf+0x2586/0x4f40 kernel/bpf/syscall.c:4435
> > do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
> >
> > So BPF is allowing the user to register the same tracepoint more than
> > once? That looks to be a bug in the BPF code where it shouldn't be
> > allowing user space to register the same tracepoint multiple times.
>
> I didn't catch your question.
>
> (1) func_add() can reject an attempt to add same tracepoint multiple times
> by returning -EINVAL to the caller.
> (2) But tracepoint_add_func() (the caller of func_add()) is calling WARN_ON_ONCE()
> if func_add() returned -EINVAL.

That's because (before BPF) there's no place in the kernel that tries
to register the same tracepoint multiple times, and was considered a
bug if it happened, because there's no ref counters to deal with adding
them multiple times.

If the tracepoint is already registered (with the given function and
data), then something likely went wrong.

> (3) And tracepoint_add_func() is triggerable via request from userspace.

Only via BPF correct?

I'm not sure how it works, but can't BPF catch that it is registering
the same tracepoint again?

We could add this patch, but then we need to add the WARN_ON_ONCE() to
all the other callers, because for the other callers it's a bug if the
tracepoint was registered twice with the same callback and data.

Or we can add another interface that wont warn, and BPF can use that.

> (4) tracepoint_probe_register_prio() serializes tracepoint_add_func() call
> triggered by concurrent request from userspace using tracepoints_mutex mutex.

You keep saying user space. Is it a BPF program?

> (5) But tracepoint_add_func() does not check whether same tracepoint multiple
> is already registered before calling func_add().

Because it's considered a bug in the kernel if that is the case.

> (6) As a result, tracepoint_add_func() receives -EINVAL from func_add(), and
> calls WARN_ON_ONCE() and the system crashes due to panic_on_warn == 1.
>
> Why this is a bug in the BPF code? The BPF code is not allowing userspace to
> register the same tracepoint multiple times. I think that tracepoint_add_func()

Then how is the same tracepoint being registered multiple times?

You keep saying "user space" but the only way user space is doing this
is through BPF. Or am I missing something?

> is stupid enough to crash the kernel instead of rejecting when an attempt to
> register the same tracepoint multiple times is made.

Because its a bug in the kernel, and WARN_ON_ONCE() is what is used
when you detect something that is considered a bug in the kernel. If
you don't want warnings to crash the kernel, you don't add
"panic_on_warning".

If BPF is expected to register the same tracepoint with the same
callback and data more than once, then let's add a call to do that
without warning. Like I said, other callers expect the call to succeed
unless it's out of memory, which tends to cause other problems.

FYI, this warning has caught bugs in my own code, that triggered my
tests to fail, and had me go fix that bug before pushing it further.
And my tests fail only on a full warning.

-- Steve

2021-06-26 18:25:13

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On Sat, 26 Jun 2021 11:41:57 -0400
Steven Rostedt <[email protected]> wrote:

> If BPF is expected to register the same tracepoint with the same
> callback and data more than once, then let's add a call to do that
> without warning. Like I said, other callers expect the call to succeed
> unless it's out of memory, which tends to cause other problems.

If BPF is OK with registering the same probe more than once if user
space expects it, we can add this patch, which allows the caller (in
this case BPF) to not warn if the probe being registered is already
registered, and keeps the idea that a probe registered twice is a bug
for all other use cases.

-- Steve

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 13f65420f188..656d22cf42fc 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -36,10 +36,11 @@ struct trace_eval_map {
extern struct srcu_struct tracepoint_srcu;

extern int
-tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
+tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data,
+ bool no_warn);
extern int
tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data,
- int prio);
+ int prio, bool no_warn);
extern int
tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data);
extern void
@@ -250,14 +251,16 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
register_trace_##name(void (*probe)(data_proto), void *data) \
{ \
return tracepoint_probe_register(&__tracepoint_##name, \
- (void *)probe, data); \
+ (void *)probe, data, \
+ false); \
} \
static inline int \
register_trace_prio_##name(void (*probe)(data_proto), void *data,\
int prio) \
{ \
return tracepoint_probe_register_prio(&__tracepoint_##name, \
- (void *)probe, data, prio); \
+ (void *)probe, data, \
+ prio, false); \
} \
static inline int \
unregister_trace_##name(void (*probe)(data_proto), void *data) \
diff --git a/kernel/kcsan/kcsan_test.c b/kernel/kcsan/kcsan_test.c
index 8bcffbdef3d3..b76738e61eee 100644
--- a/kernel/kcsan/kcsan_test.c
+++ b/kernel/kcsan/kcsan_test.c
@@ -1160,7 +1160,7 @@ static void register_tracepoints(struct tracepoint *tp, void *ignore)
{
check_trace_callback_type_console(probe_console);
if (!strcmp(tp->name, "console"))
- WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+ WARN_ON(tracepoint_probe_register(tp, probe_console, NULL, false));
}

__no_kcsan
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7a52bc172841..3d3a80db40b5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1840,7 +1840,7 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
if (prog->aux->max_tp_access > btp->writable_size)
return -EINVAL;

- return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
+ return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog, true);
}

int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 80e96989770e..07986569b1b9 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -500,7 +500,7 @@ int trace_event_reg(struct trace_event_call *call,
case TRACE_REG_REGISTER:
return tracepoint_probe_register(call->tp,
call->class->probe,
- file);
+ file, false);
case TRACE_REG_UNREGISTER:
tracepoint_probe_unregister(call->tp,
call->class->probe,
@@ -511,7 +511,7 @@ int trace_event_reg(struct trace_event_call *call,
case TRACE_REG_PERF_REGISTER:
return tracepoint_probe_register(call->tp,
call->class->perf_probe,
- call);
+ call, false);
case TRACE_REG_PERF_UNREGISTER:
tracepoint_probe_unregister(call->tp,
call->class->perf_probe,
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 9f478d29b926..3c3a517b229e 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -273,7 +273,8 @@ static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func
* Add the probe function to a tracepoint.
*/
static int tracepoint_add_func(struct tracepoint *tp,
- struct tracepoint_func *func, int prio)
+ struct tracepoint_func *func, int prio,
+ bool no_warn)
{
struct tracepoint_func *old, *tp_funcs;
int ret;
@@ -288,7 +289,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
lockdep_is_held(&tracepoints_mutex));
old = func_add(&tp_funcs, func, prio);
if (IS_ERR(old)) {
- WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
+ WARN_ON_ONCE(!no_warn && PTR_ERR(old) != -ENOMEM);
return PTR_ERR(old);
}

@@ -349,6 +350,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
* @probe: probe handler
* @data: tracepoint data
* @prio: priority of this function over other registered functions
+ * @no_warn: Do not warn if the tracepoint is already registered
*
* Returns 0 if ok, error value on error.
* Note: if @tp is within a module, the caller is responsible for
@@ -357,7 +359,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
* within module exit functions.
*/
int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
- void *data, int prio)
+ void *data, int prio, bool no_warn)
{
struct tracepoint_func tp_func;
int ret;
@@ -366,7 +368,7 @@ int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
tp_func.func = probe;
tp_func.data = data;
tp_func.prio = prio;
- ret = tracepoint_add_func(tp, &tp_func, prio);
+ ret = tracepoint_add_func(tp, &tp_func, prio, no_warn);
mutex_unlock(&tracepoints_mutex);
return ret;
}
@@ -377,6 +379,7 @@ EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio);
* @tp: tracepoint
* @probe: probe handler
* @data: tracepoint data
+ * @no_warn: Do not warn if the tracepoint is already registered
*
* Returns 0 if ok, error value on error.
* Note: if @tp is within a module, the caller is responsible for
@@ -384,9 +387,11 @@ EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio);
* performed either with a tracepoint module going notifier, or from
* within module exit functions.
*/
-int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data)
+int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data,
+ bool no_warn)
{
- return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO);
+ return tracepoint_probe_register_prio(tp, probe, data,
+ TRACEPOINT_DEFAULT_PRIO, no_warn);
}
EXPORT_SYMBOL_GPL(tracepoint_probe_register);

diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 4acf4251ee04..a9331c967690 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -820,7 +820,7 @@ static void register_tracepoints(struct tracepoint *tp, void *ignore)
{
check_trace_callback_type_console(probe_console);
if (!strcmp(tp->name, "console"))
- WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+ WARN_ON(tracepoint_probe_register(tp, probe_console, NULL, true));
}

static void unregister_tracepoints(struct tracepoint *tp, void *ignore)

2021-06-26 18:47:07

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT


----- Steven Rostedt <[email protected]> wrote:
> On Sat, 26 Jun 2021 11:41:57 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > If BPF is expected to register the same tracepoint with the same
> > callback and data more than once, then let's add a call to do that
> > without warning. Like I said, other callers expect the call to succeed
> > unless it's out of memory, which tends to cause other problems.
>
> If BPF is OK with registering the same probe more than once if user
> space expects it, we can add this patch, which allows the caller (in
> this case BPF) to not warn if the probe being registered is already
> registered, and keeps the idea that a probe registered twice is a bug
> for all other use cases.

How can removal of the duplicates be non buggy then ? The first removal will match both probes.

Thanks,

Mathieu

>
> -- Steve
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 13f65420f188..656d22cf42fc 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -36,10 +36,11 @@ struct trace_eval_map {
> extern struct srcu_struct tracepoint_srcu;
>
> extern int
> -tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> +tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data,
> + bool no_warn);
> extern int
> tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data,
> - int prio);
> + int prio, bool no_warn);
> extern int
> tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data);
> extern void
> @@ -250,14 +251,16 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> register_trace_##name(void (*probe)(data_proto), void *data) \
> { \
> return tracepoint_probe_register(&__tracepoint_##name, \
> - (void *)probe, data); \
> + (void *)probe, data, \
> + false); \
> } \
> static inline int \
> register_trace_prio_##name(void (*probe)(data_proto), void *data,\
> int prio) \
> { \
> return tracepoint_probe_register_prio(&__tracepoint_##name, \
> - (void *)probe, data, prio); \
> + (void *)probe, data, \
> + prio, false); \
> } \
> static inline int \
> unregister_trace_##name(void (*probe)(data_proto), void *data) \
> diff --git a/kernel/kcsan/kcsan_test.c b/kernel/kcsan/kcsan_test.c
> index 8bcffbdef3d3..b76738e61eee 100644
> --- a/kernel/kcsan/kcsan_test.c
> +++ b/kernel/kcsan/kcsan_test.c
> @@ -1160,7 +1160,7 @@ static void register_tracepoints(struct tracepoint *tp, void *ignore)
> {
> check_trace_callback_type_console(probe_console);
> if (!strcmp(tp->name, "console"))
> - WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
> + WARN_ON(tracepoint_probe_register(tp, probe_console, NULL, false));
> }
>
> __no_kcsan
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 7a52bc172841..3d3a80db40b5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1840,7 +1840,7 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
> if (prog->aux->max_tp_access > btp->writable_size)
> return -EINVAL;
>
> - return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
> + return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog, true);
> }
>
> int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 80e96989770e..07986569b1b9 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -500,7 +500,7 @@ int trace_event_reg(struct trace_event_call *call,
> case TRACE_REG_REGISTER:
> return tracepoint_probe_register(call->tp,
> call->class->probe,
> - file);
> + file, false);
> case TRACE_REG_UNREGISTER:
> tracepoint_probe_unregister(call->tp,
> call->class->probe,
> @@ -511,7 +511,7 @@ int trace_event_reg(struct trace_event_call *call,
> case TRACE_REG_PERF_REGISTER:
> return tracepoint_probe_register(call->tp,
> call->class->perf_probe,
> - call);
> + call, false);
> case TRACE_REG_PERF_UNREGISTER:
> tracepoint_probe_unregister(call->tp,
> call->class->perf_probe,
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 9f478d29b926..3c3a517b229e 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -273,7 +273,8 @@ static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func
> * Add the probe function to a tracepoint.
> */
> static int tracepoint_add_func(struct tracepoint *tp,
> - struct tracepoint_func *func, int prio)
> + struct tracepoint_func *func, int prio,
> + bool no_warn)
> {
> struct tracepoint_func *old, *tp_funcs;
> int ret;
> @@ -288,7 +289,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
> lockdep_is_held(&tracepoints_mutex));
> old = func_add(&tp_funcs, func, prio);
> if (IS_ERR(old)) {
> - WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
> + WARN_ON_ONCE(!no_warn && PTR_ERR(old) != -ENOMEM);
> return PTR_ERR(old);
> }
>
> @@ -349,6 +350,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> * @probe: probe handler
> * @data: tracepoint data
> * @prio: priority of this function over other registered functions
> + * @no_warn: Do not warn if the tracepoint is already registered
> *
> * Returns 0 if ok, error value on error.
> * Note: if @tp is within a module, the caller is responsible for
> @@ -357,7 +359,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> * within module exit functions.
> */
> int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
> - void *data, int prio)
> + void *data, int prio, bool no_warn)
> {
> struct tracepoint_func tp_func;
> int ret;
> @@ -366,7 +368,7 @@ int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
> tp_func.func = probe;
> tp_func.data = data;
> tp_func.prio = prio;
> - ret = tracepoint_add_func(tp, &tp_func, prio);
> + ret = tracepoint_add_func(tp, &tp_func, prio, no_warn);
> mutex_unlock(&tracepoints_mutex);
> return ret;
> }
> @@ -377,6 +379,7 @@ EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio);
> * @tp: tracepoint
> * @probe: probe handler
> * @data: tracepoint data
> + * @no_warn: Do not warn if the tracepoint is already registered
> *
> * Returns 0 if ok, error value on error.
> * Note: if @tp is within a module, the caller is responsible for
> @@ -384,9 +387,11 @@ EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio);
> * performed either with a tracepoint module going notifier, or from
> * within module exit functions.
> */
> -int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data)
> +int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data,
> + bool no_warn)
> {
> - return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO);
> + return tracepoint_probe_register_prio(tp, probe, data,
> + TRACEPOINT_DEFAULT_PRIO, no_warn);
> }
> EXPORT_SYMBOL_GPL(tracepoint_probe_register);
>
> diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
> index 4acf4251ee04..a9331c967690 100644
> --- a/mm/kfence/kfence_test.c
> +++ b/mm/kfence/kfence_test.c
> @@ -820,7 +820,7 @@ static void register_tracepoints(struct tracepoint *tp, void *ignore)
> {
> check_trace_callback_type_console(probe_console);
> if (!strcmp(tp->name, "console"))
> - WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
> + WARN_ON(tracepoint_probe_register(tp, probe_console, NULL, true));
> }
>
> static void unregister_tracepoints(struct tracepoint *tp, void *ignore)

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2021-06-26 23:36:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On Sat, 26 Jun 2021 14:42:49 -0400 (EDT)
Mathieu Desnoyers <[email protected]> wrote:

> > If BPF is OK with registering the same probe more than once if user
> > space expects it, we can add this patch, which allows the caller (in
> > this case BPF) to not warn if the probe being registered is already
> > registered, and keeps the idea that a probe registered twice is a bug
> > for all other use cases.
>
> How can removal of the duplicates be non buggy then ? The first removal will match both probes.

The registering of the first duplicate would fail with an error, but
will not warn. There would be no unregistering needed.

-- Steve

2021-06-27 01:19:18

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On 2021/06/27 3:22, Steven Rostedt wrote:
>> If BPF is expected to register the same tracepoint with the same
>> callback and data more than once, then let's add a call to do that
>> without warning. Like I said, other callers expect the call to succeed
>> unless it's out of memory, which tends to cause other problems.
>
> If BPF is OK with registering the same probe more than once if user
> space expects it, we can add this patch, which allows the caller (in
> this case BPF) to not warn if the probe being registered is already
> registered, and keeps the idea that a probe registered twice is a bug
> for all other use cases.

I think BPF will not register the same tracepoint with the same callback and
data more than once, for bpf(BPF_RAW_TRACEPOINT_OPEN) cleans the request up
by calling bpf_link_cleanup() and returns -EEXIST. But I think BPF relies on
tracepoint_add_func() returning -EEXIST without crashing the kernel.

CPU: 0 PID: 16193 Comm: syz-executor.5 Not tainted 5.13.0-rc7-syzkaller #0
RIP: 0010:tracepoint_add_func+0x1fb/0xa90 kernel/tracepoint.c:291
Call Trace:
tracepoint_probe_register_prio kernel/tracepoint.c:369 [inline]
tracepoint_probe_register+0x9c/0xe0 kernel/tracepoint.c:389
__bpf_probe_register kernel/trace/bpf_trace.c:1843 [inline]
bpf_probe_register+0x15a/0x1c0 kernel/trace/bpf_trace.c:1848
bpf_raw_tracepoint_open+0x34a/0x720 kernel/bpf/syscall.c:2895
__do_sys_bpf+0x2586/0x4f40 kernel/bpf/syscall.c:4453
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) {
switch (cmd) {
case BPF_RAW_TRACEPOINT_OPEN:
err = bpf_raw_tracepoint_open(&attr) {
err = bpf_link_prime(&link->link, &link_primer);
if (err) {
kfree(link);
goto out_put_btp;
}
err = bpf_probe_register(link->btp, prog) {
return __bpf_probe_register(btp, prog) {
return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog) {
return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO) {
mutex_lock(&tracepoints_mutex); // Serialization start.
ret = tracepoint_add_func(tp, &tp_func, prio) {
old = func_add(&tp_funcs, func, prio); // Returns -EEXIST.
if (IS_ERR(old)) {
WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM); // Crashes due to warn_on_paic==1.
return PTR_ERR(old); // Returns -EEXIST.
}
}
mutex_unlock(&tracepoints_mutex); // Serialization end.
return ret; // Returns -EEXIST.
}
}
}
}
if (err) {
bpf_link_cleanup(&link_primer); // Reject if func_add() returned -EEXIST.
goto out_put_btp;
}
return bpf_link_settle(&link_primer);
}
break;
}
return ret; // Returns -EEXIST to the userspace.
}

On 2021/06/27 0:41, Steven Rostedt wrote:
>> (1) func_add() can reject an attempt to add same tracepoint multiple times
>> by returning -EEXIST to the caller.
>> (2) But tracepoint_add_func() (the caller of func_add()) is calling WARN_ON_ONCE()
>> if func_add() returned -EEXIST.
>
> That's because (before BPF) there's no place in the kernel that tries
> to register the same tracepoint multiple times, and was considered a
> bug if it happened, because there's no ref counters to deal with adding
> them multiple times.

I see. But does that make sense? Since func_add() can fail with -ENOMEM,
all places (even before BPF) needs to be prepared for failures.

>
> If the tracepoint is already registered (with the given function and
> data), then something likely went wrong.

That can be prepared on the caller side of tracepoint_add_func() rather than
tracepoint_add_func() side.

>
>> (3) And tracepoint_add_func() is triggerable via request from userspace.
>
> Only via BPF correct?
>
> I'm not sure how it works, but can't BPF catch that it is registering
> the same tracepoint again?

There is no chance to check whether some tracepoint is already registered, for
tracepoints_mutex is the only lock which gives us a chance to check whether
some tracepoint is already registered.

Should bpf() syscall hold a global lock (like tracepoints_mutex) which will serialize
the entire code in order to check whether some tracepoint is already registered?
That might severely damage concurrency.

2021-06-27 02:54:34

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracepoint: Do not warn on EEXIST or ENOENT

On Sun, 27 Jun 2021 10:10:24 +0900
Tetsuo Handa <[email protected]> wrote:

> On 2021/06/27 3:22, Steven Rostedt wrote:
> >> If BPF is expected to register the same tracepoint with the same
> >> callback and data more than once, then let's add a call to do that
> >> without warning. Like I said, other callers expect the call to succeed
> >> unless it's out of memory, which tends to cause other problems.
> >
> > If BPF is OK with registering the same probe more than once if user
> > space expects it, we can add this patch, which allows the caller (in
> > this case BPF) to not warn if the probe being registered is already
> > registered, and keeps the idea that a probe registered twice is a bug
> > for all other use cases.
>
> I think BPF will not register the same tracepoint with the same callback and
> data more than once, for bpf(BPF_RAW_TRACEPOINT_OPEN) cleans the request up
> by calling bpf_link_cleanup() and returns -EEXIST. But I think BPF relies on
> tracepoint_add_func() returning -EEXIST without crashing the kernel.

Which is the only user that does so, and what this patch addresses.

> > That's because (before BPF) there's no place in the kernel that tries
> > to register the same tracepoint multiple times, and was considered a
> > bug if it happened, because there's no ref counters to deal with adding
> > them multiple times.
>
> I see. But does that make sense? Since func_add() can fail with -ENOMEM,
> all places (even before BPF) needs to be prepared for failures.

Yes. -ENOMEM means that there's no resources to create a tracepoint.
But if the tracepoint already exsits, that means the accounting for
what tracepoints are running has been corrupted.

>
> >
> > If the tracepoint is already registered (with the given function and
> > data), then something likely went wrong.
>
> That can be prepared on the caller side of tracepoint_add_func() rather than
> tracepoint_add_func() side.

Not sure what you mean by that.

>
> >
> >> (3) And tracepoint_add_func() is triggerable via request from userspace.
> >
> > Only via BPF correct?
> >
> > I'm not sure how it works, but can't BPF catch that it is registering
> > the same tracepoint again?
>
> There is no chance to check whether some tracepoint is already registered, for
> tracepoints_mutex is the only lock which gives us a chance to check whether
> some tracepoint is already registered.
>
> Should bpf() syscall hold a global lock (like tracepoints_mutex) which will serialize
> the entire code in order to check whether some tracepoint is already registered?
> That might severely damage concurrency.

I think that the patch I posted handles what you want. For BPF it
returns without warning, but for all other cases, it warns. Does it fix
your issue?

-- Steve