hi,
I occasionally get following fault:
general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
CPU: 3 PID: 28438 Comm: test_progs Tainted: G OE 6.4.0-rc3+ #448 dad92bc91c459c664b308990ada0799837010e31
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
RIP: 0010:rethook_try_get+0x34/0xf0
Code: 48 8b 47 08 85 d2 74 0b 65 8b 15 af 26 eb 7e 85 d2 74 57 48 85 c0 74 73 e8 39 8e f0 ff 84 c0 74 6a 48 8b 53 10 48 85 d2 74 >
RSP: 0018:ffffc90003ccfcf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88810920db40 RCX: 0000000000000003
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffff82c0a371 RDI: ffffffff82bcbddb
RBP: ffffffff81f5a5f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000014000 R12: ffffffffa02ec3f2
R13: fffffffffffffff7 R14: ffffc90003ccfd38 R15: 0000000000000000
FS: 00007f2f8195eb80(0000) GS:ffff88846da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2f819c0140 CR3: 0000000189cb8006 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
<TASK>
fprobe_handler+0xc1/0x270
? __pfx_bpf_testmod_init+0x10/0x10 [bpf_testmod b0bc3019aa6d6bdb2afc30cf6381f510d7e5abbe]
? __pfx_bpf_testmod_init+0x10/0x10 [bpf_testmod b0bc3019aa6d6bdb2afc30cf6381f510d7e5abbe]
? bpf_fentry_test1+0x5/0x10
? bpf_fentry_test1+0x5/0x10
? bpf_testmod_init+0x22/0x80 [bpf_testmod b0bc3019aa6d6bdb2afc30cf6381f510d7e5abbe]
? do_one_initcall+0x63/0x2e0
? rcu_is_watching+0xd/0x40
? kmalloc_trace+0xaf/0xc0
? do_init_module+0x60/0x250
? __do_sys_finit_module+0xac/0x120
? do_syscall_64+0x37/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
</TASK>
Modules linked in: bpf_testmod(OE+) loop bpf_preload intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel >
I can't really reliable reproduce this, but while checking the code, I wonder
we should call rethook_free only after we call unregister_ftrace_function like
in the patch below
jirka
---
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 18d36842faf5..0121e8c0d54e 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -364,19 +364,13 @@ int unregister_fprobe(struct fprobe *fp)
fp->ops.saved_func != fprobe_kprobe_handler))
return -EINVAL;
- /*
- * rethook_free() starts disabling the rethook, but the rethook handlers
- * may be running on other processors at this point. To make sure that all
- * current running handlers are finished, call unregister_ftrace_function()
- * after this.
- */
- if (fp->rethook)
- rethook_free(fp->rethook);
-
ret = unregister_ftrace_function(&fp->ops);
if (ret < 0)
return ret;
+ if (fp->rethook)
+ rethook_free(fp->rethook);
+
ftrace_free_filter(&fp->ops);
return ret;
On Wed, 7 Jun 2023 09:42:30 -0700
Jiri Olsa <[email protected]> wrote:
> I can't really reliable reproduce this, but while checking the code, I wonder
> we should call rethook_free only after we call unregister_ftrace_function like
> in the patch below
Yeah, I think you're right!
>
> jirka
>
>
> ---
> diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> index 18d36842faf5..0121e8c0d54e 100644
> --- a/kernel/trace/fprobe.c
> +++ b/kernel/trace/fprobe.c
> @@ -364,19 +364,13 @@ int unregister_fprobe(struct fprobe *fp)
> fp->ops.saved_func != fprobe_kprobe_handler))
> return -EINVAL;
>
> - /*
> - * rethook_free() starts disabling the rethook, but the rethook handlers
> - * may be running on other processors at this point. To make sure that all
> - * current running handlers are finished, call unregister_ftrace_function()
> - * after this.
> - */
> - if (fp->rethook)
> - rethook_free(fp->rethook);
The above only waits for RCU to finish and then starts to free rethook.
This also means that something could be on the trampoline already and was
preempted. It could be that this code path gets preempted. Anyway, I don't
see how freeing rethook is safe before disabling all users.
Reviewed-by: Steven Rostedt (Google) <[email protected]>
-- Steve
> -
> ret = unregister_ftrace_function(&fp->ops);
> if (ret < 0)
> return ret;
>
> + if (fp->rethook)
> + rethook_free(fp->rethook);
> +
> ftrace_free_filter(&fp->ops);
>
> return ret;
On Tue, Jun 13, 2023 at 05:48:44PM -0400, Steven Rostedt wrote:
> On Wed, 7 Jun 2023 09:42:30 -0700
> Jiri Olsa <[email protected]> wrote:
>
> > I can't really reliable reproduce this, but while checking the code, I wonder
> > we should call rethook_free only after we call unregister_ftrace_function like
> > in the patch below
>
> Yeah, I think you're right!
>
> >
> > jirka
> >
> >
> > ---
> > diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> > index 18d36842faf5..0121e8c0d54e 100644
> > --- a/kernel/trace/fprobe.c
> > +++ b/kernel/trace/fprobe.c
> > @@ -364,19 +364,13 @@ int unregister_fprobe(struct fprobe *fp)
> > fp->ops.saved_func != fprobe_kprobe_handler))
> > return -EINVAL;
> >
> > - /*
> > - * rethook_free() starts disabling the rethook, but the rethook handlers
> > - * may be running on other processors at this point. To make sure that all
> > - * current running handlers are finished, call unregister_ftrace_function()
> > - * after this.
> > - */
> > - if (fp->rethook)
> > - rethook_free(fp->rethook);
>
> The above only waits for RCU to finish and then starts to free rethook.
>
> This also means that something could be on the trampoline already and was
> preempted. It could be that this code path gets preempted. Anyway, I don't
> see how freeing rethook is safe before disabling all users.
>
> Reviewed-by: Steven Rostedt (Google) <[email protected]>
thanks, I'll send formal patch
jirka
>
> -- Steve
>
>
> > -
> > ret = unregister_ftrace_function(&fp->ops);
> > if (ret < 0)
> > return ret;
> >
> > + if (fp->rethook)
> > + rethook_free(fp->rethook);
> > +
> > ftrace_free_filter(&fp->ops);
> >
> > return ret;
>