Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
in rcu 'not watching' context and if there's tracer attached to
them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
warning like:
[ 3.017540] WARNING: suspicious RCU usage
...
[ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
[ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
[ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
[ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
[ 3.018371] fprobe_handler.part.0+0xab/0x150
[ 3.018374] 0xffffffffa00080c8
[ 3.018393] ? arch_cpu_idle+0x5/0x10
[ 3.018398] arch_cpu_idle+0x5/0x10
[ 3.018399] default_idle_call+0x59/0x90
[ 3.018401] do_idle+0x1c3/0x1d0
The call path is following:
default_idle_call
rcu_idle_enter
arch_cpu_idle
rcu_idle_exit
The arch_cpu_idle and rcu_idle_exit are the only ones from above
path that are traceble and cause this problem on my setup.
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/kernel/process.c | 2 +-
kernel/rcu/tree.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b370767f5b19..1345cb0124a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
/*
* Called from the generic idle code.
*/
-void arch_cpu_idle(void)
+void noinstr arch_cpu_idle(void)
{
x86_idle();
}
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4b8189455d5..20d529722f51 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
* If you add or remove a call to rcu_idle_exit(), be sure to test with
* CONFIG_RCU_EQS_DEBUG=y.
*/
-void rcu_idle_exit(void)
+void noinstr rcu_idle_exit(void)
{
unsigned long flags;
--
2.35.3
On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> in rcu 'not watching' context and if there's tracer attached to
> them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> warning like:
>
> [ 3.017540] WARNING: suspicious RCU usage
> ...
> [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> [ 3.018371] fprobe_handler.part.0+0xab/0x150
> [ 3.018374] 0xffffffffa00080c8
> [ 3.018393] ? arch_cpu_idle+0x5/0x10
> [ 3.018398] arch_cpu_idle+0x5/0x10
> [ 3.018399] default_idle_call+0x59/0x90
> [ 3.018401] do_idle+0x1c3/0x1d0
>
> The call path is following:
>
> default_idle_call
> rcu_idle_enter
> arch_cpu_idle
> rcu_idle_exit
>
> The arch_cpu_idle and rcu_idle_exit are the only ones from above
> path that are traceble and cause this problem on my setup.
>
> Signed-off-by: Jiri Olsa <[email protected]>
From an RCU viewpoint:
Reviewed-by: Paul E. McKenney <[email protected]>
[ I considered asking for an instrumentation_on() in rcu_idle_exit(),
but there is no point given that local_irq_restore() isn't something
you instrument anyway. ]
> ---
> arch/x86/kernel/process.c | 2 +-
> kernel/rcu/tree.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index b370767f5b19..1345cb0124a6 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> /*
> * Called from the generic idle code.
> */
> -void arch_cpu_idle(void)
> +void noinstr arch_cpu_idle(void)
> {
> x86_idle();
> }
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a4b8189455d5..20d529722f51 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> * If you add or remove a call to rcu_idle_exit(), be sure to test with
> * CONFIG_RCU_EQS_DEBUG=y.
> */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
> {
> unsigned long flags;
>
> --
> 2.35.3
>
On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > in rcu 'not watching' context and if there's tracer attached to
> > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > warning like:
> > >
> > > [ 3.017540] WARNING: suspicious RCU usage
> > > ...
> > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > [ 3.018374] 0xffffffffa00080c8
> > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > [ 3.018399] default_idle_call+0x59/0x90
> > > [ 3.018401] do_idle+0x1c3/0x1d0
> > >
> > > The call path is following:
> > >
> > > default_idle_call
> > > rcu_idle_enter
> > > arch_cpu_idle
> > > rcu_idle_exit
> > >
> > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > path that are traceble and cause this problem on my setup.
> > >
> > > Signed-off-by: Jiri Olsa <[email protected]>
> >
> > From an RCU viewpoint:
> >
> > Reviewed-by: Paul E. McKenney <[email protected]>
> >
> > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > but there is no point given that local_irq_restore() isn't something
> > you instrument anyway. ]
>
> So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> it is instrumentable by the function (graph) tracers and the irqsoff tracer.
>
> Also it calls into lockdep that might make use of RCU.
>
> That's why rcu_idle_exit() is not noinstr yet. See this patch:
>
> https://lore.kernel.org/lkml/[email protected]/
Ah, I should have looked at the context-tracking series again!
And I have to ask... How much debugging capability are we really losing
by continuing to use the raw versions of local_irq_{save,restore}()?
Thanx, Paul
> Thanks.
>
> >
> > > ---
> > > arch/x86/kernel/process.c | 2 +-
> > > kernel/rcu/tree.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > > index b370767f5b19..1345cb0124a6 100644
> > > --- a/arch/x86/kernel/process.c
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> > > /*
> > > * Called from the generic idle code.
> > > */
> > > -void arch_cpu_idle(void)
> > > +void noinstr arch_cpu_idle(void)
> > > {
> > > x86_idle();
> > > }
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index a4b8189455d5..20d529722f51 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > > * CONFIG_RCU_EQS_DEBUG=y.
> > > */
> > > -void rcu_idle_exit(void)
> > > +void noinstr rcu_idle_exit(void)
> > > {
> > > unsigned long flags;
> > >
> > > --
> > > 2.35.3
> > >
On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > in rcu 'not watching' context and if there's tracer attached to
> > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > warning like:
> >
> > [ 3.017540] WARNING: suspicious RCU usage
> > ...
> > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > [ 3.018374] 0xffffffffa00080c8
> > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > [ 3.018398] arch_cpu_idle+0x5/0x10
> > [ 3.018399] default_idle_call+0x59/0x90
> > [ 3.018401] do_idle+0x1c3/0x1d0
> >
> > The call path is following:
> >
> > default_idle_call
> > rcu_idle_enter
> > arch_cpu_idle
> > rcu_idle_exit
> >
> > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > path that are traceble and cause this problem on my setup.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
>
> From an RCU viewpoint:
>
> Reviewed-by: Paul E. McKenney <[email protected]>
>
> [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> but there is no point given that local_irq_restore() isn't something
> you instrument anyway. ]
So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
it is instrumentable by the function (graph) tracers and the irqsoff tracer.
Also it calls into lockdep that might make use of RCU.
That's why rcu_idle_exit() is not noinstr yet. See this patch:
https://lore.kernel.org/lkml/[email protected]/
Thanks.
>
> > ---
> > arch/x86/kernel/process.c | 2 +-
> > kernel/rcu/tree.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index b370767f5b19..1345cb0124a6 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> > /*
> > * Called from the generic idle code.
> > */
> > -void arch_cpu_idle(void)
> > +void noinstr arch_cpu_idle(void)
> > {
> > x86_idle();
> > }
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index a4b8189455d5..20d529722f51 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > * CONFIG_RCU_EQS_DEBUG=y.
> > */
> > -void rcu_idle_exit(void)
> > +void noinstr rcu_idle_exit(void)
> > {
> > unsigned long flags;
> >
> > --
> > 2.35.3
> >
On 5/15/22 1:36 PM, Jiri Olsa wrote:
> Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> in rcu 'not watching' context and if there's tracer attached to
> them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> warning like:
>
> [ 3.017540] WARNING: suspicious RCU usage
> ...
> [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> [ 3.018371] fprobe_handler.part.0+0xab/0x150
> [ 3.018374] 0xffffffffa00080c8
> [ 3.018393] ? arch_cpu_idle+0x5/0x10
> [ 3.018398] arch_cpu_idle+0x5/0x10
> [ 3.018399] default_idle_call+0x59/0x90
> [ 3.018401] do_idle+0x1c3/0x1d0
>
> The call path is following:
>
> default_idle_call
> rcu_idle_enter
> arch_cpu_idle
> rcu_idle_exit
>
> The arch_cpu_idle and rcu_idle_exit are the only ones from above
> path that are traceble and cause this problem on my setup.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> arch/x86/kernel/process.c | 2 +-
> kernel/rcu/tree.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index b370767f5b19..1345cb0124a6 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> /*
> * Called from the generic idle code.
> */
> -void arch_cpu_idle(void)
> +void noinstr arch_cpu_idle(void)
noinstr includes a lot of attributes:
#define noinstr \
noinline notrace __attribute((__section__(".noinstr.text"))) \
__no_kcsan __no_sanitize_address __no_profile
__no_sanitize_coverage
should we use notrace here?
> {
> x86_idle();
> }
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a4b8189455d5..20d529722f51 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> * If you add or remove a call to rcu_idle_exit(), be sure to test with
> * CONFIG_RCU_EQS_DEBUG=y.
> */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
> {
> unsigned long flags;
>
On Mon, May 16, 2022 at 07:54:37PM -0700, Yonghong Song wrote:
>
>
> On 5/15/22 1:36 PM, Jiri Olsa wrote:
> > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > in rcu 'not watching' context and if there's tracer attached to
> > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > warning like:
> >
> > [ 3.017540] WARNING: suspicious RCU usage
> > ...
> > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > [ 3.018374] 0xffffffffa00080c8
> > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > [ 3.018398] arch_cpu_idle+0x5/0x10
> > [ 3.018399] default_idle_call+0x59/0x90
> > [ 3.018401] do_idle+0x1c3/0x1d0
> >
> > The call path is following:
> >
> > default_idle_call
> > rcu_idle_enter
> > arch_cpu_idle
> > rcu_idle_exit
> >
> > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > path that are traceble and cause this problem on my setup.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > arch/x86/kernel/process.c | 2 +-
> > kernel/rcu/tree.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index b370767f5b19..1345cb0124a6 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> > /*
> > * Called from the generic idle code.
> > */
> > -void arch_cpu_idle(void)
> > +void noinstr arch_cpu_idle(void)
>
> noinstr includes a lot of attributes:
>
> #define noinstr \
> noinline notrace __attribute((__section__(".noinstr.text"))) \
> __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
>
> should we use notrace here?
hm right, so notrace should be enough for our case (kprobe_multi)
which is based on ftrace/fprobe jump
noinstr (among other things) adds the function also the kprobes
blacklist, which will prevent standard kprobes to attach
ASAICS standard kprobes use rcu in probe path as well, like in
opt_pre_handler function
so I think we should go with noinstr
jirka
>
> > {
> > x86_idle();
> > }
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index a4b8189455d5..20d529722f51 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > * CONFIG_RCU_EQS_DEBUG=y.
> > */
> > -void rcu_idle_exit(void)
> > +void noinstr rcu_idle_exit(void)
> > {
> > unsigned long flags;
On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > in rcu 'not watching' context and if there's tracer attached to
> > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > warning like:
> > >
> > > [ 3.017540] WARNING: suspicious RCU usage
> > > ...
> > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > [ 3.018374] 0xffffffffa00080c8
> > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > [ 3.018399] default_idle_call+0x59/0x90
> > > [ 3.018401] do_idle+0x1c3/0x1d0
> > >
> > > The call path is following:
> > >
> > > default_idle_call
> > > rcu_idle_enter
> > > arch_cpu_idle
> > > rcu_idle_exit
> > >
> > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > path that are traceble and cause this problem on my setup.
> > >
> > > Signed-off-by: Jiri Olsa <[email protected]>
> >
> > From an RCU viewpoint:
> >
> > Reviewed-by: Paul E. McKenney <[email protected]>
> >
> > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > but there is no point given that local_irq_restore() isn't something
> > you instrument anyway. ]
>
> So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> it is instrumentable by the function (graph) tracers and the irqsoff tracer.
>
> Also it calls into lockdep that might make use of RCU.
>
> That's why rcu_idle_exit() is not noinstr yet. See this patch:
>
> https://lore.kernel.org/lkml/[email protected]/
I see, could we mark it at least with notrace meanwhile?
jirka
Hi Jiri,
I love your patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Jiri-Olsa/cpuidle-rcu-Making-arch_cpu_idle-and-rcu_idle_exit-noinstr/20220516-043752
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-a014-20220516 (https://download.01.org/0day-ci/archive/20220517/[email protected]/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/0b6fee32d730f621f2bfc4d8d9f0729814398415
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Jiri-Olsa/cpuidle-rcu-Making-arch_cpu_idle-and-rcu_idle_exit-noinstr/20220516-043752
git checkout 0b6fee32d730f621f2bfc4d8d9f0729814398415
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
vmlinux.o: warning: objtool: vmx_l1d_flush()+0x13: call to static_key_count() leaves .noinstr.text section
vmlinux.o: warning: objtool: vmx_vcpu_enter_exit()+0x29: call to static_key_count() leaves .noinstr.text section
vmlinux.o: warning: objtool: vmx_update_host_rsp()+0x3e: call to static_key_count() leaves .noinstr.text section
vmlinux.o: warning: objtool: arch_cpu_idle()+0xb: call to {dynamic}() leaves .noinstr.text section
>> vmlinux.o: warning: objtool: rcu_idle_exit()+0x25: call to trace_hardirqs_on() leaves .noinstr.text section
vmlinux.o: warning: objtool: enter_from_user_mode()+0x1c: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode()+0x21: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare()+0x1c: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: exit_to_user_mode()+0x1b: call to static_key_count() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_exit_to_user_mode()+0x36: call to static_key_count() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode()+0x1c: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_exit_to_user_mode()+0x22: call to static_key_count() leaves .noinstr.text section
--
0-DAY CI Kernel Test Service
https://01.org/lkp
Hi Jiri,
I love your patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Jiri-Olsa/cpuidle-rcu-Making-arch_cpu_idle-and-rcu_idle_exit-noinstr/20220516-043752
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-c002-20220516 (https://download.01.org/0day-ci/archive/20220517/[email protected]/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/0b6fee32d730f621f2bfc4d8d9f0729814398415
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Jiri-Olsa/cpuidle-rcu-Making-arch_cpu_idle-and-rcu_idle_exit-noinstr/20220516-043752
git checkout 0b6fee32d730f621f2bfc4d8d9f0729814398415
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
>> vmlinux.o: warning: objtool: arch_cpu_idle()+0x0: call to {dynamic}() leaves .noinstr.text section
>> vmlinux.o: warning: objtool: rcu_idle_exit()+0x16: call to trace_hardirqs_off() leaves .noinstr.text section
vmlinux.o: warning: objtool: enter_from_user_mode()+0x18: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode()+0x1d: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare()+0x18: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode()+0x18: call to __kcsan_check_access() leaves .noinstr.text section
--
0-DAY CI Kernel Test Service
https://01.org/lkp
On Tue, May 17, 2022 at 12:13:45PM +0200, Jiri Olsa wrote:
> On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> > On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > > in rcu 'not watching' context and if there's tracer attached to
> > > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > > warning like:
> > > >
> > > > [ 3.017540] WARNING: suspicious RCU usage
> > > > ...
> > > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > > [ 3.018374] 0xffffffffa00080c8
> > > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > > [ 3.018399] default_idle_call+0x59/0x90
> > > > [ 3.018401] do_idle+0x1c3/0x1d0
> > > >
> > > > The call path is following:
> > > >
> > > > default_idle_call
> > > > rcu_idle_enter
> > > > arch_cpu_idle
> > > > rcu_idle_exit
> > > >
> > > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > > path that are traceble and cause this problem on my setup.
> > > >
> > > > Signed-off-by: Jiri Olsa <[email protected]>
> > >
> > > From an RCU viewpoint:
> > >
> > > Reviewed-by: Paul E. McKenney <[email protected]>
> > >
> > > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > > but there is no point given that local_irq_restore() isn't something
> > > you instrument anyway. ]
> >
> > So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> > it is instrumentable by the function (graph) tracers and the irqsoff tracer.
> >
> > Also it calls into lockdep that might make use of RCU.
> >
> > That's why rcu_idle_exit() is not noinstr yet. See this patch:
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> I see, could we mark it at least with notrace meanwhile?
For the RCU part, how about as follows?
If this approach is reasonable, my guess would be that Frederic will pull
it into his context-tracking series, perhaps using a revert of this patch
to maintain sanity in the near term.
If this approach is unreasonable, well, that is Murphy for you!
For the x86 idle part, my feeling is still that the rcu_idle_enter()
and rcu_idle_exit() need to be pushed deeper into the code. Perhaps
an ongoing process as the idle loop continues to be dug deeper?
Thanx, Paul
------------------------------------------------------------------------
commit cd338be719a0a692e0d50e1a8438e1f6c7165d9c
Author: Paul E. McKenney <[email protected]>
Date: Tue May 17 21:00:04 2022 -0700
rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()
This commit applies the "noinstr" tag to the rcu_idle_enter() and
rcu_idle_exit() functions, which are invoked from portions of the idle
loop that cannot be instrumented. These tags require reworking the
rcu_eqs_enter() and rcu_eqs_exit() functions that these two functions
invoke in order to cause them to use normal assertions rather than
lockdep. In addition, within rcu_idle_exit(), the raw versions of
local_irq_save() and local_irq_restore() are used, again to avoid issues
with lockdep in uninstrumented code.
This patch is based in part on an earlier patch by Jiri Olsa, discussions
with Peter Zijlstra and Frederic Weisbecker, earlier changes by Thomas
Gleixner, and off-list discussions with Yonghong Song.
Link: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Jiri Olsa <[email protected]>
Reported-by: Alexei Starovoitov <[email protected]>
Reported-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Yonghong Song <[email protected]>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 222d59299a2af..02233b17cce0e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -635,8 +635,8 @@ static noinstr void rcu_eqs_enter(bool user)
return;
}
- lockdep_assert_irqs_disabled();
instrumentation_begin();
+ lockdep_assert_irqs_disabled();
trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
rcu_preempt_deferred_qs(current);
@@ -663,9 +663,9 @@ static noinstr void rcu_eqs_enter(bool user)
* If you add or remove a call to rcu_idle_enter(), be sure to test with
* CONFIG_RCU_EQS_DEBUG=y.
*/
-void rcu_idle_enter(void)
+void noinstr rcu_idle_enter(void)
{
- lockdep_assert_irqs_disabled();
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
rcu_eqs_enter(false);
}
EXPORT_SYMBOL_GPL(rcu_idle_enter);
@@ -865,7 +865,7 @@ static void noinstr rcu_eqs_exit(bool user)
struct rcu_data *rdp;
long oldval;
- lockdep_assert_irqs_disabled();
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
rdp = this_cpu_ptr(&rcu_data);
oldval = rdp->dynticks_nesting;
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
@@ -900,13 +900,13 @@ static void noinstr rcu_eqs_exit(bool user)
* If you add or remove a call to rcu_idle_exit(), be sure to test with
* CONFIG_RCU_EQS_DEBUG=y.
*/
-void rcu_idle_exit(void)
+void noinstr rcu_idle_exit(void)
{
unsigned long flags;
- local_irq_save(flags);
+ raw_local_irq_save(flags);
rcu_eqs_exit(false);
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(rcu_idle_exit);
On Tue, 17 May 2022 09:49:33 +0200
Jiri Olsa <[email protected]> wrote:
> On Mon, May 16, 2022 at 07:54:37PM -0700, Yonghong Song wrote:
> >
> >
> > On 5/15/22 1:36 PM, Jiri Olsa wrote:
> > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > in rcu 'not watching' context and if there's tracer attached to
> > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > warning like:
> > >
> > > [ 3.017540] WARNING: suspicious RCU usage
> > > ...
> > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > [ 3.018374] 0xffffffffa00080c8
> > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > [ 3.018399] default_idle_call+0x59/0x90
> > > [ 3.018401] do_idle+0x1c3/0x1d0
> > >
> > > The call path is following:
> > >
> > > default_idle_call
> > > rcu_idle_enter
> > > arch_cpu_idle
> > > rcu_idle_exit
> > >
> > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > path that are traceble and cause this problem on my setup.
> > >
> > > Signed-off-by: Jiri Olsa <[email protected]>
> > > ---
> > > arch/x86/kernel/process.c | 2 +-
> > > kernel/rcu/tree.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > > index b370767f5b19..1345cb0124a6 100644
> > > --- a/arch/x86/kernel/process.c
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> > > /*
> > > * Called from the generic idle code.
> > > */
> > > -void arch_cpu_idle(void)
> > > +void noinstr arch_cpu_idle(void)
> >
> > noinstr includes a lot of attributes:
> >
> > #define noinstr \
> > noinline notrace __attribute((__section__(".noinstr.text"))) \
> > __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
> >
> > should we use notrace here?
>
> hm right, so notrace should be enough for our case (kprobe_multi)
> which is based on ftrace/fprobe jump
>
> noinstr (among other things) adds the function also the kprobes
> blacklist, which will prevent standard kprobes to attach
>
> ASAICS standard kprobes use rcu in probe path as well, like in
> opt_pre_handler function
>
> so I think we should go with noinstr
Yes, I agree that noinstr is preferrable for these functions.
Thank you!
>
> jirka
>
> >
> > > {
> > > x86_idle();
> > > }
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index a4b8189455d5..20d529722f51 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > > * CONFIG_RCU_EQS_DEBUG=y.
> > > */
> > > -void rcu_idle_exit(void)
> > > +void noinstr rcu_idle_exit(void)
> > > {
> > > unsigned long flags;
--
Masami Hiramatsu <[email protected]>
On Thu, May 19, 2022 at 01:33:16PM +0200, Jiri Olsa wrote:
> On Wed, May 18, 2022 at 09:21:18AM -0700, Paul E. McKenney wrote:
> > On Tue, May 17, 2022 at 12:13:45PM +0200, Jiri Olsa wrote:
> > > On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> > > > On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > > > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > > > > in rcu 'not watching' context and if there's tracer attached to
> > > > > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > > > > warning like:
> > > > > >
> > > > > > [ 3.017540] WARNING: suspicious RCU usage
> > > > > > ...
> > > > > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > > > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > > > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > > > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > > > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > > > > [ 3.018374] 0xffffffffa00080c8
> > > > > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > > > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > > > > [ 3.018399] default_idle_call+0x59/0x90
> > > > > > [ 3.018401] do_idle+0x1c3/0x1d0
> > > > > >
> > > > > > The call path is following:
> > > > > >
> > > > > > default_idle_call
> > > > > > rcu_idle_enter
> > > > > > arch_cpu_idle
> > > > > > rcu_idle_exit
> > > > > >
> > > > > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > > > > path that are traceble and cause this problem on my setup.
> > > > > >
> > > > > > Signed-off-by: Jiri Olsa <[email protected]>
> > > > >
> > > > > From an RCU viewpoint:
> > > > >
> > > > > Reviewed-by: Paul E. McKenney <[email protected]>
> > > > >
> > > > > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > > > > but there is no point given that local_irq_restore() isn't something
> > > > > you instrument anyway. ]
> > > >
> > > > So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> > > > it is instrumentable by the function (graph) tracers and the irqsoff tracer.
> > > >
> > > > Also it calls into lockdep that might make use of RCU.
> > > >
> > > > That's why rcu_idle_exit() is not noinstr yet. See this patch:
> > > >
> > > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > I see, could we mark it at least with notrace meanwhile?
> >
> > For the RCU part, how about as follows?
> >
> > If this approach is reasonable, my guess would be that Frederic will pull
> > it into his context-tracking series, perhaps using a revert of this patch
> > to maintain sanity in the near term.
> >
> > If this approach is unreasonable, well, that is Murphy for you!
>
> I checked and it works in my test ;-)
Whew!!! One piece of the problem might be solved, then. ;-)
> > For the x86 idle part, my feeling is still that the rcu_idle_enter()
> > and rcu_idle_exit() need to be pushed deeper into the code. Perhaps
> > an ongoing process as the idle loop continues to be dug deeper?
>
> for arch_cpu_idle with noinstr I'm getting this W=1 warning:
>
> vmlinux.o: warning: objtool: arch_cpu_idle()+0xb: call to {dynamic}() leaves .noinstr.text section
>
> we could have it with notrace if that's a problem
I would be happy to queue the arch_cpu_idle() portion of your patch on
-rcu, if that would move things forward. I suspect that additional
x86_idle() surgery is required, but maybe I am just getting confused
about what the x86_idle() function pointer can point to. But it looks
to me like these need further help:
o static void amd_e400_idle(void)
Plus things it calls, like tick_broadcast_enter() and
tick_broadcast_exit().
o static __cpuidle void mwait_idle(void)
So it might not be all that much additional work, even if I have avoided
confusion about what the x86_idle() function pointer can point to. But
I do not trust my ability to test this accurately.
Thoughts?
Thanx, Paul
> thanks,
> jirka
>
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit cd338be719a0a692e0d50e1a8438e1f6c7165d9c
> > Author: Paul E. McKenney <[email protected]>
> > Date: Tue May 17 21:00:04 2022 -0700
> >
> > rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()
> >
> > This commit applies the "noinstr" tag to the rcu_idle_enter() and
> > rcu_idle_exit() functions, which are invoked from portions of the idle
> > loop that cannot be instrumented. These tags require reworking the
> > rcu_eqs_enter() and rcu_eqs_exit() functions that these two functions
> > invoke in order to cause them to use normal assertions rather than
> > lockdep. In addition, within rcu_idle_exit(), the raw versions of
> > local_irq_save() and local_irq_restore() are used, again to avoid issues
> > with lockdep in uninstrumented code.
> >
> > This patch is based in part on an earlier patch by Jiri Olsa, discussions
> > with Peter Zijlstra and Frederic Weisbecker, earlier changes by Thomas
> > Gleixner, and off-list discussions with Yonghong Song.
> >
> > Link: https://lore.kernel.org/lkml/[email protected]/
> > Reported-by: Jiri Olsa <[email protected]>
> > Reported-by: Alexei Starovoitov <[email protected]>
> > Reported-by: Andrii Nakryiko <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Reviewed-by: Yonghong Song <[email protected]>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 222d59299a2af..02233b17cce0e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -635,8 +635,8 @@ static noinstr void rcu_eqs_enter(bool user)
> > return;
> > }
> >
> > - lockdep_assert_irqs_disabled();
> > instrumentation_begin();
> > + lockdep_assert_irqs_disabled();
> > trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
> > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
> > rcu_preempt_deferred_qs(current);
> > @@ -663,9 +663,9 @@ static noinstr void rcu_eqs_enter(bool user)
> > * If you add or remove a call to rcu_idle_enter(), be sure to test with
> > * CONFIG_RCU_EQS_DEBUG=y.
> > */
> > -void rcu_idle_enter(void)
> > +void noinstr rcu_idle_enter(void)
> > {
> > - lockdep_assert_irqs_disabled();
> > + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
> > rcu_eqs_enter(false);
> > }
> > EXPORT_SYMBOL_GPL(rcu_idle_enter);
> > @@ -865,7 +865,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > struct rcu_data *rdp;
> > long oldval;
> >
> > - lockdep_assert_irqs_disabled();
> > + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
> > rdp = this_cpu_ptr(&rcu_data);
> > oldval = rdp->dynticks_nesting;
> > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
> > @@ -900,13 +900,13 @@ static void noinstr rcu_eqs_exit(bool user)
> > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > * CONFIG_RCU_EQS_DEBUG=y.
> > */
> > -void rcu_idle_exit(void)
> > +void noinstr rcu_idle_exit(void)
> > {
> > unsigned long flags;
> >
> > - local_irq_save(flags);
> > + raw_local_irq_save(flags);
> > rcu_eqs_exit(false);
> > - local_irq_restore(flags);
> > + raw_local_irq_restore(flags);
> > }
> > EXPORT_SYMBOL_GPL(rcu_idle_exit);
> >
On Wed, May 18, 2022 at 09:21:18AM -0700, Paul E. McKenney wrote:
> On Tue, May 17, 2022 at 12:13:45PM +0200, Jiri Olsa wrote:
> > On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> > > On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > > > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > > > in rcu 'not watching' context and if there's tracer attached to
> > > > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > > > warning like:
> > > > >
> > > > > [ 3.017540] WARNING: suspicious RCU usage
> > > > > ...
> > > > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > > > [ 3.018374] 0xffffffffa00080c8
> > > > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > > > [ 3.018399] default_idle_call+0x59/0x90
> > > > > [ 3.018401] do_idle+0x1c3/0x1d0
> > > > >
> > > > > The call path is following:
> > > > >
> > > > > default_idle_call
> > > > > rcu_idle_enter
> > > > > arch_cpu_idle
> > > > > rcu_idle_exit
> > > > >
> > > > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > > > path that are traceble and cause this problem on my setup.
> > > > >
> > > > > Signed-off-by: Jiri Olsa <[email protected]>
> > > >
> > > > From an RCU viewpoint:
> > > >
> > > > Reviewed-by: Paul E. McKenney <[email protected]>
> > > >
> > > > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > > > but there is no point given that local_irq_restore() isn't something
> > > > you instrument anyway. ]
> > >
> > > So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> > > it is instrumentable by the function (graph) tracers and the irqsoff tracer.
> > >
> > > Also it calls into lockdep that might make use of RCU.
> > >
> > > That's why rcu_idle_exit() is not noinstr yet. See this patch:
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> >
> > I see, could we mark it at least with notrace meanwhile?
>
> For the RCU part, how about as follows?
>
> If this approach is reasonable, my guess would be that Frederic will pull
> it into his context-tracking series, perhaps using a revert of this patch
> to maintain sanity in the near term.
>
> If this approach is unreasonable, well, that is Murphy for you!
I checked and it works in my test ;-)
>
> For the x86 idle part, my feeling is still that the rcu_idle_enter()
> and rcu_idle_exit() need to be pushed deeper into the code. Perhaps
> an ongoing process as the idle loop continues to be dug deeper?
for arch_cpu_idle with noinstr I'm getting this W=1 warning:
vmlinux.o: warning: objtool: arch_cpu_idle()+0xb: call to {dynamic}() leaves .noinstr.text section
we could have it with notrace if that's a problem
thanks,
jirka
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit cd338be719a0a692e0d50e1a8438e1f6c7165d9c
> Author: Paul E. McKenney <[email protected]>
> Date: Tue May 17 21:00:04 2022 -0700
>
> rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()
>
> This commit applies the "noinstr" tag to the rcu_idle_enter() and
> rcu_idle_exit() functions, which are invoked from portions of the idle
> loop that cannot be instrumented. These tags require reworking the
> rcu_eqs_enter() and rcu_eqs_exit() functions that these two functions
> invoke in order to cause them to use normal assertions rather than
> lockdep. In addition, within rcu_idle_exit(), the raw versions of
> local_irq_save() and local_irq_restore() are used, again to avoid issues
> with lockdep in uninstrumented code.
>
> This patch is based in part on an earlier patch by Jiri Olsa, discussions
> with Peter Zijlstra and Frederic Weisbecker, earlier changes by Thomas
> Gleixner, and off-list discussions with Yonghong Song.
>
> Link: https://lore.kernel.org/lkml/[email protected]/
> Reported-by: Jiri Olsa <[email protected]>
> Reported-by: Alexei Starovoitov <[email protected]>
> Reported-by: Andrii Nakryiko <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Reviewed-by: Yonghong Song <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 222d59299a2af..02233b17cce0e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -635,8 +635,8 @@ static noinstr void rcu_eqs_enter(bool user)
> return;
> }
>
> - lockdep_assert_irqs_disabled();
> instrumentation_begin();
> + lockdep_assert_irqs_disabled();
> trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
> WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
> rcu_preempt_deferred_qs(current);
> @@ -663,9 +663,9 @@ static noinstr void rcu_eqs_enter(bool user)
> * If you add or remove a call to rcu_idle_enter(), be sure to test with
> * CONFIG_RCU_EQS_DEBUG=y.
> */
> -void rcu_idle_enter(void)
> +void noinstr rcu_idle_enter(void)
> {
> - lockdep_assert_irqs_disabled();
> + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
> rcu_eqs_enter(false);
> }
> EXPORT_SYMBOL_GPL(rcu_idle_enter);
> @@ -865,7 +865,7 @@ static void noinstr rcu_eqs_exit(bool user)
> struct rcu_data *rdp;
> long oldval;
>
> - lockdep_assert_irqs_disabled();
> + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
> rdp = this_cpu_ptr(&rcu_data);
> oldval = rdp->dynticks_nesting;
> WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
> @@ -900,13 +900,13 @@ static void noinstr rcu_eqs_exit(bool user)
> * If you add or remove a call to rcu_idle_exit(), be sure to test with
> * CONFIG_RCU_EQS_DEBUG=y.
> */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
> {
> unsigned long flags;
>
> - local_irq_save(flags);
> + raw_local_irq_save(flags);
> rcu_eqs_exit(false);
> - local_irq_restore(flags);
> + raw_local_irq_restore(flags);
> }
> EXPORT_SYMBOL_GPL(rcu_idle_exit);
>
On Thu, May 19, 2022 at 06:54:39AM -0700, Paul E. McKenney wrote:
> On Thu, May 19, 2022 at 01:33:16PM +0200, Jiri Olsa wrote:
> > On Wed, May 18, 2022 at 09:21:18AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 17, 2022 at 12:13:45PM +0200, Jiri Olsa wrote:
> > > > On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> > > > > On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > > > > > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > > > > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > > > > > in rcu 'not watching' context and if there's tracer attached to
> > > > > > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > > > > > warning like:
> > > > > > >
> > > > > > > [ 3.017540] WARNING: suspicious RCU usage
> > > > > > > ...
> > > > > > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > > > > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > > > > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > > > > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > > > > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > > > > > [ 3.018374] 0xffffffffa00080c8
> > > > > > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > > > > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > > > > > [ 3.018399] default_idle_call+0x59/0x90
> > > > > > > [ 3.018401] do_idle+0x1c3/0x1d0
> > > > > > >
> > > > > > > The call path is following:
> > > > > > >
> > > > > > > default_idle_call
> > > > > > > rcu_idle_enter
> > > > > > > arch_cpu_idle
> > > > > > > rcu_idle_exit
> > > > > > >
> > > > > > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > > > > > path that are traceble and cause this problem on my setup.
> > > > > > >
> > > > > > > Signed-off-by: Jiri Olsa <[email protected]>
> > > > > >
> > > > > > From an RCU viewpoint:
> > > > > >
> > > > > > Reviewed-by: Paul E. McKenney <[email protected]>
> > > > > >
> > > > > > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > > > > > but there is no point given that local_irq_restore() isn't something
> > > > > > you instrument anyway. ]
> > > > >
> > > > > So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> > > > > it is instrumentable by the function (graph) tracers and the irqsoff tracer.
> > > > >
> > > > > Also it calls into lockdep that might make use of RCU.
> > > > >
> > > > > That's why rcu_idle_exit() is not noinstr yet. See this patch:
> > > > >
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > I see, could we mark it at least with notrace meanwhile?
> > >
> > > For the RCU part, how about as follows?
> > >
> > > If this approach is reasonable, my guess would be that Frederic will pull
> > > it into his context-tracking series, perhaps using a revert of this patch
> > > to maintain sanity in the near term.
> > >
> > > If this approach is unreasonable, well, that is Murphy for you!
> >
> > I checked and it works in my test ;-)
>
> Whew!!! One piece of the problem might be solved, then. ;-)
>
> > > For the x86 idle part, my feeling is still that the rcu_idle_enter()
> > > and rcu_idle_exit() need to be pushed deeper into the code. Perhaps
> > > an ongoing process as the idle loop continues to be dug deeper?
> >
> > for arch_cpu_idle with noinstr I'm getting this W=1 warning:
> >
> > vmlinux.o: warning: objtool: arch_cpu_idle()+0xb: call to {dynamic}() leaves .noinstr.text section
> >
> > we could have it with notrace if that's a problem
>
> I would be happy to queue the arch_cpu_idle() portion of your patch on
> -rcu, if that would move things forward. I suspect that additional
> x86_idle() surgery is required, but maybe I am just getting confused
> about what the x86_idle() function pointer can point to. But it looks
> to me like these need further help:
>
> o static void amd_e400_idle(void)
> Plus things it calls, like tick_broadcast_enter() and
> tick_broadcast_exit().
>
> o static __cpuidle void mwait_idle(void)
>
> So it might not be all that much additional work, even if I have avoided
> confusion about what the x86_idle() function pointer can point to. But
> I do not trust my ability to test this accurately.
same here ;-) you're right, there will be other places based
on x86_idle function pointer.. I'll check it, but perhaps we
could address that when someone reports that
jirka
On Mon, May 23, 2022 at 03:12:28PM +0200, Jiri Olsa wrote:
> On Thu, May 19, 2022 at 06:54:39AM -0700, Paul E. McKenney wrote:
> > On Thu, May 19, 2022 at 01:33:16PM +0200, Jiri Olsa wrote:
> > > On Wed, May 18, 2022 at 09:21:18AM -0700, Paul E. McKenney wrote:
> > > > On Tue, May 17, 2022 at 12:13:45PM +0200, Jiri Olsa wrote:
> > > > > On Mon, May 16, 2022 at 01:49:22PM +0200, Frederic Weisbecker wrote:
> > > > > > On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> > > > > > > On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > > > > > > > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > > > > > > > in rcu 'not watching' context and if there's tracer attached to
> > > > > > > > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > > > > > > > warning like:
> > > > > > > >
> > > > > > > > [ 3.017540] WARNING: suspicious RCU usage
> > > > > > > > ...
> > > > > > > > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > > > > > > > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > > > > > > > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > > > > > > > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > > > > > > > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > > > > > > > [ 3.018374] 0xffffffffa00080c8
> > > > > > > > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > > > > > > > [ 3.018398] arch_cpu_idle+0x5/0x10
> > > > > > > > [ 3.018399] default_idle_call+0x59/0x90
> > > > > > > > [ 3.018401] do_idle+0x1c3/0x1d0
> > > > > > > >
> > > > > > > > The call path is following:
> > > > > > > >
> > > > > > > > default_idle_call
> > > > > > > > rcu_idle_enter
> > > > > > > > arch_cpu_idle
> > > > > > > > rcu_idle_exit
> > > > > > > >
> > > > > > > > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > > > > > > > path that are traceble and cause this problem on my setup.
> > > > > > > >
> > > > > > > > Signed-off-by: Jiri Olsa <[email protected]>
> > > > > > >
> > > > > > > From an RCU viewpoint:
> > > > > > >
> > > > > > > Reviewed-by: Paul E. McKenney <[email protected]>
> > > > > > >
> > > > > > > [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> > > > > > > but there is no point given that local_irq_restore() isn't something
> > > > > > > you instrument anyway. ]
> > > > > >
> > > > > > So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
> > > > > > it is instrumentable by the function (graph) tracers and the irqsoff tracer.
> > > > > >
> > > > > > Also it calls into lockdep that might make use of RCU.
> > > > > >
> > > > > > That's why rcu_idle_exit() is not noinstr yet. See this patch:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > >
> > > > > I see, could we mark it at least with notrace meanwhile?
> > > >
> > > > For the RCU part, how about as follows?
> > > >
> > > > If this approach is reasonable, my guess would be that Frederic will pull
> > > > it into his context-tracking series, perhaps using a revert of this patch
> > > > to maintain sanity in the near term.
> > > >
> > > > If this approach is unreasonable, well, that is Murphy for you!
> > >
> > > I checked and it works in my test ;-)
> >
> > Whew!!! One piece of the problem might be solved, then. ;-)
> >
> > > > For the x86 idle part, my feeling is still that the rcu_idle_enter()
> > > > and rcu_idle_exit() need to be pushed deeper into the code. Perhaps
> > > > an ongoing process as the idle loop continues to be dug deeper?
> > >
> > > for arch_cpu_idle with noinstr I'm getting this W=1 warning:
> > >
> > > vmlinux.o: warning: objtool: arch_cpu_idle()+0xb: call to {dynamic}() leaves .noinstr.text section
> > >
> > > we could have it with notrace if that's a problem
> >
> > I would be happy to queue the arch_cpu_idle() portion of your patch on
> > -rcu, if that would move things forward. I suspect that additional
> > x86_idle() surgery is required, but maybe I am just getting confused
> > about what the x86_idle() function pointer can point to. But it looks
> > to me like these need further help:
> >
> > o static void amd_e400_idle(void)
> > Plus things it calls, like tick_broadcast_enter() and
> > tick_broadcast_exit().
> >
> > o static __cpuidle void mwait_idle(void)
> >
> > So it might not be all that much additional work, even if I have avoided
> > confusion about what the x86_idle() function pointer can point to. But
> > I do not trust my ability to test this accurately.
>
> same here ;-) you're right, there will be other places based
> on x86_idle function pointer.. I'll check it, but perhaps we
> could address that when someone reports that
>
> jirka
Any thoughts on the correct approach? One extreme would be to
mark all sorts of things noinstr. Another extreme would be to
enclose all sorts of things in RCU_NONIDLE(). Yet another extreme
would be to push the rcu_idle_enter() and rcu_idle_exit() calls
still deeper into the idle loop.
Or does Peter's recent series somehow cover all of this?
https://lore.kernel.org/all/[email protected]/
Thanx, Paul
Hi Jiri,
Would you like to consider to add rcu_is_watching check in
to solve this from the viewpoint of kprobe_multi_link_prog_run
itself? And accounting of missed runs can be added as well
to imporve observability.
Regards,
Ze
-----------------
From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
From: Ze Gao <[email protected]>
Date: Sat, 20 May 2023 17:32:05 +0800
Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
From the perspective of kprobe_multi_link_prog_run, any traceable
functions can be attached while bpf progs need specical care and
ought to be under rcu protection. To solve the likely rcu lockdep
warns once for good, when (future) functions in idle path were
attached accidentally, we better paying some cost to check at least
in kernel-side, and return when rcu is not watching, which helps
to avoid any unpredictable results.
Signed-off-by: Ze Gao <[email protected]>
---
kernel/trace/bpf_trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9a050e36dc6c..3e6ea7274765 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
struct bpf_run_ctx *old_run_ctx;
int err;
- if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
+ if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
err = 0;
goto out;
}
--
2.40.1
On 5/20/23 2:47 AM, Ze Gao wrote:
>
> Hi Jiri,
>
> Would you like to consider to add rcu_is_watching check in
> to solve this from the viewpoint of kprobe_multi_link_prog_run
> itself? And accounting of missed runs can be added as well
> to imporve observability.
>
> Regards,
> Ze
>
>
> -----------------
> From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> From: Ze Gao <[email protected]>
> Date: Sat, 20 May 2023 17:32:05 +0800
> Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
>
> From the perspective of kprobe_multi_link_prog_run, any traceable
> functions can be attached while bpf progs need specical care and
> ought to be under rcu protection. To solve the likely rcu lockdep
> warns once for good, when (future) functions in idle path were
> attached accidentally, we better paying some cost to check at least
> in kernel-side, and return when rcu is not watching, which helps
> to avoid any unpredictable results.
kprobe_multi/fprobe share the same set of attachments with fentry.
Currently, fentry does not filter with !rcu_is_watching, maybe
because this is an extreme corner case. Not sure whether it is
worthwhile or not.
Maybe if you can give a concrete example (e.g., attachment point)
with current code base to show what the issue you encountered and
it will make it easier to judge whether adding !rcu_is_watching()
is necessary or not.
>
> Signed-off-by: Ze Gao <[email protected]>
> ---
> kernel/trace/bpf_trace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 9a050e36dc6c..3e6ea7274765 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> struct bpf_run_ctx *old_run_ctx;
> int err;
>
> - if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
> err = 0;
> goto out;
> }
On Sat, May 20, 2023 at 05:47:24PM +0800, Ze Gao wrote:
>
> Hi Jiri,
>
> Would you like to consider to add rcu_is_watching check in
> to solve this from the viewpoint of kprobe_multi_link_prog_run
I think this was discussed in here:
https://lore.kernel.org/bpf/[email protected]/
and was considered a bug, there's fix mentioned later in the thread
there's also this recent patchset:
https://lore.kernel.org/bpf/[email protected]/
that solves related problems
> itself? And accounting of missed runs can be added as well
> to imporve observability.
right, we count fprobe->nmissed but it's not exposed, we should allow
to get 'missed' stats from both fprobe and kprobe_multi later, which
is missing now, will check
thanks,
jirka
>
> Regards,
> Ze
>
>
> -----------------
> From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> From: Ze Gao <[email protected]>
> Date: Sat, 20 May 2023 17:32:05 +0800
> Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
>
> From the perspective of kprobe_multi_link_prog_run, any traceable
> functions can be attached while bpf progs need specical care and
> ought to be under rcu protection. To solve the likely rcu lockdep
> warns once for good, when (future) functions in idle path were
> attached accidentally, we better paying some cost to check at least
> in kernel-side, and return when rcu is not watching, which helps
> to avoid any unpredictable results.
>
> Signed-off-by: Ze Gao <[email protected]>
> ---
> kernel/trace/bpf_trace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 9a050e36dc6c..3e6ea7274765 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> struct bpf_run_ctx *old_run_ctx;
> int err;
>
> - if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
> err = 0;
> goto out;
> }
> --
> 2.40.1
>
On Sun, 21 May 2023 10:08:46 +0200
Jiri Olsa <[email protected]> wrote:
> On Sat, May 20, 2023 at 05:47:24PM +0800, Ze Gao wrote:
> >
> > Hi Jiri,
> >
> > Would you like to consider to add rcu_is_watching check in
> > to solve this from the viewpoint of kprobe_multi_link_prog_run
>
> I think this was discussed in here:
> https://lore.kernel.org/bpf/[email protected]/
>
> and was considered a bug, there's fix mentioned later in the thread
>
> there's also this recent patchset:
> https://lore.kernel.org/bpf/[email protected]/
>
> that solves related problems
I think this rcu_is_watching() is a bit different issue. This rcu_is_watching()
check is required if the kprobe_multi_link_prog_run() uses any RCU API.
E.g. rethook_try_get() is also checks rcu_is_watching() because it uses
call_rcu().
Thank you,
>
> > itself? And accounting of missed runs can be added as well
> > to imporve observability.
>
> right, we count fprobe->nmissed but it's not exposed, we should allow
> to get 'missed' stats from both fprobe and kprobe_multi later, which
> is missing now, will check
>
> thanks,
> jirka
>
> >
> > Regards,
> > Ze
> >
> >
> > -----------------
> > From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> > From: Ze Gao <[email protected]>
> > Date: Sat, 20 May 2023 17:32:05 +0800
> > Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
> >
> > From the perspective of kprobe_multi_link_prog_run, any traceable
> > functions can be attached while bpf progs need specical care and
> > ought to be under rcu protection. To solve the likely rcu lockdep
> > warns once for good, when (future) functions in idle path were
> > attached accidentally, we better paying some cost to check at least
> > in kernel-side, and return when rcu is not watching, which helps
> > to avoid any unpredictable results.
> >
> > Signed-off-by: Ze Gao <[email protected]>
> > ---
> > kernel/trace/bpf_trace.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 9a050e36dc6c..3e6ea7274765 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> > struct bpf_run_ctx *old_run_ctx;
> > int err;
> >
> > - if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> > + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
> > err = 0;
> > goto out;
> > }
> > --
> > 2.40.1
> >
--
Masami Hiramatsu (Google) <[email protected]>
On Sun, May 21, 2023 at 6:09 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Sun, 21 May 2023 10:08:46 +0200
> Jiri Olsa <[email protected]> wrote:
>
> > On Sat, May 20, 2023 at 05:47:24PM +0800, Ze Gao wrote:
> > >
> > > Hi Jiri,
> > >
> > > Would you like to consider to add rcu_is_watching check in
> > > to solve this from the viewpoint of kprobe_multi_link_prog_run
> >
> > I think this was discussed in here:
> > https://lore.kernel.org/bpf/[email protected]/
> >
> > and was considered a bug, there's fix mentioned later in the thread
> >
> > there's also this recent patchset:
> > https://lore.kernel.org/bpf/[email protected]/
> >
> > that solves related problems
>
> I think this rcu_is_watching() is a bit different issue. This rcu_is_watching()
> check is required if the kprobe_multi_link_prog_run() uses any RCU API.
> E.g. rethook_try_get() is also checks rcu_is_watching() because it uses
> call_rcu().
Yes, that's my point!
Regards,
Ze
>
> >
> > > itself? And accounting of missed runs can be added as well
> > > to imporve observability.
> >
> > right, we count fprobe->nmissed but it's not exposed, we should allow
> > to get 'missed' stats from both fprobe and kprobe_multi later, which
> > is missing now, will check
> >
> > thanks,
> > jirka
> >
> > >
> > > Regards,
> > > Ze
> > >
> > >
> > > -----------------
> > > From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> > > From: Ze Gao <[email protected]>
> > > Date: Sat, 20 May 2023 17:32:05 +0800
> > > Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
> > >
> > > From the perspective of kprobe_multi_link_prog_run, any traceable
> > > functions can be attached while bpf progs need specical care and
> > > ought to be under rcu protection. To solve the likely rcu lockdep
> > > warns once for good, when (future) functions in idle path were
> > > attached accidentally, we better paying some cost to check at least
> > > in kernel-side, and return when rcu is not watching, which helps
> > > to avoid any unpredictable results.
> > >
> > > Signed-off-by: Ze Gao <[email protected]>
> > > ---
> > > kernel/trace/bpf_trace.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 9a050e36dc6c..3e6ea7274765 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> > > struct bpf_run_ctx *old_run_ctx;
> > > int err;
> > >
> > > - if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> > > + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
> > > err = 0;
> > > goto out;
> > > }
> > > --
> > > 2.40.1
> > >
>
>
> --
> Masami Hiramatsu (Google) <[email protected]>
> kprobe_multi/fprobe share the same set of attachments with fentry.
> Currently, fentry does not filter with !rcu_is_watching, maybe
> because this is an extreme corner case. Not sure whether it is
> worthwhile or not.
Agreed, it's rare, especially after Peter's patches which push narrow
down rcu eqs regions
in the idle path and reduce the chance of any traceable functions
happening in between.
However, from RCU's perspective, we ought to check if rcu_is_watching
theoretically
when there's a chance our code will run in the idle path and also we
need rcu to be alive,
And also we cannot simply make assumptions for any future changes in
the idle path.
You know, just like what was hit in the thread.
> Maybe if you can give a concrete example (e.g., attachment point)
> with current code base to show what the issue you encountered and
> it will make it easier to judge whether adding !rcu_is_watching()
> is necessary or not.
I can reproduce likely warnings on v6.1.18 where arch_cpu_idle is
traceable but not on the latest version
so far. But as I state above, in theory we need it. So here is a
gentle ping :) .
On Sun, May 21, 2023 at 11:10:16PM +0800, Ze Gao wrote:
> > kprobe_multi/fprobe share the same set of attachments with fentry.
> > Currently, fentry does not filter with !rcu_is_watching, maybe
> > because this is an extreme corner case. Not sure whether it is
> > worthwhile or not.
>
> Agreed, it's rare, especially after Peter's patches which push narrow
> down rcu eqs regions
> in the idle path and reduce the chance of any traceable functions
> happening in between.
>
> However, from RCU's perspective, we ought to check if rcu_is_watching
> theoretically
> when there's a chance our code will run in the idle path and also we
> need rcu to be alive,
> And also we cannot simply make assumptions for any future changes in
> the idle path.
> You know, just like what was hit in the thread.
>
> > Maybe if you can give a concrete example (e.g., attachment point)
> > with current code base to show what the issue you encountered and
> > it will make it easier to judge whether adding !rcu_is_watching()
> > is necessary or not.
>
> I can reproduce likely warnings on v6.1.18 where arch_cpu_idle is
> traceable but not on the latest version
> so far. But as I state above, in theory we need it. So here is a
> gentle ping :) .
hum, this change [1] added rcu_is_watching check to ftrace_test_recursion_trylock,
which we use in fprobe_handler and is coming to fprobe_exit_handler in [2]
I might be missing something, but it seems like we don't need another
rcu_is_watching call on kprobe_multi level
jirka
[1] d099dbfd3306 cpuidle: tracing: Warn about !rcu_is_watching()
[2] https://lore.kernel.org/bpf/[email protected]/
On Sun, 21 May 2023 22:26:37 +0200
Jiri Olsa <[email protected]> wrote:
> On Sun, May 21, 2023 at 11:10:16PM +0800, Ze Gao wrote:
> > > kprobe_multi/fprobe share the same set of attachments with fentry.
> > > Currently, fentry does not filter with !rcu_is_watching, maybe
> > > because this is an extreme corner case. Not sure whether it is
> > > worthwhile or not.
> >
> > Agreed, it's rare, especially after Peter's patches which push narrow
> > down rcu eqs regions
> > in the idle path and reduce the chance of any traceable functions
> > happening in between.
> >
> > However, from RCU's perspective, we ought to check if rcu_is_watching
> > theoretically
> > when there's a chance our code will run in the idle path and also we
> > need rcu to be alive,
> > And also we cannot simply make assumptions for any future changes in
> > the idle path.
> > You know, just like what was hit in the thread.
> >
> > > Maybe if you can give a concrete example (e.g., attachment point)
> > > with current code base to show what the issue you encountered and
> > > it will make it easier to judge whether adding !rcu_is_watching()
> > > is necessary or not.
> >
> > I can reproduce likely warnings on v6.1.18 where arch_cpu_idle is
> > traceable but not on the latest version
> > so far. But as I state above, in theory we need it. So here is a
> > gentle ping :) .
>
> hum, this change [1] added rcu_is_watching check to ftrace_test_recursion_trylock,
> which we use in fprobe_handler and is coming to fprobe_exit_handler in [2]
>
> I might be missing something, but it seems like we don't need another
> rcu_is_watching call on kprobe_multi level
Good point! OK, then it seems we don't need it. The rethook continues to
use the rcu_is_watching() because it is also used from kprobes, but the
kprobe_multi doesn't need it.
Thank you,
>
> jirka
>
>
> [1] d099dbfd3306 cpuidle: tracing: Warn about !rcu_is_watching()
> [2] https://lore.kernel.org/bpf/[email protected]/
--
Masami Hiramatsu (Google) <[email protected]>
Oops, I missed that. Thanks for pointing that out, which I thought is
conditional use of rcu_is_watching before.
One last point, I think we should double check on this
"fentry does not filter with !rcu_is_watching"
as quoted from Yonghong and argue whether it needs
the same check for fentry as well.
Regards,
Ze
On 5/21/23 7:07 PM, Ze Gao wrote:
> Oops, I missed that. Thanks for pointing that out, which I thought is
> conditional use of rcu_is_watching before.
>
> One last point, I think we should double check on this
> "fentry does not filter with !rcu_is_watching"
> as quoted from Yonghong and argue whether it needs
> the same check for fentry as well.
I would suggest that we address rcu_is_watching issue for fentry
only if we do have a reproducible case to show something goes wrong...
>
> Regards,
> Ze
On Mon, 22 May 2023 10:07:42 +0800
Ze Gao <[email protected]> wrote:
> Oops, I missed that. Thanks for pointing that out, which I thought is
> conditional use of rcu_is_watching before.
>
> One last point, I think we should double check on this
> "fentry does not filter with !rcu_is_watching"
> as quoted from Yonghong and argue whether it needs
> the same check for fentry as well.
rcu_is_watching() comment says;
* if the current CPU is not in its idle loop or is in an interrupt or
* NMI handler, return true.
Thus it returns *fault* if the current CPU is in the idle loop and not
any interrupt(including NMI) context. This means if any tracable function
is called from idle loop, it can be !rcu_is_watching(). I meant, this is
'context' based check, thus fentry can not filter out that some commonly
used functions is called from that context but it can be detected.
Thank you,
>
> Regards,
> Ze
--
Masami Hiramatsu (Google) <[email protected]>
On Tue, May 23, 2023 at 01:30:19PM +0800, Masami Hiramatsu wrote:
> On Mon, 22 May 2023 10:07:42 +0800
> Ze Gao <[email protected]> wrote:
>
> > Oops, I missed that. Thanks for pointing that out, which I thought is
> > conditional use of rcu_is_watching before.
> >
> > One last point, I think we should double check on this
> > "fentry does not filter with !rcu_is_watching"
> > as quoted from Yonghong and argue whether it needs
> > the same check for fentry as well.
>
> rcu_is_watching() comment says;
>
> * if the current CPU is not in its idle loop or is in an interrupt or
> * NMI handler, return true.
>
> Thus it returns *fault* if the current CPU is in the idle loop and not
> any interrupt(including NMI) context. This means if any tracable function
> is called from idle loop, it can be !rcu_is_watching(). I meant, this is
> 'context' based check, thus fentry can not filter out that some commonly
> used functions is called from that context but it can be detected.
It really does return false (rather than faulting?) if the current CPU
is deep within the idle loop.
In addition, the recent x86/entry rework (thank you Peter and
Thomas!) mean that the "idle loop" is quite restricted, as can be
seen by the invocations of ct_cpuidle_enter() and ct_cpuidle_exit().
For example, in default_idle_call(), these are immediately before and
after the call to arch_cpu_idle().
Would the following help? Or am I missing your point?
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1449cb69a0e0..fae9b4e29c93 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -679,10 +679,14 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
/**
* rcu_is_watching - see if RCU thinks that the current CPU is not idle
*
- * Return true if RCU is watching the running CPU, which means that this
- * CPU can safely enter RCU read-side critical sections. In other words,
- * if the current CPU is not in its idle loop or is in an interrupt or
- * NMI handler, return true.
+ * Return @true if RCU is watching the running CPU and @false otherwise.
+ * An @true return means that this CPU can safely enter RCU read-side
+ * critical sections.
+ *
+ * More specifically, if the current CPU is not deep within its idle
+ * loop, return @true. Note that rcu_is_watching() will return @true if
+ * invoked from an interrupt or NMI handler, even if that interrupt or
+ * NMI interrupted the CPU while it was deep within its idle loop.
*
* Make notrace because it can be called by the internal functions of
* ftrace, and making this notrace removes unnecessary recursion calls.
[ Added a subject, as I always want to delete these emails as spam! ]
On Mon, 22 May 2023 10:07:42 +0800
Ze Gao <[email protected]> wrote:
> Oops, I missed that. Thanks for pointing that out, which I thought is
> conditional use of rcu_is_watching before.
>
> One last point, I think we should double check on this
> "fentry does not filter with !rcu_is_watching"
> as quoted from Yonghong and argue whether it needs
> the same check for fentry as well.
>
Note that trace_test_and_set_recursion() (which is used by
ftrace_test_recursion_trylock()) checks for rcu_is_watching() and
returns false if it isn't (and the trylock will fail).
-- Steve
Thanks Steven, I think we've come to a consensus on this.
The question here is whether bpf tracing fentry i.e.,
__bpf_prog_enter{_sleepable}
needs to check rcu_is_watching as well before using rcu related
calls. And Yonghong suggested making a change when there is
indeed some bad case occurring since it's rare the tracee is in the idle path.
Regards,
Ze
On Tue, May 23, 2023 at 10:10 PM Steven Rostedt <[email protected]> wrote:
>
> [ Added a subject, as I always want to delete these emails as spam! ]
>
> On Mon, 22 May 2023 10:07:42 +0800
> Ze Gao <[email protected]> wrote:
>
> > Oops, I missed that. Thanks for pointing that out, which I thought is
> > conditional use of rcu_is_watching before.
> >
> > One last point, I think we should double check on this
> > "fentry does not filter with !rcu_is_watching"
> > as quoted from Yonghong and argue whether it needs
> > the same check for fentry as well.
> >
>
> Note that trace_test_and_set_recursion() (which is used by
> ftrace_test_recursion_trylock()) checks for rcu_is_watching() and
> returns false if it isn't (and the trylock will fail).
>
> -- Steve
On Mon, 22 May 2023 23:59:28 -0700
"Paul E. McKenney" <[email protected]> wrote:
> On Tue, May 23, 2023 at 01:30:19PM +0800, Masami Hiramatsu wrote:
> > On Mon, 22 May 2023 10:07:42 +0800
> > Ze Gao <[email protected]> wrote:
> >
> > > Oops, I missed that. Thanks for pointing that out, which I thought is
> > > conditional use of rcu_is_watching before.
> > >
> > > One last point, I think we should double check on this
> > > "fentry does not filter with !rcu_is_watching"
> > > as quoted from Yonghong and argue whether it needs
> > > the same check for fentry as well.
> >
> > rcu_is_watching() comment says;
> >
> > * if the current CPU is not in its idle loop or is in an interrupt or
> > * NMI handler, return true.
> >
> > Thus it returns *fault* if the current CPU is in the idle loop and not
> > any interrupt(including NMI) context. This means if any tracable function
> > is called from idle loop, it can be !rcu_is_watching(). I meant, this is
> > 'context' based check, thus fentry can not filter out that some commonly
> > used functions is called from that context but it can be detected.
>
> It really does return false (rather than faulting?) if the current CPU
> is deep within the idle loop.
>
> In addition, the recent x86/entry rework (thank you Peter and
> Thomas!) mean that the "idle loop" is quite restricted, as can be
> seen by the invocations of ct_cpuidle_enter() and ct_cpuidle_exit().
> For example, in default_idle_call(), these are immediately before and
> after the call to arch_cpu_idle().
Thanks! I also found that the default_idle_call() is enough small and
it seems not happening on fentry because there are no commonly used
functions on that path.
>
> Would the following help? Or am I missing your point?
Yes, thank you for the update!
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1449cb69a0e0..fae9b4e29c93 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -679,10 +679,14 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
> /**
> * rcu_is_watching - see if RCU thinks that the current CPU is not idle
> *
> - * Return true if RCU is watching the running CPU, which means that this
> - * CPU can safely enter RCU read-side critical sections. In other words,
> - * if the current CPU is not in its idle loop or is in an interrupt or
> - * NMI handler, return true.
> + * Return @true if RCU is watching the running CPU and @false otherwise.
> + * An @true return means that this CPU can safely enter RCU read-side
> + * critical sections.
> + *
> + * More specifically, if the current CPU is not deep within its idle
> + * loop, return @true. Note that rcu_is_watching() will return @true if
> + * invoked from an interrupt or NMI handler, even if that interrupt or
> + * NMI interrupted the CPU while it was deep within its idle loop.
> *
> * Make notrace because it can be called by the internal functions of
> * ftrace, and making this notrace removes unnecessary recursion calls.
--
Masami Hiramatsu (Google) <[email protected]>