In ppc, compiler based sanitizer will generate instrument instructions
around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
0xc000000000295cb0 <+0>: addis r2,r12,774
0xc000000000295cb4 <+4>: addi r2,r2,16464
0xc000000000295cb8 <+8>: mflr r0
0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
0xc000000000295cc0 <+16>: mflr r0
0xc000000000295cc4 <+20>: std r31,-8(r1)
0xc000000000295cc8 <+24>: addi r3,r13,2354
0xc000000000295ccc <+28>: mr r31,r13
0xc000000000295cd0 <+32>: std r0,16(r1)
0xc000000000295cd4 <+36>: stdu r1,-48(r1)
0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
0xc000000000295cdc <+44>: nop
0xc000000000295ce0 <+48>: li r9,1
0xc000000000295ce4 <+52>: stb r9,2354(r31)
0xc000000000295ce8 <+56>: addi r1,r1,48
0xc000000000295cec <+60>: ld r0,16(r1)
0xc000000000295cf0 <+64>: ld r31,-8(r1)
0xc000000000295cf4 <+68>: mtlr r0
If there is a context switch before "stb r9,2354(r31)", r31 may
not equal to r13, in such case, irq soft mask will not work.
This patch disable sanitizer in irq_soft_mask_set.
Signed-off-by: Zhouyi Zhou <[email protected]>
---
Dear PPC developers
I found this bug when trying to do rcutorture tests in ppc VM of
Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
console.log report following bug:
[ 346.527467][ T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
[ 346.529416][ T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
[ 346.531157][ T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253^M
[ 346.533620][ T100] Call Trace:^M
[ 346.534449][ T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
[ 346.536632][ T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
[ 346.538665][ T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
[ 346.540830][ T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
[ 346.542746][ T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
[ 346.544779][ T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
[ 346.546851][ T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
[ 346.548844][ T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
[ 346.550784][ T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
[ 346.552555][ T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
I am a beginner, hope I can be of some beneficial to the community ;-)
Thanks
Zhouyi
--
arch/powerpc/include/asm/hw_irq.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 26ede09c521d..a5ae8d82cc9d 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
* for the critical section and as a clobber because
* we changed paca->irq_soft_mask
*/
-static inline notrace void irq_soft_mask_set(unsigned long mask)
+static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
{
/*
* The irq mask must always include the STD bit if any are set.
--
2.34.1
Le 21/08/2022 à 03:00, Zhouyi Zhou a écrit :
> In ppc, compiler based sanitizer will generate instrument instructions
> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>
> 0xc000000000295cb0 <+0>: addis r2,r12,774
> 0xc000000000295cb4 <+4>: addi r2,r2,16464
> 0xc000000000295cb8 <+8>: mflr r0
> 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
> 0xc000000000295cc0 <+16>: mflr r0
> 0xc000000000295cc4 <+20>: std r31,-8(r1)
> 0xc000000000295cc8 <+24>: addi r3,r13,2354
> 0xc000000000295ccc <+28>: mr r31,r13
> 0xc000000000295cd0 <+32>: std r0,16(r1)
> 0xc000000000295cd4 <+36>: stdu r1,-48(r1)
> 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
> 0xc000000000295cdc <+44>: nop
> 0xc000000000295ce0 <+48>: li r9,1
> 0xc000000000295ce4 <+52>: stb r9,2354(r31)
> 0xc000000000295ce8 <+56>: addi r1,r1,48
> 0xc000000000295cec <+60>: ld r0,16(r1)
> 0xc000000000295cf0 <+64>: ld r31,-8(r1)
> 0xc000000000295cf4 <+68>: mtlr r0
>
> If there is a context switch before "stb r9,2354(r31)", r31 may
> not equal to r13, in such case, irq soft mask will not work.
>
> This patch disable sanitizer in irq_soft_mask_set.
Well spotted, thanks.
You should add:
Fixes: ef5b570d3700 ("powerpc/irq: Don't open code irq_soft_mask helpers")
By the way, I think the READ_ONCE() likely has the same issue so you
should fix irq_soft_mask_return() at the same time.
>
> Signed-off-by: Zhouyi Zhou <[email protected]>
> ---
> Dear PPC developers
>
> I found this bug when trying to do rcutorture tests in ppc VM of
> Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
>
> console.log report following bug:
>
> [ 346.527467][ T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
> [ 346.529416][ T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [ 346.531157][ T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253^M
> [ 346.533620][ T100] Call Trace:^M
> [ 346.534449][ T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
> [ 346.536632][ T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
> [ 346.538665][ T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [ 346.540830][ T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
> [ 346.542746][ T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
> [ 346.544779][ T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
> [ 346.546851][ T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
> [ 346.548844][ T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
> [ 346.550784][ T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
> [ 346.552555][ T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
>
> After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
>
> I am a beginner, hope I can be of some beneficial to the community ;-)
>
> Thanks
> Zhouyi
> --
> arch/powerpc/include/asm/hw_irq.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 26ede09c521d..a5ae8d82cc9d 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
> * for the critical section and as a clobber because
> * we changed paca->irq_soft_mask
> */
> -static inline notrace void irq_soft_mask_set(unsigned long mask)
> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
> {
> /*
> * The irq mask must always include the STD bit if any are set.
On Mon, Aug 22, 2022 at 2:04 PM Christophe Leroy
<[email protected]> wrote:
>
>
>
> Le 21/08/2022 à 03:00, Zhouyi Zhou a écrit :
> > In ppc, compiler based sanitizer will generate instrument instructions
> > around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
> >
> > 0xc000000000295cb0 <+0>: addis r2,r12,774
> > 0xc000000000295cb4 <+4>: addi r2,r2,16464
> > 0xc000000000295cb8 <+8>: mflr r0
> > 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
> > 0xc000000000295cc0 <+16>: mflr r0
> > 0xc000000000295cc4 <+20>: std r31,-8(r1)
> > 0xc000000000295cc8 <+24>: addi r3,r13,2354
> > 0xc000000000295ccc <+28>: mr r31,r13
> > 0xc000000000295cd0 <+32>: std r0,16(r1)
> > 0xc000000000295cd4 <+36>: stdu r1,-48(r1)
> > 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
> > 0xc000000000295cdc <+44>: nop
> > 0xc000000000295ce0 <+48>: li r9,1
> > 0xc000000000295ce4 <+52>: stb r9,2354(r31)
> > 0xc000000000295ce8 <+56>: addi r1,r1,48
> > 0xc000000000295cec <+60>: ld r0,16(r1)
> > 0xc000000000295cf0 <+64>: ld r31,-8(r1)
> > 0xc000000000295cf4 <+68>: mtlr r0
> >
> > If there is a context switch before "stb r9,2354(r31)", r31 may
> > not equal to r13, in such case, irq soft mask will not work.
> >
> > This patch disable sanitizer in irq_soft_mask_set.
>
> Well spotted, thanks.
Thank Christophe for reviewing my patch and your guidance!
>
> You should add:
>
> Fixes: ef5b570d3700 ("powerpc/irq: Don't open code irq_soft_mask helpers")
OK, I will do it!
>
> By the way, I think the READ_ONCE() likely has the same issue so you
> should fix irq_soft_mask_return() at the same time.
Yes, after disassembling irq_soft_mask_return, she has the same issue
as irq_soft_mask_set.
In addition, I read hw_irq.h by naked eye to search for removed inline
assembly one by one,
I found another place that we could possible enhance (Thank Paul E.
McKenny for teaching me use git blame ;-)):
077fc62b2b66a("powerpc/irq: remove inline assembly in hard_irq_disable macro")
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -282,9 +282,7 @@ static inline bool pmi_irq_pending(void)
flags = irq_soft_mask_set_return(IRQS_ALL_DISABLED); \
local_paca->irq_happened |= PACA_IRQ_HARD_DIS; \
if (!arch_irqs_disabled_flags(flags)) { \
- asm ("stdx %%r1, 0, %1 ;" \
- : "=m" (local_paca->saved_r1) \
- : "b" (&local_paca->saved_r1)); \
+ WRITE_ONCE(local_paca->saved_r1, current_stack_pointer);\
trace_hardirqs_off(); \
} \
} while(0)
I wrap the macro hard_irq_disable into a test function and disassemble
it, she has the above issue too:
(gdb) disassemble test002
Dump of assembler code for function test002:
0xc000000000295db0 <+0>: addis r2,r12,774
0xc000000000295db4 <+4>: addi r2,r2,16464
0xc000000000295db8 <+8>: mflr r0
0xc000000000295dbc <+12>: bl 0xc00000000008bacc <mcount>
0xc000000000295dc0 <+16>: mflr r0
0xc000000000295dc4 <+20>: std r30,-16(r1)
0xc000000000295dc8 <+24>: std r31,-8(r1)
0xc000000000295dcc <+28>: li r9,2
0xc000000000295dd0 <+32>: std r0,16(r1)
0xc000000000295dd4 <+36>: stdu r1,-48(r1)
0xc000000000295dd8 <+40>: mtmsrd r9,1
0xc000000000295ddc <+44>: addi r3,r13,2354
0xc000000000295de0 <+48>: mr r31,r13
0xc000000000295de4 <+52>: bl 0xc000000000609838 <__asan_load1+8>
0xc000000000295de8 <+56>: nop
0xc000000000295dec <+60>: li r3,3
0xc000000000295df0 <+64>: lbz r30,2354(r31)
0xc000000000295df4 <+68>: bl 0xc00000000028de90 <irq_soft_mask_set>
0xc000000000295df8 <+72>: mr r31,r13
0xc000000000295dfc <+76>: addi r3,r13,2355
0xc000000000295e00 <+80>: bl 0xc000000000609838 <__asan_load1+8>
0xc000000000295e04 <+84>: nop
0xc000000000295e08 <+88>: lbz r9,2355(r31)
0xc000000000295e0c <+92>: andi. r10,r30,1
0xc000000000295e10 <+96>: ori r9,r9,1
0xc000000000295e14 <+100>: stb r9,2355(r31)
0xc000000000295e18 <+104>: bne 0xc000000000295e30 <test002+128>
0xc000000000295e1c <+108>: mr r30,r1
0xc000000000295e20 <+112>: addi r3,r31,2328
0xc000000000295e24 <+116>: bl 0xc000000000609dd8 <__asan_store8+8>
0xc000000000295e28 <+120>: nop
0xc000000000295e2c <+124>: std r30,2328(r31)
0xc000000000295e30 <+128>: addi r1,r1,48
0xc000000000295e34 <+132>: ld r0,16(r1)
0xc000000000295e38 <+136>: ld r30,-16(r1)
0xc000000000295e3c <+140>: ld r31,-8(r1)
0xc000000000295e40 <+144>: mtlr r0
0xc000000000295e44 <+148>: blr
Could we rewrite this macro into a static inline function as
irq_soft_mask_set does, and disable sanitizer for it?
Thanks again
Cheers
Zhouyi
>
> >
> > Signed-off-by: Zhouyi Zhou <[email protected]>
> > ---
> > Dear PPC developers
> >
> > I found this bug when trying to do rcutorture tests in ppc VM of
> > Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
> >
> > console.log report following bug:
> >
> > [ 346.527467][ T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
> > [ 346.529416][ T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> > [ 346.531157][ T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253^M
> > [ 346.533620][ T100] Call Trace:^M
> > [ 346.534449][ T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
> > [ 346.536632][ T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
> > [ 346.538665][ T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> > [ 346.540830][ T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
> > [ 346.542746][ T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
> > [ 346.544779][ T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
> > [ 346.546851][ T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
> > [ 346.548844][ T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
> > [ 346.550784][ T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
> > [ 346.552555][ T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
> >
> > After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
> >
> > I am a beginner, hope I can be of some beneficial to the community ;-)
> >
> > Thanks
> > Zhouyi
> > --
> > arch/powerpc/include/asm/hw_irq.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> > index 26ede09c521d..a5ae8d82cc9d 100644
> > --- a/arch/powerpc/include/asm/hw_irq.h
> > +++ b/arch/powerpc/include/asm/hw_irq.h
> > @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
> > * for the critical section and as a clobber because
> > * we changed paca->irq_soft_mask
> > */
> > -static inline notrace void irq_soft_mask_set(unsigned long mask)
> > +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
> > {
> > /*
> > * The irq mask must always include the STD bit if any are set.
Le 23/08/2022 à 03:43, Zhouyi Zhou a écrit :
> On Mon, Aug 22, 2022 at 2:04 PM Christophe Leroy
> <[email protected]> wrote:
>>
>>
>>
>> Le 21/08/2022 à 03:00, Zhouyi Zhou a écrit :
>>> In ppc, compiler based sanitizer will generate instrument instructions
>>> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>>>
>>> 0xc000000000295cb0 <+0>: addis r2,r12,774
>>> 0xc000000000295cb4 <+4>: addi r2,r2,16464
>>> 0xc000000000295cb8 <+8>: mflr r0
>>> 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
>>> 0xc000000000295cc0 <+16>: mflr r0
>>> 0xc000000000295cc4 <+20>: std r31,-8(r1)
>>> 0xc000000000295cc8 <+24>: addi r3,r13,2354
>>> 0xc000000000295ccc <+28>: mr r31,r13
>>> 0xc000000000295cd0 <+32>: std r0,16(r1)
>>> 0xc000000000295cd4 <+36>: stdu r1,-48(r1)
>>> 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
>>> 0xc000000000295cdc <+44>: nop
>>> 0xc000000000295ce0 <+48>: li r9,1
>>> 0xc000000000295ce4 <+52>: stb r9,2354(r31)
>>> 0xc000000000295ce8 <+56>: addi r1,r1,48
>>> 0xc000000000295cec <+60>: ld r0,16(r1)
>>> 0xc000000000295cf0 <+64>: ld r31,-8(r1)
>>> 0xc000000000295cf4 <+68>: mtlr r0
>>>
>>> If there is a context switch before "stb r9,2354(r31)", r31 may
>>> not equal to r13, in such case, irq soft mask will not work.
>>>
>>> This patch disable sanitizer in irq_soft_mask_set.
>>
>> Well spotted, thanks.
> Thank Christophe for reviewing my patch and your guidance!
>>
>> You should add:
>>
>> Fixes: ef5b570d3700 ("powerpc/irq: Don't open code irq_soft_mask helpers")
> OK, I will do it!
>>
>> By the way, I think the READ_ONCE() likely has the same issue so you
>> should fix irq_soft_mask_return() at the same time.
> Yes, after disassembling irq_soft_mask_return, she has the same issue
> as irq_soft_mask_set.
>
> In addition, I read hw_irq.h by naked eye to search for removed inline
> assembly one by one,
> I found another place that we could possible enhance (Thank Paul E.
> McKenny for teaching me use git blame ;-)):
> 077fc62b2b66a("powerpc/irq: remove inline assembly in hard_irq_disable macro")
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -282,9 +282,7 @@ static inline bool pmi_irq_pending(void)
> flags = irq_soft_mask_set_return(IRQS_ALL_DISABLED); \
> local_paca->irq_happened |= PACA_IRQ_HARD_DIS; \
> if (!arch_irqs_disabled_flags(flags)) { \
> - asm ("stdx %%r1, 0, %1 ;" \
> - : "=m" (local_paca->saved_r1) \
> - : "b" (&local_paca->saved_r1)); \
> + WRITE_ONCE(local_paca->saved_r1, current_stack_pointer);\
> trace_hardirqs_off(); \
> } \
> } while(0)
> I wrap the macro hard_irq_disable into a test function and disassemble
> it, she has the above issue too:
> (gdb) disassemble test002
> Dump of assembler code for function test002:
...
> Could we rewrite this macro into a static inline function as
> irq_soft_mask_set does, and disable sanitizer for it?
Yes we could try to do that, hoping it will not bring too much troubles
with circular header inclusion.
Another possible approach could be to create a WRITE_ONCE_NOCHECK() more
or less similar to existing READ_ONCE_NOCHECK().
We could also change READ_ONCE_NOCHECK() to accept bytes size then use
it for irq_soft_mask_return() .
Christophe
Zhouyi Zhou <[email protected]> writes:
> In ppc, compiler based sanitizer will generate instrument instructions
> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>
> 0xc000000000295cb0 <+0>: addis r2,r12,774
> 0xc000000000295cb4 <+4>: addi r2,r2,16464
> 0xc000000000295cb8 <+8>: mflr r0
> 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
> 0xc000000000295cc0 <+16>: mflr r0
> 0xc000000000295cc4 <+20>: std r31,-8(r1)
> 0xc000000000295cc8 <+24>: addi r3,r13,2354
> 0xc000000000295ccc <+28>: mr r31,r13
> 0xc000000000295cd0 <+32>: std r0,16(r1)
> 0xc000000000295cd4 <+36>: stdu r1,-48(r1)
> 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
> 0xc000000000295cdc <+44>: nop
> 0xc000000000295ce0 <+48>: li r9,1
> 0xc000000000295ce4 <+52>: stb r9,2354(r31)
> 0xc000000000295ce8 <+56>: addi r1,r1,48
> 0xc000000000295cec <+60>: ld r0,16(r1)
> 0xc000000000295cf0 <+64>: ld r31,-8(r1)
> 0xc000000000295cf4 <+68>: mtlr r0
>
> If there is a context switch before "stb r9,2354(r31)", r31 may
> not equal to r13, in such case, irq soft mask will not work.
>
> This patch disable sanitizer in irq_soft_mask_set.
>
> Signed-off-by: Zhouyi Zhou <[email protected]>
> ---
> Dear PPC developers
>
> I found this bug when trying to do rcutorture tests in ppc VM of
> Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
>
> console.log report following bug:
>
> [ 346.527467][ T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
> [ 346.529416][ T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [ 346.531157][ T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253^M
> [ 346.533620][ T100] Call Trace:^M
> [ 346.534449][ T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
> [ 346.536632][ T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
> [ 346.538665][ T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [ 346.540830][ T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
> [ 346.542746][ T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
> [ 346.544779][ T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
> [ 346.546851][ T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
> [ 346.548844][ T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
> [ 346.550784][ T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
> [ 346.552555][ T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
>
> After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
Thanks for spending 12 days debugging it! O_o
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 26ede09c521d..a5ae8d82cc9d 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
> * for the critical section and as a clobber because
> * we changed paca->irq_soft_mask
> */
> -static inline notrace void irq_soft_mask_set(unsigned long mask)
> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
> {
> /*
> * The irq mask must always include the STD bit if any are set.
My worry is that this will force irq_soft_mask_set() out of line, which
we would rather avoid. It's meant to be a fast path.
In fact with this applied I see nearly 300 out-of-line copies of the
function when building a defconfig, and ~1700 calls to it.
Normally it is inlined at every call site.
So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
code irq_soft_mask helpers").
It was a nice looking cleanup, but those loads must not be instrumented
by KASAN, but we also want them inlined, and AFAICS the only way to
achieve that is to go back to inline asm.
cheers
Le 23/08/2022 à 10:33, Michael Ellerman a écrit :
> Zhouyi Zhou <[email protected]> writes:
>> In ppc, compiler based sanitizer will generate instrument instructions
>> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>>
>> 0xc000000000295cb0 <+0>: addis r2,r12,774
>> 0xc000000000295cb4 <+4>: addi r2,r2,16464
>> 0xc000000000295cb8 <+8>: mflr r0
>> 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
>> 0xc000000000295cc0 <+16>: mflr r0
>> 0xc000000000295cc4 <+20>: std r31,-8(r1)
>> 0xc000000000295cc8 <+24>: addi r3,r13,2354
>> 0xc000000000295ccc <+28>: mr r31,r13
>> 0xc000000000295cd0 <+32>: std r0,16(r1)
>> 0xc000000000295cd4 <+36>: stdu r1,-48(r1)
>> 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
>> 0xc000000000295cdc <+44>: nop
>> 0xc000000000295ce0 <+48>: li r9,1
>> 0xc000000000295ce4 <+52>: stb r9,2354(r31)
>> 0xc000000000295ce8 <+56>: addi r1,r1,48
>> 0xc000000000295cec <+60>: ld r0,16(r1)
>> 0xc000000000295cf0 <+64>: ld r31,-8(r1)
>> 0xc000000000295cf4 <+68>: mtlr r0
>>
>> If there is a context switch before "stb r9,2354(r31)", r31 may
>> not equal to r13, in such case, irq soft mask will not work.
>>
>> This patch disable sanitizer in irq_soft_mask_set.
>>
>> Signed-off-by: Zhouyi Zhou <[email protected]>
>> ---
>> Dear PPC developers
>>
>> I found this bug when trying to do rcutorture tests in ppc VM of
>> Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
>>
>> console.log report following bug:
>>
>> [ 346.527467][ T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
>> [ 346.529416][ T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
>> [ 346.531157][ T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253^M
>> [ 346.533620][ T100] Call Trace:^M
>> [ 346.534449][ T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
>> [ 346.536632][ T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
>> [ 346.538665][ T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
>> [ 346.540830][ T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
>> [ 346.542746][ T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
>> [ 346.544779][ T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
>> [ 346.546851][ T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
>> [ 346.548844][ T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
>> [ 346.550784][ T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
>> [ 346.552555][ T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
>>
>> After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
>
> Thanks for spending 12 days debugging it! O_o
>
>> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
>> index 26ede09c521d..a5ae8d82cc9d 100644
>> --- a/arch/powerpc/include/asm/hw_irq.h
>> +++ b/arch/powerpc/include/asm/hw_irq.h
>> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
>> * for the critical section and as a clobber because
>> * we changed paca->irq_soft_mask
>> */
>> -static inline notrace void irq_soft_mask_set(unsigned long mask)
>> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
>> {
>> /*
>> * The irq mask must always include the STD bit if any are set.
>
> My worry is that this will force irq_soft_mask_set() out of line, which
> we would rather avoid. It's meant to be a fast path.
>
> In fact with this applied I see nearly 300 out-of-line copies of the
> function when building a defconfig, and ~1700 calls to it.
>
> Normally it is inlined at every call site.
>
>
> So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
> code irq_soft_mask helpers").
Could you revert it only partially ? In extenso, revert the
READ/WRITE_ONCE and bring back the inline asm in irq_soft_mask_return()
and irq_soft_mask_set(), but keep other changes.
>
> It was a nice looking cleanup, but those loads must not be instrumented
> by KASAN, but we also want them inlined, and AFAICS the only way to
> achieve that is to go back to inline asm.
>
It's a pitty.
Would it be acceptable to have it out of line when KASAN is selected and
inline otherwise ? In that case there is __no_sanitize_or_inline
Christophe
Le 23/08/2022 à 10:47, Christophe Leroy a écrit :
>
>
> Le 23/08/2022 à 10:33, Michael Ellerman a écrit :
>> Zhouyi Zhou <[email protected]> writes:
>>
>> My worry is that this will force irq_soft_mask_set() out of line, which
>> we would rather avoid. It's meant to be a fast path.
>>
>> In fact with this applied I see nearly 300 out-of-line copies of the
>> function when building a defconfig, and ~1700 calls to it.
>>
>> Normally it is inlined at every call site.
>>
>>
>> So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
>> code irq_soft_mask helpers").
>
> Could you revert it only partially ? In extenso, revert the
> READ/WRITE_ONCE and bring back the inline asm in irq_soft_mask_return()
> and irq_soft_mask_set(), but keep other changes.
I sent a patch doing that.
Christophe
On Wed, Aug 24, 2022 at 12:50 AM Christophe Leroy
<[email protected]> wrote:
>
>
>
> Le 23/08/2022 à 10:47, Christophe Leroy a écrit :
> >
> >
> > Le 23/08/2022 à 10:33, Michael Ellerman a écrit :
> >> Zhouyi Zhou <[email protected]> writes:
> >>
> >> My worry is that this will force irq_soft_mask_set() out of line, which
> >> we would rather avoid. It's meant to be a fast path.
> >>
> >> In fact with this applied I see nearly 300 out-of-line copies of the
> >> function when building a defconfig, and ~1700 calls to it.
> >>
> >> Normally it is inlined at every call site.
> >>
> >>
> >> So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
> >> code irq_soft_mask helpers").
> >
> > Could you revert it only partially ? In extenso, revert the
> > READ/WRITE_ONCE and bring back the inline asm in irq_soft_mask_return()
> > and irq_soft_mask_set(), but keep other changes.
>
> I sent a patch doing that.
Thank Christophe for the fix. I am very glad to be of benefit to the
community ;-)
Also thank Michael and Paul for your constant encouragement and
guidance, I learned to use objdump to count the number of failed
inline function calls today ;-)
By the way, from my experiments, both gcc-11 and clang-14 behave the
same as Michael has described.
Cheers
Zhouyi
>
> Christophe