2021-06-02 11:46:45

by Zheng Yejian

[permalink] [raw]
Subject: [BUG] I found a bug when try to enable record_ftrace

Hello,

There may be a deadlock caused by ftrace recursion when try to enable
record_ftrace.
I'd like to known if the patchset
(https://lore.kernel.org/lkml/[email protected]/) is
trying to fix it.

Procedure for reproducing the problem is:
1) this problem appears both in 5.13-rc4 and 5.10;

2) my work environment is:
qemu-arm version 4.0.0
arm-linux-gnueabi-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

3) then try to enable record_ftrace:
/ # mount -t pstore pstore /sys/fs/pstore
/ # echo 1 > /sys/kernel/debug/pstore/record_ftrace

4) then system appears to be stuck, and use arm-linux-gnueabi-gdb
dump the following call stack:
#0 arch_spin_lock (lock=0x811a0f98) at
/home/zyj/Linux/linux-master/arch/arm/include/asm/spinlock.h:74
#1 do_raw_spin_lock_flags (flags=<synthetic pointer>,
lock=0x811a0f98) at
/home/zyj/Linux/linux-master/include/linux/spinlock.h:195
#2 __raw_spin_lock_irqsave (lock=0x811a0f98) at
/home/zyj/Linux/linux-master/include/linux/spinlock_api_smp.h:119
#3 _raw_spin_lock_irqsave (lock=lock@entry=0x811a0f98) at
/home/zyj/Linux/linux-master/kernel/locking/spinlock.c:159
#4 0x8046c6e0 in buffer_size_add (prz=prz@entry=0x811a0f80,
a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:82
#5 0x8046cc20 in persistent_ram_write (prz=0x811a0f80,
s=0x81137b80, count=16) at
/home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
#6 0x8046b438 in ramoops_pstore_write (record=0x81137b90) at
/home/zyj/Linux/linux-master/fs/pstore/ram.c:331
#7 0x8046add8 in pstore_ftrace_call (ip=2156609456,
parent_ip=2152122068, op=<optimized out>, fregs=<optimized out>)
at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
#8 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
parent_ip=2152122068, ip=2156609456) at
/home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
#9 ftrace_ops_list_func (ip=2156609456, parent_ip=2152122068,
op=<optimized out>, fregs=0x0) at
/home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
#10 0x801109f8 in ftrace_caller () at
/home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
#11 0x808b3fb4 in _raw_spin_unlock_irqrestore
(lock=lock@entry=0x811a0f98, flags=1610612883) at
/home/zyj/Linux/linux-master/kernel/locking/spinlock.c:190
#12 0x8046c6d4 in buffer_size_add (prz=prz@entry=0x811a0f80,
a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:95
#13 0x8046cc20 in persistent_ram_write (prz=0x811a0f80,
s=0x81137cf8, count=16) at
/home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
#14 0x8046b438 in ramoops_pstore_write (record=0x81137d08) at
/home/zyj/Linux/linux-master/fs/pstore/ram.c:331
#15 0x8046add8 in pstore_ftrace_call (ip=2148632188,
parent_ip=2148601660, op=<optimized out>, fregs=<optimized out>)
at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
#16 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
parent_ip=2148601660, ip=2148632188) at
/home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
#17 ftrace_ops_list_func (ip=2148632188, parent_ip=2148601660,
op=<optimized out>, fregs=0x0) at
/home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
#18 0x801109f8 in ftrace_caller () at
/home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
#19 0x80118680 in __set_fixmap (idx=idx@entry=FIX_TEXT_POKE0,
phys=phys@entry=0, prot=prot@entry=0) at
/home/zyj/Linux/linux-master/arch/arm/mm/mmu.c:385
#20 0x80110f3c in patch_unmap (flags=<synthetic pointer>,
fixmap=129) at /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:45
#21 __patch_text_real (addr=addr@entry=0x808b3fb0
<_raw_spin_unlock_irqrestore+16>, insn=insn@entry=3957420680,
remap=remap@entry=true)
at /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:104
#22 0x80110b40 in __patch_text (insn=3957420680, addr=0x808b3fb0
<_raw_spin_unlock_irqrestore+16>) at
/home/zyj/Linux/linux-master/arch/arm/include/asm/patch.h:10
#23 ftrace_modify_code (pc=2156609456, old=3904716800,
new=3957420680, validate=true) at
/home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:95
#24 0x80110cf4 in ftrace_make_call (rec=rec@entry=0x81007ce0,
addr=addr@entry=2148600280) at
/home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:132
#25 0x801dda40 in __ftrace_replace_code (rec=rec@entry=0x81007ce0,
enable=enable@entry=true) at
/home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2524
#26 0x801de17c in ftrace_replace_code (mod_flags=mod_flags@entry=1)
at /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2554
#27 0x801de39c in ftrace_modify_all_code (command=5) at
/home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2698
#28 0x80110ad8 in __ftrace_modify_code (data=<optimized out>) at
/home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:39
#29 0x801d4da4 in multi_cpu_stop (data=data@entry=0x81c69d84) at
/home/zyj/Linux/linux-master/kernel/stop_machine.c:240
#30 0x801d4a14 in cpu_stopper_thread (cpu=<optimized out>) at
/home/zyj/Linux/linux-master/kernel/stop_machine.c:511
#31 0x8014de3c in smpboot_thread_fn (data=0x810c1c80) at
/home/zyj/Linux/linux-master/kernel/smpboot.c:165
#32 0x8014a0dc in kthread (_create=0x810c1d40) at
/home/zyj/Linux/linux-master/kernel/kthread.c:313
#33 0x80100150 in ret_from_fork () at
/home/zyj/Linux/linux-master/arch/arm/kernel/entry-common.S:158
Backtrace stopped: previous frame identical to this frame (corrupt
stack?)

See above #7~#15, there is a recursion in function pstore_ftrace_call(),
and a spin_lock(lock=0x811a0f98) is hold since unlock operation in above
#11 not finished. Then in above #0, trying to acquire same lock cause a
deadlock.

Enabling 'record_ftrace' seems a basic operation of pstore/ftrace, Does
it mean this feature is not available for a while?

Best Regards,
Zheng


2021-06-02 13:55:01

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] I found a bug when try to enable record_ftrace

On Wed, 2 Jun 2021 15:42:23 +0800
"Zhengyejian (Zetta)" <[email protected]> wrote:

> Hello,
>
> There may be a deadlock caused by ftrace recursion when try to enable
> record_ftrace.
> I'd like to known if the patchset
> (https://lore.kernel.org/lkml/[email protected]/) is
> trying to fix it.
>
> Procedure for reproducing the problem is:
> 1) this problem appears both in 5.13-rc4 and 5.10;
>
> 2) my work environment is:
> qemu-arm version 4.0.0
> arm-linux-gnueabi-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
>
> 3) then try to enable record_ftrace:
> / # mount -t pstore pstore /sys/fs/pstore
> / # echo 1 > /sys/kernel/debug/pstore/record_ftrace
>
> 4) then system appears to be stuck, and use arm-linux-gnueabi-gdb
> dump the following call stack:
> #0 arch_spin_lock (lock=0x811a0f98) at
> /home/zyj/Linux/linux-master/arch/arm/include/asm/spinlock.h:74
> #1 do_raw_spin_lock_flags (flags=<synthetic pointer>,
> lock=0x811a0f98) at
> /home/zyj/Linux/linux-master/include/linux/spinlock.h:195
> #2 __raw_spin_lock_irqsave (lock=0x811a0f98) at
> /home/zyj/Linux/linux-master/include/linux/spinlock_api_smp.h:119
> #3 _raw_spin_lock_irqsave (lock=lock@entry=0x811a0f98) at
> /home/zyj/Linux/linux-master/kernel/locking/spinlock.c:159
> #4 0x8046c6e0 in buffer_size_add (prz=prz@entry=0x811a0f80,
> a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:82
> #5 0x8046cc20 in persistent_ram_write (prz=0x811a0f80,
> s=0x81137b80, count=16) at
> /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
> #6 0x8046b438 in ramoops_pstore_write (record=0x81137b90) at
> /home/zyj/Linux/linux-master/fs/pstore/ram.c:331
> #7 0x8046add8 in pstore_ftrace_call (ip=2156609456,
> parent_ip=2152122068, op=<optimized out>, fregs=<optimized out>)
> at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
> #8 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
> parent_ip=2152122068, ip=2156609456) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
> #9 ftrace_ops_list_func (ip=2156609456, parent_ip=2152122068,
> op=<optimized out>, fregs=0x0) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
> #10 0x801109f8 in ftrace_caller () at
> /home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
> #11 0x808b3fb4 in _raw_spin_unlock_irqrestore
> (lock=lock@entry=0x811a0f98, flags=1610612883) at
> /home/zyj/Linux/linux-master/kernel/locking/spinlock.c:190
> #12 0x8046c6d4 in buffer_size_add (prz=prz@entry=0x811a0f80,
> a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:95
> #13 0x8046cc20 in persistent_ram_write (prz=0x811a0f80,
> s=0x81137cf8, count=16) at
> /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
> #14 0x8046b438 in ramoops_pstore_write (record=0x81137d08) at
> /home/zyj/Linux/linux-master/fs/pstore/ram.c:331
> #15 0x8046add8 in pstore_ftrace_call (ip=2148632188,
> parent_ip=2148601660, op=<optimized out>, fregs=<optimized out>)
> at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
> #16 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
> parent_ip=2148601660, ip=2148632188) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
> #17 ftrace_ops_list_func (ip=2148632188, parent_ip=2148601660,
> op=<optimized out>, fregs=0x0) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
> #18 0x801109f8 in ftrace_caller () at
> /home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
> #19 0x80118680 in __set_fixmap (idx=idx@entry=FIX_TEXT_POKE0,
> phys=phys@entry=0, prot=prot@entry=0) at
> /home/zyj/Linux/linux-master/arch/arm/mm/mmu.c:385
> #20 0x80110f3c in patch_unmap (flags=<synthetic pointer>,
> fixmap=129) at /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:45
> #21 __patch_text_real (addr=addr@entry=0x808b3fb0
> <_raw_spin_unlock_irqrestore+16>, insn=insn@entry=3957420680,
> remap=remap@entry=true)
> at /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:104
> #22 0x80110b40 in __patch_text (insn=3957420680, addr=0x808b3fb0
> <_raw_spin_unlock_irqrestore+16>) at
> /home/zyj/Linux/linux-master/arch/arm/include/asm/patch.h:10
> #23 ftrace_modify_code (pc=2156609456, old=3904716800,
> new=3957420680, validate=true) at
> /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:95
> #24 0x80110cf4 in ftrace_make_call (rec=rec@entry=0x81007ce0,
> addr=addr@entry=2148600280) at
> /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:132
> #25 0x801dda40 in __ftrace_replace_code (rec=rec@entry=0x81007ce0,
> enable=enable@entry=true) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2524
> #26 0x801de17c in ftrace_replace_code (mod_flags=mod_flags@entry=1)
> at /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2554
> #27 0x801de39c in ftrace_modify_all_code (command=5) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2698
> #28 0x80110ad8 in __ftrace_modify_code (data=<optimized out>) at
> /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:39
> #29 0x801d4da4 in multi_cpu_stop (data=data@entry=0x81c69d84) at
> /home/zyj/Linux/linux-master/kernel/stop_machine.c:240
> #30 0x801d4a14 in cpu_stopper_thread (cpu=<optimized out>) at
> /home/zyj/Linux/linux-master/kernel/stop_machine.c:511
> #31 0x8014de3c in smpboot_thread_fn (data=0x810c1c80) at
> /home/zyj/Linux/linux-master/kernel/smpboot.c:165
> #32 0x8014a0dc in kthread (_create=0x810c1d40) at
> /home/zyj/Linux/linux-master/kernel/kthread.c:313
> #33 0x80100150 in ret_from_fork () at
> /home/zyj/Linux/linux-master/arch/arm/kernel/entry-common.S:158
> Backtrace stopped: previous frame identical to this frame (corrupt
> stack?)
>
> See above #7~#15, there is a recursion in function pstore_ftrace_call(),
> and a spin_lock(lock=0x811a0f98) is hold since unlock operation in above
> #11 not finished. Then in above #0, trying to acquire same lock cause a
> deadlock.
>
> Enabling 'record_ftrace' seems a basic operation of pstore/ftrace, Does
> it mean this feature is not available for a while?

I don't use pstore or arm. That is a question for the pstore folks.

-- Steve

2021-06-02 17:45:28

by Kees Cook

[permalink] [raw]
Subject: Re: [BUG] I found a bug when try to enable record_ftrace


On Wed, Jun 02, 2021 at 03:42:23PM +0800, Zhengyejian (Zetta) wrote:
> Hello,
>
> There may be a deadlock caused by ftrace recursion when try to enable
> record_ftrace.

Hi,

Thanks for the report. Joel, is this something you can take a look at?

-Kees

> I'd like to known if the patchset
> (https://lore.kernel.org/lkml/[email protected]/) is
> trying to fix it.
>
> Procedure for reproducing the problem is:
> 1) this problem appears both in 5.13-rc4 and 5.10;
>
> 2) my work environment is:
> qemu-arm version 4.0.0
> arm-linux-gnueabi-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
>
> 3) then try to enable record_ftrace:
> / # mount -t pstore pstore /sys/fs/pstore
> / # echo 1 > /sys/kernel/debug/pstore/record_ftrace
>
> 4) then system appears to be stuck, and use arm-linux-gnueabi-gdb dump the
> following call stack:
> #0 arch_spin_lock (lock=0x811a0f98) at
> /home/zyj/Linux/linux-master/arch/arm/include/asm/spinlock.h:74
> #1 do_raw_spin_lock_flags (flags=<synthetic pointer>, lock=0x811a0f98)
> at /home/zyj/Linux/linux-master/include/linux/spinlock.h:195
> #2 __raw_spin_lock_irqsave (lock=0x811a0f98) at
> /home/zyj/Linux/linux-master/include/linux/spinlock_api_smp.h:119
> #3 _raw_spin_lock_irqsave (lock=lock@entry=0x811a0f98) at
> /home/zyj/Linux/linux-master/kernel/locking/spinlock.c:159
> #4 0x8046c6e0 in buffer_size_add (prz=prz@entry=0x811a0f80,
> a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:82
> #5 0x8046cc20 in persistent_ram_write (prz=0x811a0f80, s=0x81137b80,
> count=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
> #6 0x8046b438 in ramoops_pstore_write (record=0x81137b90) at
> /home/zyj/Linux/linux-master/fs/pstore/ram.c:331
> #7 0x8046add8 in pstore_ftrace_call (ip=2156609456,
> parent_ip=2152122068, op=<optimized out>, fregs=<optimized out>)
> at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
> #8 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
> parent_ip=2152122068, ip=2156609456) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
> #9 ftrace_ops_list_func (ip=2156609456, parent_ip=2152122068,
> op=<optimized out>, fregs=0x0) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
> #10 0x801109f8 in ftrace_caller () at
> /home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
> #11 0x808b3fb4 in _raw_spin_unlock_irqrestore
> (lock=lock@entry=0x811a0f98, flags=1610612883) at
> /home/zyj/Linux/linux-master/kernel/locking/spinlock.c:190
> #12 0x8046c6d4 in buffer_size_add (prz=prz@entry=0x811a0f80,
> a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:95
> #13 0x8046cc20 in persistent_ram_write (prz=0x811a0f80, s=0x81137cf8,
> count=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
> #14 0x8046b438 in ramoops_pstore_write (record=0x81137d08) at
> /home/zyj/Linux/linux-master/fs/pstore/ram.c:331
> #15 0x8046add8 in pstore_ftrace_call (ip=2148632188,
> parent_ip=2148601660, op=<optimized out>, fregs=<optimized out>)
> at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
> #16 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
> parent_ip=2148601660, ip=2148632188) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
> #17 ftrace_ops_list_func (ip=2148632188, parent_ip=2148601660,
> op=<optimized out>, fregs=0x0) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
> #18 0x801109f8 in ftrace_caller () at
> /home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
> #19 0x80118680 in __set_fixmap (idx=idx@entry=FIX_TEXT_POKE0,
> phys=phys@entry=0, prot=prot@entry=0) at
> /home/zyj/Linux/linux-master/arch/arm/mm/mmu.c:385
> #20 0x80110f3c in patch_unmap (flags=<synthetic pointer>, fixmap=129) at
> /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:45
> #21 __patch_text_real (addr=addr@entry=0x808b3fb0
> <_raw_spin_unlock_irqrestore+16>, insn=insn@entry=3957420680,
> remap=remap@entry=true)
> at /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:104
> #22 0x80110b40 in __patch_text (insn=3957420680, addr=0x808b3fb0
> <_raw_spin_unlock_irqrestore+16>) at
> /home/zyj/Linux/linux-master/arch/arm/include/asm/patch.h:10
> #23 ftrace_modify_code (pc=2156609456, old=3904716800, new=3957420680,
> validate=true) at /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:95
> #24 0x80110cf4 in ftrace_make_call (rec=rec@entry=0x81007ce0,
> addr=addr@entry=2148600280) at
> /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:132
> #25 0x801dda40 in __ftrace_replace_code (rec=rec@entry=0x81007ce0,
> enable=enable@entry=true) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2524
> #26 0x801de17c in ftrace_replace_code (mod_flags=mod_flags@entry=1) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2554
> #27 0x801de39c in ftrace_modify_all_code (command=5) at
> /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2698
> #28 0x80110ad8 in __ftrace_modify_code (data=<optimized out>) at
> /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:39
> #29 0x801d4da4 in multi_cpu_stop (data=data@entry=0x81c69d84) at
> /home/zyj/Linux/linux-master/kernel/stop_machine.c:240
> #30 0x801d4a14 in cpu_stopper_thread (cpu=<optimized out>) at
> /home/zyj/Linux/linux-master/kernel/stop_machine.c:511
> #31 0x8014de3c in smpboot_thread_fn (data=0x810c1c80) at
> /home/zyj/Linux/linux-master/kernel/smpboot.c:165
> #32 0x8014a0dc in kthread (_create=0x810c1d40) at
> /home/zyj/Linux/linux-master/kernel/kthread.c:313
> #33 0x80100150 in ret_from_fork () at
> /home/zyj/Linux/linux-master/arch/arm/kernel/entry-common.S:158
> Backtrace stopped: previous frame identical to this frame (corrupt
> stack?)
>
> See above #7~#15, there is a recursion in function pstore_ftrace_call(), and
> a spin_lock(lock=0x811a0f98) is hold since unlock operation in above #11 not
> finished. Then in above #0, trying to acquire same lock cause a deadlock.
>
> Enabling 'record_ftrace' seems a basic operation of pstore/ftrace, Does it
> mean this feature is not available for a while?
>
> Best Regards,
> Zheng

--
Kees Cook

2021-06-08 19:37:31

by Joel Fernandes

[permalink] [raw]
Subject: Re: [BUG] I found a bug when try to enable record_ftrace

On Wed, Jun 02, 2021 at 10:43:29AM -0700, Kees Cook wrote:
>
> On Wed, Jun 02, 2021 at 03:42:23PM +0800, Zhengyejian (Zetta) wrote:
> > Hello,
> >
> > There may be a deadlock caused by ftrace recursion when try to enable
> > record_ftrace.
>
> Hi,
>
> Thanks for the report. Joel, is this something you can take a look at?

Kees,

Sorry for late reply. Right now I am dealing with a production-stop bug and
would not likely get time. Plus I feel a bit less motivated, because the
ftrace in pstore the way it stands now is unusable due to perf issues anyway
(I wonder if anyone uses it at all now). I would rather someone who has time
to work on this try to revive: https://lkml.org/lkml/2020/9/2/1075 . Should
we just delete pstore function tracing till then?

-Joel


> -Kees
>
> > I'd like to known if the patchset
> > (https://lore.kernel.org/lkml/[email protected]/) is
> > trying to fix it.
> >
> > Procedure for reproducing the problem is:
> > 1) this problem appears both in 5.13-rc4 and 5.10;
> >
> > 2) my work environment is:
> > qemu-arm version 4.0.0
> > arm-linux-gnueabi-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
> >
> > 3) then try to enable record_ftrace:
> > / # mount -t pstore pstore /sys/fs/pstore
> > / # echo 1 > /sys/kernel/debug/pstore/record_ftrace
> >
> > 4) then system appears to be stuck, and use arm-linux-gnueabi-gdb dump the
> > following call stack:
> > #0 arch_spin_lock (lock=0x811a0f98) at
> > /home/zyj/Linux/linux-master/arch/arm/include/asm/spinlock.h:74
> > #1 do_raw_spin_lock_flags (flags=<synthetic pointer>, lock=0x811a0f98)
> > at /home/zyj/Linux/linux-master/include/linux/spinlock.h:195
> > #2 __raw_spin_lock_irqsave (lock=0x811a0f98) at
> > /home/zyj/Linux/linux-master/include/linux/spinlock_api_smp.h:119
> > #3 _raw_spin_lock_irqsave (lock=lock@entry=0x811a0f98) at
> > /home/zyj/Linux/linux-master/kernel/locking/spinlock.c:159
> > #4 0x8046c6e0 in buffer_size_add (prz=prz@entry=0x811a0f80,
> > a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:82
> > #5 0x8046cc20 in persistent_ram_write (prz=0x811a0f80, s=0x81137b80,
> > count=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
> > #6 0x8046b438 in ramoops_pstore_write (record=0x81137b90) at
> > /home/zyj/Linux/linux-master/fs/pstore/ram.c:331
> > #7 0x8046add8 in pstore_ftrace_call (ip=2156609456,
> > parent_ip=2152122068, op=<optimized out>, fregs=<optimized out>)
> > at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
> > #8 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
> > parent_ip=2152122068, ip=2156609456) at
> > /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
> > #9 ftrace_ops_list_func (ip=2156609456, parent_ip=2152122068,
> > op=<optimized out>, fregs=0x0) at
> > /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
> > #10 0x801109f8 in ftrace_caller () at
> > /home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
> > #11 0x808b3fb4 in _raw_spin_unlock_irqrestore
> > (lock=lock@entry=0x811a0f98, flags=1610612883) at
> > /home/zyj/Linux/linux-master/kernel/locking/spinlock.c:190
> > #12 0x8046c6d4 in buffer_size_add (prz=prz@entry=0x811a0f80,
> > a=a@entry=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:95
> > #13 0x8046cc20 in persistent_ram_write (prz=0x811a0f80, s=0x81137cf8,
> > count=16) at /home/zyj/Linux/linux-master/fs/pstore/ram_core.c:327
> > #14 0x8046b438 in ramoops_pstore_write (record=0x81137d08) at
> > /home/zyj/Linux/linux-master/fs/pstore/ram.c:331
> > #15 0x8046add8 in pstore_ftrace_call (ip=2148632188,
> > parent_ip=2148601660, op=<optimized out>, fregs=<optimized out>)
> > at /home/zyj/Linux/linux-master/fs/pstore/ftrace.c:54
> > #16 0x801dd580 in __ftrace_ops_list_func (ignored=0x0, fregs=0x0,
> > parent_ip=2148601660, ip=2148632188) at
> > /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7003
> > #17 ftrace_ops_list_func (ip=2148632188, parent_ip=2148601660,
> > op=<optimized out>, fregs=0x0) at
> > /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:7028
> > #18 0x801109f8 in ftrace_caller () at
> > /home/zyj/Linux/linux-master/arch/arm/kernel/entry-ftrace.S:224
> > #19 0x80118680 in __set_fixmap (idx=idx@entry=FIX_TEXT_POKE0,
> > phys=phys@entry=0, prot=prot@entry=0) at
> > /home/zyj/Linux/linux-master/arch/arm/mm/mmu.c:385
> > #20 0x80110f3c in patch_unmap (flags=<synthetic pointer>, fixmap=129) at
> > /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:45
> > #21 __patch_text_real (addr=addr@entry=0x808b3fb0
> > <_raw_spin_unlock_irqrestore+16>, insn=insn@entry=3957420680,
> > remap=remap@entry=true)
> > at /home/zyj/Linux/linux-master/arch/arm/kernel/patch.c:104
> > #22 0x80110b40 in __patch_text (insn=3957420680, addr=0x808b3fb0
> > <_raw_spin_unlock_irqrestore+16>) at
> > /home/zyj/Linux/linux-master/arch/arm/include/asm/patch.h:10
> > #23 ftrace_modify_code (pc=2156609456, old=3904716800, new=3957420680,
> > validate=true) at /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:95
> > #24 0x80110cf4 in ftrace_make_call (rec=rec@entry=0x81007ce0,
> > addr=addr@entry=2148600280) at
> > /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:132
> > #25 0x801dda40 in __ftrace_replace_code (rec=rec@entry=0x81007ce0,
> > enable=enable@entry=true) at
> > /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2524
> > #26 0x801de17c in ftrace_replace_code (mod_flags=mod_flags@entry=1) at
> > /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2554
> > #27 0x801de39c in ftrace_modify_all_code (command=5) at
> > /home/zyj/Linux/linux-master/kernel/trace/ftrace.c:2698
> > #28 0x80110ad8 in __ftrace_modify_code (data=<optimized out>) at
> > /home/zyj/Linux/linux-master/arch/arm/kernel/ftrace.c:39
> > #29 0x801d4da4 in multi_cpu_stop (data=data@entry=0x81c69d84) at
> > /home/zyj/Linux/linux-master/kernel/stop_machine.c:240
> > #30 0x801d4a14 in cpu_stopper_thread (cpu=<optimized out>) at
> > /home/zyj/Linux/linux-master/kernel/stop_machine.c:511
> > #31 0x8014de3c in smpboot_thread_fn (data=0x810c1c80) at
> > /home/zyj/Linux/linux-master/kernel/smpboot.c:165
> > #32 0x8014a0dc in kthread (_create=0x810c1d40) at
> > /home/zyj/Linux/linux-master/kernel/kthread.c:313
> > #33 0x80100150 in ret_from_fork () at
> > /home/zyj/Linux/linux-master/arch/arm/kernel/entry-common.S:158
> > Backtrace stopped: previous frame identical to this frame (corrupt
> > stack?)
> >
> > See above #7~#15, there is a recursion in function pstore_ftrace_call(), and
> > a spin_lock(lock=0x811a0f98) is hold since unlock operation in above #11 not
> > finished. Then in above #0, trying to acquire same lock cause a deadlock.
> >
> > Enabling 'record_ftrace' seems a basic operation of pstore/ftrace, Does it
> > mean this feature is not available for a while?
> >
> > Best Regards,
> > Zheng
>
> --
> Kees Cook

2021-06-09 18:17:56

by Kees Cook

[permalink] [raw]
Subject: Re: [BUG] I found a bug when try to enable record_ftrace

On Tue, Jun 08, 2021 at 03:16:44PM -0400, Joel Fernandes wrote:
> On Wed, Jun 02, 2021 at 10:43:29AM -0700, Kees Cook wrote:
> >
> > On Wed, Jun 02, 2021 at 03:42:23PM +0800, Zhengyejian (Zetta) wrote:
> > > Hello,
> > >
> > > There may be a deadlock caused by ftrace recursion when try to enable
> > > record_ftrace.
> >
> > Hi,
> >
> > Thanks for the report. Joel, is this something you can take a look at?
>
> Kees,
>
> Sorry for late reply. Right now I am dealing with a production-stop bug and
> would not likely get time. Plus I feel a bit less motivated, because the
> ftrace in pstore the way it stands now is unusable due to perf issues anyway
> (I wonder if anyone uses it at all now). I would rather someone who has time
> to work on this try to revive: https://lkml.org/lkml/2020/9/2/1075 . Should
> we just delete pstore function tracing till then?

If no one is using it, then yes, let's rip it out. Zetta, are you using
pstore ftrace?

-Kees

--
Kees Cook

2021-06-10 17:30:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] I found a bug when try to enable record_ftrace

On Wed, 2 Jun 2021 15:42:23 +0800
"Zhengyejian (Zetta)" <[email protected]> wrote:

> Enabling 'record_ftrace' seems a basic operation of pstore/ftrace, Does
> it mean this feature is not available for a while?

I think the update to the recursion protection of ftrace caused this to
trigger. There's a time during context switch into an interrupt and
before the context state is set where the recursion protection would
trigger when it wasn't a real recursion. To handle this, the recursion
protection allows for a single recursion to occur.

But if this protection is used to protect against grabbing a lock, then
it wont work, because a single recursion can cause a deadlock (as it
looks to be what happened here).

The below patch adds a version called ftrace_test_recursion_trylock_safe()
that goes back to the old method, and not allow any recursion at the
same context level (normal, softirq, irq and NMI), and will once again
miss the case when a trace happens at the start of an interrupt but
before the state is set to the interrupt context. But if you are
grabbing locks, you should not be tracing in that area anyway.

Does the below patch fix the issue for you?

-- Steve

diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c
index 5939595f0115..941df90a4506 100644
--- a/fs/pstore/ftrace.c
+++ b/fs/pstore/ftrace.c
@@ -41,7 +41,11 @@ static void notrace pstore_ftrace_call(unsigned long ip,
if (unlikely(oops_in_progress))
return;

- bit = ftrace_test_recursion_trylock(ip, parent_ip);
+ /* Locking is not safe to be taken in NMI */
+ if (in_nmi())
+ return;
+
+ bit = ftrace_test_recursion_trylock_safe(ip, parent_ip);
if (bit < 0)
return;

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c5714e65..302b895d806c 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -159,8 +159,9 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
# define do_ftrace_record_recursion(ip, pip) do { } while (0)
#endif

-static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
- int start, int max)
+static __always_inline int __trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
+ int start, int max,
+ bool safe)
{
unsigned int val = READ_ONCE(current->trace_recursion);
int bit;
@@ -176,7 +177,7 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
* a switch between contexts. Allow for a single recursion.
*/
bit = TRACE_TRANSITION_BIT;
- if (val & (1 << bit)) {
+ if (safe || (val & (1 << bit))) {
do_ftrace_record_recursion(ip, pip);
return -1;
}
@@ -192,6 +193,28 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
return bit + 1;
}

+static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
+ int start, int max)
+{
+ return __trace_test_and_set_recursion(ip, pip, start, max, false);
+}
+
+/*
+ * The safe version does not let any recursion happen.
+ * The unsafe version will allow for a single recursion to deal with
+ * the period during a context switch from normal to interrupt to NMI
+ * that may be in the wrong context. But if the caller is expecting
+ * this to be safe for grabbing locks, it must use the safe version
+ * otherwise it could cause a deadlock. But it may still miss events
+ * in the period of context switches, but if it is grabbing locks
+ * it shouldn't be tracing in that period anyway.
+ */
+static __always_inline int trace_test_and_set_recursion_safe(unsigned long ip, unsigned long pip,
+ int start, int max)
+{
+ return __trace_test_and_set_recursion(ip, pip, start, max, true);
+}
+
static __always_inline void trace_clear_recursion(int bit)
{
if (!bit)
@@ -208,6 +231,14 @@ static __always_inline void trace_clear_recursion(int bit)
* Use this for ftrace callbacks. This will detect if the function
* tracing recursed in the same context (normal vs interrupt),
*
+ * Note, this may allow one nested level of recursion, because of the
+ * way interrupts are tracked. If a trace happens at the start of
+ * an interrupt before the interrupts state is set, it needs to allow
+ * one recursion to handle this case.
+ *
+ * If this is used to protect locks, use the
+ * ftrace_test_recursion_trylock_safe() version instead.
+ *
* Returns: -1 if a recursion happened.
* >= 0 if no recursion
*/
@@ -216,6 +247,29 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
{
return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
}
+/**
+ * ftrace_test_recursion_trylock_safe - tests for recursion in same context
+ *
+ * Use this for ftrace callbacks. This will detect if the function
+ * tracing recursed in the same context (normal vs interrupt),
+ *
+ * Use this version if you depend on it for grabbing locks.
+ * You should also not be tracing in NMI either.
+ *
+ * This version is the same as the ftrace_test_recursion_trylock() except that
+ * it does not allow any recursion. It may produce a false positive if
+ * a trace occurs at the start of an interrupt but before the interrupt
+ * state is set. Any event that happens in that case will be considered
+ * a "recursion".
+ *
+ * Returns: -1 if a recursion happened.
+ * >= 0 if no recursion
+ */
+static __always_inline int ftrace_test_recursion_trylock_safe(unsigned long ip,
+ unsigned long parent_ip)
+{
+ return trace_test_and_set_recursion_safe(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+}

/**
* ftrace_test_recursion_unlock - called when function callback is complete