by Naveen N. Rao

[permalink] [raw]

Subject: Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

Qian Cai wrote:
>
>
>> On Apr 17, 2020, at 3:01 AM, Naveen N. Rao <[email protected]> wrote:
>>
>> Hi Qian,
>>
>> Qian Cai wrote:
>>> OK, reverted the commit,
>>> c55d7b5e6426 (“powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE”)
>>> or set STRICT_KERNEL_RWX=n fixed the crash below and also mentioned in this thread,
>>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Do you see any errors logged in dmesg when you see the crash? STRICT_KERNEL_RWX changes how patch_instruction() works, so it would be interesting to see if there are any ftrace-related errors thrown before the crash.
>
> Yes, looks like there is a warning right after,
>
> echo function > /sys/kernel/debug/tracing/current_tracer
> echo nop > /sys/kernel/debug/tracing/current_tracer
>
> and just before the crash,
>
> [ T3454] ftrace-powerpc: Unexpected call sequence at 00000000de85f044: 48003d1d 7c0802a6
> [ 56.870472][ T3454] ------------[ cut here ]------------
> [ 56.870500][ T3454] WARNING: CPU: 52 PID: 3454 at kernel/trace/ftrace.c:2026 ftrace_bug+0x104/0x310
> [ 56.870527][ T3454] Modules linked in: kvm_hv kvm ses enclosure scsi_transport_sas ip_tables x_tables xfs sd_mod i40e firmware_class aacraid dm_mirror dm_region_hash dm_log dm_mod
> [ 56.870592][ T3454] CPU: 52 PID: 3454 Comm: nip.sh Not tainted 5.7.0-rc1-next-20200416 #4
> [ 56.870627][ T3454] NIP: c0000000002a3ae4 LR: c0000000002a47fc CTR: c0000000002436f0
> [ 56.870661][ T3454] REGS: c00000069a9ef710 TRAP: 0700 Not tainted (5.7.0-rc1-next-20200416)
> [ 56.870697][ T3454] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28228222 XER: 00000000
> [ 56.870748][ T3454] CFAR: c0000000002a3a2c IRQMASK: 0
> [ 56.870748][ T3454] GPR00: c0000000002a47fc c00000069a9ef9a0 c0000000012f9000 ffffffffffffffea
> [ 56.870748][ T3454] GPR04: c0002004e2160438 c0000007fedf0ad8 00000000614ca19d 0000000000000007
> [ 56.870748][ T3454] GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000002
> [ 56.870748][ T3454] GPR12: 0000000000004000 c0000007fffd5600 0000000040000000 0000000139ae9798
> [ 56.870748][ T3454] GPR16: 0000000139ae9724 0000000139a86968 0000000139a1f230 0000000139aed568
> [ 56.870748][ T3454] GPR20: 00000001402af8b0 0000000000000009 0000000139a996e8 00007fffc9186d94
> [ 56.870748][ T3454] GPR24: 0000000000000000 c00000069a9efc00 c00000000132cd00 c00000069a9efc40
> [ 56.870748][ T3454] GPR28: c0000000011c29e8 0000000000000001 c0002004e2160438 c008000009321a64
> [ 56.870969][ T3454] NIP [c0000000002a3ae4] ftrace_bug+0x104/0x310
> ftrace_bug at kernel/trace/ftrace.c:2026
> [ 56.870995][ T3454] LR [c0000000002a47fc] ftrace_modify_all_code+0x16c/0x210
> ftrace_modify_all_code at kernel/trace/ftrace.c:2672
> [ 56.871034][ T3454] Call Trace:
> [ 56.871057][ T3454] [c00000069a9ef9a0] [4bffff899a9efa00] 0x4bffff899a9efa00 (unreliable)
> [ 56.871086][ T3454] [c00000069a9efa20] [c0000000002a47fc] ftrace_modify_all_code+0x16c/0x210
> [ 56.871125][ T3454] [c00000069a9efa50] [c000000000061b68] arch_ftrace_update_code+0x18/0x30
> [ 56.871162][ T3454] [c00000069a9efa70] [c0000000002a49c4] ftrace_run_update_code+0x44/0xc0
> [ 56.871199][ T3454] [c00000069a9efaa0] [c0000000002aa3c8] ftrace_startup+0xe8/0x1b0
> [ 56.871236][ T3454] [c00000069a9efae0] [c0000000002aa4e0] register_ftrace_function+0x50/0xc0
> [ 56.871275][ T3454] [c00000069a9efb10] [c0000000002d0468] function_trace_init+0x98/0xd0
> [ 56.871312][ T3454] [c00000069a9efb40] [c0000000002c75c0] tracing_set_tracer+0x350/0x640
> [ 56.871349][ T3454] [c00000069a9efbe0] [c0000000002c7a90] tracing_set_trace_write+0x1e0/0x370
> [ 56.871388][ T3454] [c00000069a9efd00] [c00000000052094c] __vfs_write+0x3c/0x70
> [ 56.871424][ T3454] [c00000069a9efd20] [c000000000523d4c] vfs_write+0xcc/0x200
> [ 56.871461][ T3454] [c00000069a9efd70] [c0000000005240ec] ksys_write+0x7c/0x140
> [ 56.871498][ T3454] [c00000069a9efdc0] [c000000000038a94] system_call_exception+0x114/0x1e0
> [ 56.871535][ T3454] [c00000069a9efe20] [c00000000000c870] system_call_common+0xf0/0x278
> [ 56.871570][ T3454] Instruction dump:
> [ 56.871592][ T3454] 7d908120 4e800020 60000000 2b890001 409effd4 3c62ff8b 38631958 4bf4491d
> [ 56.871639][ T3454] 60000000 4bffffc0 60000000 fba10068 <0fe00000> 39000001 3ce20003 3d22fed7
> [ 56.871685][ T3454] irq event stamp: 95388
> [ 56.871708][ T3454] hardirqs last enabled at (95387): [<c0000000001e4f94>] console_unlock+0x6a4/0x950
> [ 56.871746][ T3454] hardirqs last disabled at (95388): [<c00000000000960c>] program_check_common_virt+0x2bc/0x310
> [ 56.871785][ T3454] softirqs last enabled at (91222): [<c000000000a213c8>] __do_softirq+0x658/0x8d8
> [ 56.871823][ T3454] softirqs last disabled at (91215): [<c00000000011b40c>] irq_exit+0x16c/0x1d0
> [ 56.871859][ T3454] ---[ end trace 48f8445450a4e206 ]---
> [ 56.871907][ T3454] ftrace failed to modify
> [ 56.871913][ T3454] [<c008000009321a64>] show_sas_rphy_phy_identifier+0xc/0x60 [scsi_transport_sas]
> show_sas_rphy_phy_identifier at drivers/scsi/scsi_transport_sas.c:1221
> [ 56.871969][ T3454] actual: 1d:3d:00:48
> [ 56.871996][ T3454] Setting ftrace call site to call ftrace function
> [ 56.872020][ T3454] ftrace record flags: 80000001
> [ 56.872054][ T3454] (1)
> [ 56.872054][ T3454] expected tramp: c000000000061fac

Thanks. That confirms the problem.

In this particular case, we are unable to convert a module function to
call into ftrace_caller() since the _mcount() entry has not been nop-ed
out during module load. I suspect there might be another error/warning
earlier on indicating that failure. There are a few scenarios where we
are not propagating errors from patch_instruction() properly, so it is
possible that no errors were thrown previously. I will send a separate
set of patches to address that.

The reason for the original crash is a bit more subtle. On module load,
we patch _mcount() call sites of all the module functions with a nop.
However, with STRICT_KERNEL_RWX, one of these is failing. When that
happens, we end up continuing to call into _mcount(), which uses the
default module relocation stub, and not the special ftrace stub. The
default stub uses r2, which won't always be loaded with the module TOC
pointer with -mprofile-kernel. Instead, r2 likely points to the kernel
TOC, so we end up loading incorrect entries from the kernel TOC to
branch to. In all the traces we've seen so far, a kernel function has
called into a module function, wherein the module function does not set
up its own TOC (ext4_iomap_end(), find_free_cb()).

The core of the problem though is that patch_instruction() is failing in
certain scenarios. We need to find out why that is, and address that.

Russell,
You mentioned in the past that you identified an issue with
patch_instruction() during early boot:
http://lkml.kernel.org/r/[email protected]

Does that failure happen without STRICT_MODULE_RWX, with just
STRICT_KERNEL_RWX? That might be relevant here.

- Naveen