2017-06-20 07:19:44

by Abdul Haleem

Subject: [BUG][next-20170619][347de24] PowerPC boot fails with Oops

Hi,

commit: 347de24 (powerpc/64s: implement arch-specific hardlockup
watchdog)

linux-next fails to boot on PowerPC Bare-metal box.

Test: boot
Machine type: Power 8 Bare-metal
Kernel: 4.12.0-rc5-next-20170619
gcc: version 4.8.5


In file arch/powerpc/kernel/watchdog.c

void soft_nmi_interrupt(struct pt_regs *regs)
{
	unsigned long flags;
	int cpu = raw_smp_processor_id();
	u64 tb;

	if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
		return;

>>>	nmi_enter();
	tb = get_tb();



commit 347de24231df9f82969e2de3ad9f6976f1856a0f
Author: Nicholas Piggin <[email protected]>
Date: Sat Jun 17 09:33:56 2017 +1000

powerpc/64s: implement arch-specific hardlockup watchdog

Implement an arch-specific watchdog rather than use the perf-based
hardlockup detector.

The new watchdog takes the soft-NMI directly, rather than going through
perf. Perf interrupts are to be made maskable in future, so that would
prevent the perf detector from working in those regions.



boot logs:
----------
cpuidle: using governor menu
pstore: using zlib compression
pstore: Registered nvram as persistent store backend
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048
NUMA
PowerNV
Modules linked in:
CPU: 67 PID: 0 Comm: swapper/67 Not tainted 4.12.0-rc5-next-20170619 #1
task: c000000f272be700 task.stack: c000000f2736c000
NIP: c00000000002c5fc LR: c00000000002c5e8 CTR: c00000000016f570
REGS: c00000003fcd7a00 TRAP: 0700 Not tainted (4.12.0-rc5-next-20170619)
MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE>
CR: 22004022 XER: 20000000
CFAR: c000000000149c6c SOFTE: 0
GPR00: c00000000002c5e8 c00000003fcd7c80 c00000000105e900 0000000000000000
GPR04: 0000000000000000 0000000000073388 c000000fff7cf014 0000000000000000
GPR08: 0000000ffea90000 0000000000100000 0000000040000000 0000000000000000
GPR12: 9000000000009033 c00000000fd57080 c000000f2736ff90 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000040376a80 0000000040376ac8
GPR20: c000000ffe630000 0000000000000001 0000000000000002 0000000000000000
GPR24: 0000000000000000 c000000f2736c000 c000000f2736c080 0000000000000008
GPR28: c00000003fcd7d80 0000000000000003 0000000000000008 0000000000000043
NIP [c00000000002c5fc] soft_nmi_interrupt+0x9c/0x2e0
LR [c00000000002c5e8] soft_nmi_interrupt+0x88/0x2e0
Call Trace:
Instruction dump:
eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c7c1b78 4811d615 60000000
78290464 8129000c 552902d6 79290020 <0b090000> 78290464 8149000c 3d4a0011
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
random: print_oops_end_marker+0x6c/0xa0 get_random_bytes called with crng_init=0
---[ end trace 9756c1a885c69f33 ]---
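A note on reading the instruction dump above: the word in angle brackets, <0b090000>, is the instruction at NIP. A minimal sketch of decoding it as a PowerPC D-form instruction follows. The struct and function names are hypothetical helpers for illustration; the field positions follow the Power ISA D-form layout. Under that layout the word decodes to primary opcode 2 (tdi) with TO = 24 and RA = 9, i.e. "tdnei r9,0", a trap taken when r9 is non-zero. That is consistent with a BUG_ON() firing, and GPR09 in the register dump does hold a non-zero value (0x100000).

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction word (hypothetical helper type). */
struct dform {
	unsigned opcode;	/* primary opcode, top 6 bits */
	unsigned to;		/* TO (trap condition) or RT field, next 5 bits */
	unsigned ra;		/* RA register number, next 5 bits */
	int16_t si;		/* signed 16-bit immediate, low 16 bits */
};

/* Split a 32-bit instruction word into its D-form fields. */
struct dform decode_dform(uint32_t word)
{
	struct dform d;

	d.opcode = word >> 26;
	d.to = (word >> 21) & 0x1f;
	d.ra = (word >> 16) & 0x1f;
	d.si = (int16_t)(word & 0xffff);
	return d;
}

/*
 * Primary opcode 2 is tdi (trap doubleword immediate). TO = 24 sets the
 * "less than" and "greater than" condition bits, which together mean
 * "not equal", giving the extended mnemonic tdnei.
 */
int is_tdnei(struct dform d)
{
	return d.opcode == 2 && d.to == 24;
}
```

Calling decode_dform(0x0b090000) gives the fields described above, so is_tdnei() returns non-zero for the trapping word.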
--


Regards,

Abdul Haleem
IBM Linux Technology Centre



Attachments:
bootlogs.txt (30.30 kB)
Tul-NV-config (84.68 kB)

2017-06-20 11:44:34

by Nicholas Piggin

Subject: Re: [BUG][next-20170619][347de24] PowerPC boot fails with Oops

On Tue, 20 Jun 2017 12:49:25 +0530
Abdul Haleem <[email protected]> wrote:

> Hi,
>
> commit: 347de24 (powerpc/64s: implement arch-specific hardlockup
> watchdog)
>
> linux-next fails to boot on PowerPC Bare-metal box.
>
> Test: boot
> Machine type: Power 8 Bare-metal
> Kernel: 4.12.0-rc5-next-20170619
> gcc: version 4.8.5
>
>
> In file arch/powerpc/kernel/watchdog.c
>
> void soft_nmi_interrupt(struct pt_regs *regs)
> {
> 	unsigned long flags;
> 	int cpu = raw_smp_processor_id();
> 	u64 tb;
>
> 	if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
> 		return;
>
> >>>	nmi_enter();

Thanks for the report.

This is due to emergency stacks not zeroing preempt_count, so they get
garbage here, and it just trips the BUG_ON(in_nmi()) check.

I don't think it's a bug in the proposed new powerpc watchdog (at least,
I was able to reproduce your bug and fix it by fixing the stack init).

Thanks,
Nick
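
The failure mode described in the reply above can be illustrated with a small userspace model. This is only a sketch, not kernel code: the sim_* names are hypothetical, and the bit layout is assumed to match include/linux/preempt.h around v4.12 (8 preempt bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit). Under that assumption NMI_MASK is 0x00100000, which happens to match the value in GPR09 in the oops. A stale preempt_count with that bit set makes in_nmi() true before nmi_enter() has run, so the BUG_ON(in_nmi()) check fires; zeroing the field at stack setup avoids it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed preempt_count bit layout (as in v4.12-era preempt.h). */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	4
#define NMI_BITS	1

#define HARDIRQ_SHIFT	(PREEMPT_BITS + SOFTIRQ_BITS)
#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)

#define HARDIRQ_OFFSET	(1u << HARDIRQ_SHIFT)
#define NMI_OFFSET	(1u << NMI_SHIFT)
#define NMI_MASK	(((1u << NMI_BITS) - 1) << NMI_SHIFT)

/* Hypothetical stand-in for the thread_info at the base of a stack. */
struct sim_thread_info {
	uint32_t preempt_count;
};

/* Models in_nmi(): true when the NMI bit of preempt_count is set. */
bool sim_in_nmi(const struct sim_thread_info *ti)
{
	return (ti->preempt_count & NMI_MASK) != 0;
}

/* Models nmi_enter(); returns false where BUG_ON(in_nmi()) would fire. */
bool sim_nmi_enter(struct sim_thread_info *ti)
{
	if (sim_in_nmi(ti))
		return false;
	ti->preempt_count += NMI_OFFSET + HARDIRQ_OFFSET;
	return true;
}

/* Models nmi_exit(): undoes the accounting done on entry. */
void sim_nmi_exit(struct sim_thread_info *ti)
{
	ti->preempt_count -= NMI_OFFSET + HARDIRQ_OFFSET;
}

/* Sketch of the fix: zero preempt_count when the stack is set up. */
void sim_init_emergency_stack(struct sim_thread_info *ti)
{
	ti->preempt_count = 0;
}
```

With preempt_count left at a garbage value such as 0x00100000, sim_nmi_enter() returns false; after sim_init_emergency_stack() it succeeds and sim_nmi_exit() restores the count to zero.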