2016-03-29 22:05:24

by Yury Norov

Subject: arm64: kernel v4.6-rc1 hangs on QEMU

Hi,

Checked for both v4.6-rc1 and current master (1993b17).
Config: arm64 defconfig
QEMU: QEMU emulator version 2.3.0 (Debian 1:2.3+dfsg-5ubuntu9.2)

Stacktrace:
#0 arch_counter_get_cntvct () at ./arch/arm64/include/asm/arch_timer.h:121
#1 __delay (cycles=1024) at arch/arm64/lib/delay.c:31
#2 0xffffff8008340970 in __const_udelay (xloops=<optimized out>) at arch/arm64/lib/delay.c:41
#3 0xffffff800815420c in panic (fmt=<optimized out>) at kernel/panic.c:257
#4 0xffffff80080be588 in do_exit (code=11) at kernel/exit.c:666
#5 0xffffff8008089d08 in die (str=<optimized out>, regs=0xffffff8008aebe20 <init_thread_union+15904>, err=143867376) at arch/arm64/kernel/traps.c:298
#6 0xffffff8008089dec in arm64_notify_die (str=<optimized out>, regs=<optimized out>, info=<optimized out>, err=<optimized out>) at arch/arm64/kernel/traps.c:309
#7 0xffffff800808212c in do_undefinstr (regs=0xffffff8008aebe20 <init_thread_union+15904>) at arch/arm64/kernel/traps.c:399
#8 0xffffff8008a1fe08 in cpuinfo_store_boot_cpu () at arch/arm64/kernel/cpuinfo.c:252
#9 0xffffff8008a1fe08 in cpuinfo_store_boot_cpu () at arch/arm64/kernel/cpuinfo.c:252
#10 0xffffff8008a20388 in smp_prepare_boot_cpu () at arch/arm64/kernel/smp.c:403
#11 0xffffff8008a1d6ec in start_kernel () at init/main.c:511
#12 0xffffff80080811d8 in __mmap_switched () at arch/arm64/kernel/head.S:437
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Corefile can be found at:
https://drive.google.com/file/d/0B93nHerV55yNdFp5em54TEVnU2c/view?usp=sharing

Yury.


2016-03-29 22:13:05

by Arnd Bergmann

Subject: Re: arm64: kernel v4.6-rc1 hangs on QEMU

On Wednesday 30 March 2016 01:05:02 Yury Norov wrote:
> Checked for both v4.6-rc1 and current master (1993b17).
> Config: arm64 defconfig
> QEMU: QEMU emulator version 2.3.0 (Debian 1:2.3+dfsg-5ubuntu9.2)
>
> Stacktrace:
> #0 arch_counter_get_cntvct () at ./arch/arm64/include/asm/arch_timer.h:121
> #1 __delay (cycles=1024) at arch/arm64/lib/delay.c:31
> #2 0xffffff8008340970 in __const_udelay (xloops=<optimized out>) at arch/arm64/lib/delay.c:41
> #3 0xffffff800815420c in panic (fmt=<optimized out>) at kernel/panic.c:257
> #4 0xffffff80080be588 in do_exit (code=11) at kernel/exit.c:666
> #5 0xffffff8008089d08 in die (str=<optimized out>, regs=0xffffff8008aebe20 <init_thread_union+15904>, err=143867376) at arch/arm64/kernel/traps.c:298
> #6 0xffffff8008089dec in arm64_notify_die (str=<optimized out>, regs=<optimized out>, info=<optimized out>, err=<optimized out>) at arch/arm64/kernel/traps.c:309
> #7 0xffffff800808212c in do_undefinstr (regs=0xffffff8008aebe20 <init_thread_union+15904>) at arch/arm64/kernel/traps.c:399
> #8 0xffffff8008a1fe08 in cpuinfo_store_boot_cpu () at arch/arm64/kernel/cpuinfo.c:252
> #9 0xffffff8008a1fe08 in cpuinfo_store_boot_cpu () at arch/arm64/kernel/cpuinfo.c:252
> #10 0xffffff8008a20388 in smp_prepare_boot_cpu () at arch/arm64/kernel/smp.c:403
> #11 0xffffff8008a1d6ec in start_kernel () at init/main.c:511
> #12 0xffffff80080811d8 in __mmap_switched () at arch/arm64/kernel/head.S:437
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>

Undefined instruction in cpuinfo_store_boot_cpu() could be related
to the SYS_ID_AA64MMFR2_EL1 access that was recently added.

What does the architecture say about reading unknown cpuid registers?

Arnd

2016-03-29 22:22:38

by Yury Norov

Subject: Re: arm64: kernel v4.6-rc1 hangs on QEMU

On Wed, Mar 30, 2016 at 12:12:30AM +0200, Arnd Bergmann wrote:
> On Wednesday 30 March 2016 01:05:02 Yury Norov wrote:
> > Checked for both v4.6-rc1 and current master (1993b17).
> > Config: arm64 defconfig
> > QEMU: QEMU emulator version 2.3.0 (Debian 1:2.3+dfsg-5ubuntu9.2)
> >
> > Stacktrace:
> > #0 arch_counter_get_cntvct () at ./arch/arm64/include/asm/arch_timer.h:121
> > #1 __delay (cycles=1024) at arch/arm64/lib/delay.c:31
> > #2 0xffffff8008340970 in __const_udelay (xloops=<optimized out>) at arch/arm64/lib/delay.c:41
> > #3 0xffffff800815420c in panic (fmt=<optimized out>) at kernel/panic.c:257
> > #4 0xffffff80080be588 in do_exit (code=11) at kernel/exit.c:666
> > #5 0xffffff8008089d08 in die (str=<optimized out>, regs=0xffffff8008aebe20 <init_thread_union+15904>, err=143867376) at arch/arm64/kernel/traps.c:298
> > #6 0xffffff8008089dec in arm64_notify_die (str=<optimized out>, regs=<optimized out>, info=<optimized out>, err=<optimized out>) at arch/arm64/kernel/traps.c:309
> > #7 0xffffff800808212c in do_undefinstr (regs=0xffffff8008aebe20 <init_thread_union+15904>) at arch/arm64/kernel/traps.c:399
> > #8 0xffffff8008a1fe08 in cpuinfo_store_boot_cpu () at arch/arm64/kernel/cpuinfo.c:252
> > #9 0xffffff8008a1fe08 in cpuinfo_store_boot_cpu () at arch/arm64/kernel/cpuinfo.c:252
> > #10 0xffffff8008a20388 in smp_prepare_boot_cpu () at arch/arm64/kernel/smp.c:403
> > #11 0xffffff8008a1d6ec in start_kernel () at init/main.c:511
> > #12 0xffffff80080811d8 in __mmap_switched () at arch/arm64/kernel/head.S:437
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> >
>
> Undefined instruction in cpuinfo_store_boot_cpu() could be related
> to the SYS_ID_AA64MMFR2_EL1 access that was recently added.
>
> What does the architecture say about reading unknown cpuid registers?
>
> Arnd

ThunderX has some unimplemented system registers. AFAIR, an attempt to
access one of them causes a data abort.

Yury.

2016-03-29 22:33:18

by Arnd Bergmann

Subject: Re: arm64: kernel v4.6-rc1 hangs on QEMU

On Wednesday 30 March 2016 01:22:17 Yury Norov wrote:
> >
> > Undefined instruction in cpuinfo_store_boot_cpu() could be related
> > to the SYS_ID_AA64MMFR2_EL1 access that was recently added.
> >
> > What does the architecture say about reading unknown cpuid registers?
> >
> > Arnd
>
> ThunderX has some unimplemented system registers. AFAIR, attempt to access it
> causes data abort.

Ok, if that is the case, maybe the read_cpuid() macro can be changed
so it contains a fixup for the trap? That should handle both data abort
and undefinstr.
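
As a rough user-space analogy of the fixup idea (not kernel code), the sketch below recovers from a trapping read and reports failure instead of dying. Everything in it is hypothetical: fake_reg_read() stands in for the real register access and simulates an unimplemented register by raising SIGILL, and read_reg_with_fixup() is an illustrative name. A kernel implementation would presumably rely on an exception-table fixup rather than signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Recovery point the handler jumps back to (plays the role of the fixup). */
static sigjmp_buf fixup_env;

static void sigill_fixup(int signo)
{
    (void)signo;
    siglongjmp(fixup_env, 1);
}

/*
 * Hypothetical stand-in for a system register read: an unimplemented
 * register is simulated by raising SIGILL instead of executing MRS.
 */
static uint64_t fake_reg_read(bool implemented)
{
    if (!implemented)
        raise(SIGILL);
    return 0x1122;
}

/* Returns true and stores the value on success, false if the read trapped. */
static bool read_reg_with_fixup(bool implemented, uint64_t *val)
{
    struct sigaction sa, old;
    volatile bool ok = false;

    assert(val != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigill_fixup;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    /* Nonzero second argument: the signal mask is restored on the jump. */
    if (sigsetjmp(fixup_env, 1) == 0) {
        *val = fake_reg_read(implemented);
        ok = true;
    }

    sigaction(SIGILL, &old, NULL);
    return ok;
}
```

A caller would check the return value and leave *val untouched on failure, so the same wrapper covers any trap type that the handler is registered for.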

Arnd

2016-03-29 22:52:38

by Yury Norov

Subject: Re: arm64: kernel v4.6-rc1 hangs on QEMU

On Wed, Mar 30, 2016 at 12:32:42AM +0200, Arnd Bergmann wrote:
> On Wednesday 30 March 2016 01:22:17 Yury Norov wrote:
> > >
> > > Undefined instruction in cpuinfo_store_boot_cpu() could be related
> > > to the SYS_ID_AA64MMFR2_EL1 access that was recently added.
> > >
> > > What does the architecture say about reading unknown cpuid registers?
> > >
> > > Arnd
> >
> > ThunderX has some unimplemented system registers. AFAIR, attempt to access it
> > causes data abort.
>
> Ok, if that is the case, maybe the read_cpuid() macro can be changed
> so it contains a fixup for the trap? That should handle both data abort
> and undefinstr.
>
> Arnd

Sounds appealing, but it's not clear what we'd return in that case. I mean,
how would we distinguish between a valid value and an error code (0, -1 or
whatever)? But I think we could do something like this:

    val = read_cpuid_safe(reg, impossible_val);
    if (val == impossible_val)
        goto err;

I think that would work in many cases.
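
To make the two return conventions concrete, here is a minimal sketch over a simulated register table. The table, the index-based register numbering and read_cpuid_checked() are all hypothetical, and read_cpuid_safe() is only the name proposed above, not an existing kernel function. The sentinel variant only works when the caller knows a value the register can never hold; the status-plus-out-parameter variant avoids that requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a register file; some entries are unimplemented. */
struct fake_reg {
    bool implemented;
    uint64_t value;
};

static const struct fake_reg fake_regs[] = {
    { true,  0x1122 },  /* index 0: implemented */
    { false, 0 },       /* index 1: a real access would trap */
};

#define NUM_FAKE_REGS (sizeof(fake_regs) / sizeof(fake_regs[0]))

/* Sentinel style: returns `fallback` when the read would have trapped. */
static uint64_t read_cpuid_safe(size_t reg, uint64_t fallback)
{
    if (reg >= NUM_FAKE_REGS || !fake_regs[reg].implemented)
        return fallback;
    return fake_regs[reg].value;
}

/* Status style: 0 on success, -1 on failure; *val is written on success only. */
static int read_cpuid_checked(size_t reg, uint64_t *val)
{
    assert(val != NULL);
    if (reg >= NUM_FAKE_REGS || !fake_regs[reg].implemented)
        return -1;
    *val = fake_regs[reg].value;
    return 0;
}
```

With the sentinel style, a caller would pass something like UINT64_MAX and compare the result against it; that is ambiguous for any register whose valid values could include the sentinel.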

Yury.

2016-03-30 06:44:57

by Kefeng Wang

Subject: Re: arm64: kernel v4.6-rc1 hangs on QEMU



On 2016/3/30 6:52, Yury Norov wrote:
> On Wed, Mar 30, 2016 at 12:32:42AM +0200, Arnd Bergmann wrote:
>> On Wednesday 30 March 2016 01:22:17 Yury Norov wrote:
>>>>
>>>> Undefined instruction in cpuinfo_store_boot_cpu() could be related
>>>> to the SYS_ID_AA64MMFR2_EL1 access that was recently added.
>>>>

Please use a newer QEMU that includes this commit:
commit e20d84c1407d43d5a2e2ac95dbb46db3b0af8f9f
Author: Peter Maydell <[email protected]>
Date: Fri Feb 19 14:07:43 2016 +0000

    target-arm: Make reserved ranges in ID_AA64* spaces RAZ, not UNDEF

    The v8 ARM ARM defines that unused spaces in the ID_AA64* system
    register ranges are Reserved and must RAZ, rather than being UNDEF.
    Implement this.

    In particular, ARM v8.2 adds a new feature register ID_AA64MMFR2,
    and newer versions of the Linux kernel will attempt to read this,
    which causes them not to boot up on versions of QEMU missing this fix.

    Since the encoding .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 6
    is actually defined in ARMv8 (as ID_MMFR4), we give it an entry in
    the ARMCPU struct so CPUs can override it, though since none do
    this too will just RAZ.


see https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg04574.html
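
The behavioural difference the commit describes can be illustrated with a toy model. This is only a sketch, not QEMU code: struct reg_id, model_read() and the sample values are hypothetical, and the model assumes that the ID_AA64* range is op0 = 3, op1 = 0, CRn = 0, CRm = 4..7, with ID_AA64MMFR2 at CRm = 7, op2 = 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* System register encoding fields, as written in the commit message. */
struct reg_id {
    unsigned op0, op1, crn, crm, op2;
};

/* Assumption for this sketch: ID_AA64* space is op0=3, op1=0, CRn=0, CRm=4..7. */
static bool in_id_aa64_space(struct reg_id r)
{
    return r.op0 == 3 && r.op1 == 0 && r.crn == 0 && r.crm >= 4 && r.crm <= 7;
}

/* Toy CPU that implements only ID_AA64MMFR0 (op2=0) and ID_AA64MMFR1 (op2=1). */
static bool is_implemented(struct reg_id r)
{
    return in_id_aa64_space(r) && r.crm == 7 && r.op2 <= 1;
}

/*
 * Returns true if the read succeeds, false if it would be UNDEF.
 * With reserved_is_raz set, reserved encodings in the ID_AA64* space
 * read as zero; without it, they are UNDEF.
 */
static bool model_read(struct reg_id r, bool reserved_is_raz, uint64_t *val)
{
    assert(val != NULL);
    if (is_implemented(r)) {
        *val = 0x1122;  /* arbitrary sample contents */
        return true;
    }
    if (reserved_is_raz && in_id_aa64_space(r)) {
        *val = 0;       /* RAZ: read-as-zero */
        return true;
    }
    return false;       /* UNDEF: the guest takes an exception */
}
```

In this model, reading { 3, 0, 0, 7, 2 } fails with reserved_is_raz == false, which corresponds to the do_undefinstr() frame in the backtrace, and returns zero with reserved_is_raz == true.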

>>>> What does the architecture say about reading unknown cpuid registers?
>>>>
>>>> Arnd
>>>
>>> ThunderX has some unimplemented system registers. AFAIR, attempt to access it
>>> causes data abort.
>>
>> Ok, if that is the case, maybe the read_cpuid() macro can be changed
>> so it contains a fixup for the trap? That should handle both data abort
>> and undefinstr.
>>
>> Arnd
>
> Sounds alluring, but not clear what we'd return that way. I mean, how
> we'd distinguish between correct value and error code (0, -1 or whatever).
> But I think, we can do like this:
>
> val = read_cpuid_safe(reg, impossible_val);
> if (val == impossible_val)
> goto err;
>
> I think it will work for many cases.
>
> Yury.
>
>