System hung can be reproduced on qemu and real hardware using:
# perf test -v signal
If qemu is started with '-smp 1', system hung. In real hardware and in
qemu with smp > 1, the result is:
# /perf test -v signal
17: Test breakpoint overflow signal handler :
--- start ---
test child forked, pid 792
count1 11, count2 11, overflow 11
failed: RF EFLAG recursion issue detected
failed: wrong overflow hit
failed: wrong count for bp2
test child finished with -1
---- end ----
Test breakpoint overflow signal handler: FAILED!
Looks like something like [1] is required for ARM64.
Some analysis is done with qemu:
This testcase tests the intertaction between breakpoint, perf_event
and signal handling. It installs a breakpoint at the enter of a
function and makes the corresponding perf_event generate SIGIO when
the event raise.
When perf_event on a async perf_event is triggered:
if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
irq_work_queue(&event->pending);
}
it calls irq_work_queue(&event->pending), which is used to fire a
poll event and SIGIO. Later when perf_event is closed, in _free_event
irq_work_sync(&event->pending) is called to ensure all irq_work is done.
On ARM64, if we have only 1 cpu, the system hung at irq_work_sync().
Using gdb attached, I see:
1. IRQ is not disabled. Inside irq_work_sync, result of
arch_local_save_flags()
is 0x140.
2. hrtimer_interrupt() is still generated. The system is not dead.
3. In irq_work_tick, we have a chance to process irq_work. However,
llist_empty(raised) is false but arch_irq_work_has_interrupt()
is true, so kernel only process lazy_list.
4. handle_IPI() is never called, so I guess the IPI is disabled by
breakpoint
and not restored in this case.
[1]
http://lkml.kernel.org/r/[email protected]
On Mon, Dec 21, 2015 at 08:56:03PM +0800, Wangnan (F) wrote:
> System hung can be reproduced on qemu and real hardware using:
>
> # perf test -v signal
>
> If qemu is started with '-smp 1', system hung. In real hardware and in
> qemu with smp > 1, the result is:
That sounds like a qemu issue...
> # /perf test -v signal
> 17: Test breakpoint overflow signal handler :
> --- start ---
> test child forked, pid 792
> count1 11, count2 11, overflow 11
> failed: RF EFLAG recursion issue detected
> failed: wrong overflow hit
> failed: wrong count for bp2
> test child finished with -1
> ---- end ----
> Test breakpoint overflow signal handler: FAILED!
... and this sounds like the perf tool expecting to single-step over
signal handlers, whereas (on arm64) we deliberately single-step into
them, so you can run into a loop for this case.
Will
On 2015/12/21 22:01, Will Deacon wrote:
> On Mon, Dec 21, 2015 at 08:56:03PM +0800, Wangnan (F) wrote:
>
[SNIP]
>> # /perf test -v signal
>> 17: Test breakpoint overflow signal handler :
>> --- start ---
>> test child forked, pid 792
>> count1 11, count2 11, overflow 11
>> failed: RF EFLAG recursion issue detected
>> failed: wrong overflow hit
>> failed: wrong count for bp2
>> test child finished with -1
>> ---- end ----
>> Test breakpoint overflow signal handler: FAILED!
> ... and this sounds like the perf tool expecting to single-step over
> signal handlers, whereas (on arm64) we deliberately single-step into
> them, so you can run into a loop for this case.
Could you please provide me more information? Do you think it is
a kernel bug and should be fixed? Or it is an invalid testcase for
aarch64?
Thank you.
> Will