2023-02-23 06:49:39

by Peng Fan

Subject: Fail to freeze process

Hi kernel experts,

I am facing a suspend/resume issue with Linux running on top of the Jailhouse
hypervisor on an ARM64 platform with a 6.1 kernel.
Without the Jailhouse hypervisor enabled, the kernel suspends and resumes fine.
So it is likely that the Jailhouse hypervisor introduces some interrupt/timer
behavior, or some other bug, that causes this issue. But for now I have no idea
which bug could cause it. So I want to narrow it down from the Linux side first,
to see why freezing times out, and then move into the Jailhouse hypervisor to fix it.

I have tried enlarging the freeze timeout to 90s, but the issue is the same: process
freezing still fails. It does not happen every time, but after a few rounds of
suspend/resume it triggers. The CPU running the process has a very large timer
expiration value. Even when I use JTAG to trigger the timer interrupt, the CPU goes
back into idle.

I see the process has flags 0xa05, so it has a signal pending, but I am not sure
why it could not be frozen.

It seems I have no way to wake the CPU up from idle and let it schedule the task.

I hope you have some ideas.

---- Running < /unit_tests/SRTC/rtcwakeup.out > test ----

rtcwakeup.[ 1153.430758] PM: suspend entry (deep)
out: wakeup from "mem" using rtc0[ 1153.435689] Filesystems sync: 0.000 seconds
at Fri Jan 2 00:20:51 1970
[ 1153.487507] Freezing user space processes ...
[ 1173.495070] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 1173.504091] task:systemd-userwor state:R stack:0 pid:1563 ppid:588 flags:0x00000a05
[ 1173.512457] Call trace:
[ 1173.514909] __switch_to+0xf0/0x170
[ 1173.518416] __schedule+0x28c/0x710
[ 1173.521916] schedule+0x5c/0xd0
[ 1173.525064] schedule_timeout+0x8c/0x100
[ 1173.528996] __skb_wait_for_more_packets+0x128/0x190
[ 1173.533975] __skb_recv_datagram+0x80/0xe0
[ 1173.538081] skb_recv_datagram+0x34/0x90
[ 1173.542014] unix_accept+0xa0/0x1c0
[ 1173.545511] do_accept+0x114/0x190
[ 1173.548916] __sys_accept4+0x70/0xe4
[ 1173.552503] __arm64_sys_accept4+0x20/0x30
[ 1173.556609] invoke_syscall+0x48/0x114
[ 1173.560368] el0_svc_common.constprop.0+0xcc/0xec
[ 1173.565085] do_el0_svc+0x2c/0xd0
[ 1173.568412] el0_svc+0x2c/0x84
[ 1173.571472] el0t_64_sync_handler+0xf4/0x120
[ 1173.575752] el0t_64_sync+0x18c/0x190
[ 1173.579434]
[ 1173.580947] OOM killer enabled.
[ 1173.584095] Restarting tasks ... done.
[ 1173.589831] random: crng reseeded on system resumption
[ 1173.595422] PM: suspend exit
write /sys/power/state: Device or resource busy
===============================
suspend 57 times
===============================

Thanks,
Peng.