2021-03-29 16:51:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Mutual debugging of 2 processes can stuck in unkillable stopped state

On 03/29, Igor Zhbanov wrote:
>
> Mutual debugging of 2 processes can stuck in unkillable stopped state

can't reproduce and can't understand...

> Hi!
>
> When one process, let's say "A", is tracing the another process "B", and the
> process "B" is trying to attach to the process "A", then both of them are
> getting stuck in the "t+" state. And they are ignoring all of the signals
> including the SIGKILL,

Why do you think so? What is your kernel version?

"t" means TASK_TRACED, SIGKILL should wake it up and terminate.

> so it is not possible to terminate them without
> a reboot.
>
> To reproduce:
> 1) Run two terminals
> 2) Attach with "strace -p ..." from the first terminal to the shell (bash) of
> the second terminal.
> 3) In the second terminal run "exec strace -p ..." to attach to the PID of the
> first strace.
>
> Then you'll see that the second strace is hanging without any output. And the
> first strace will output following and hang too:
> ptrace(PTRACE_SEIZE, 11795, NULL,
> PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT
>
> (The 11795 is the PID of the first strace itself.)
>
> And in the process list you will see following:
> ps awux | grep strace
> user 11776 0.0 0.0 24752 2248 pts/3 t+ 13:53 0:00 strace -p 11795
> user 11795 0.0 0.0 24752 3888 pts/1 t+ 13:54 0:00 strace -p 11776

OK, may be they sleep in PTRACE_EVENT_EXIT? After you tried to send SIGKILL?

please show us the output from "cat /proc/{11795,11776}/stack". And
"cat /proc/{11795,11776}/status" just in case.

Oleg.


2021-03-29 17:14:31

by Igor Zhbanov

[permalink] [raw]
Subject: Re: Mutual debugging of 2 processes can stuck in unkillable stopped state

Hi Oleg!

I've tried both 5.3.18 and 5.10.0. The behavior is the same.
The important thing is to run "exec strace -p ..." on the second terminal
to create the loop A->B->A.

So the last line from the first strace we see is:
ptrace(PTRACE_SEIZE, 1990, NULL, PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT

I.e. it printed the syscall prior to its execution and hanged after the
execution.

izh@suse2:~> ps awux|grep strace
izh 1891 0.0 0.0 24752 3828 pts/1 ts+ 19:52 0:00 strace -p 1990
izh 1990 0.0 0.0 24752 3628 pts/0 t+ 19:53 0:00 strace -p 1891

izh@suse2:~> kill 1990 1891
izh@suse2:~> kill -9 1990 1891

izh@suse2:~> sudo cat /proc/1891/stack
[sudo] password for root:
[<0>] ptrace_stop+0x14a/0x260
[<0>] ptrace_do_notify+0x91/0xb0
[<0>] ptrace_notify+0x4e/0x70
[<0>] do_exit+0x910/0xb70
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x124/0x800
[<0>] arch_do_signal_or_restart+0xa9/0x290
[<0>] exit_to_user_mode_prepare+0xe7/0x1a0
[<0>] syscall_exit_to_user_mode+0x18/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

izh@suse2:~> sudo cat /proc/1990/stack
[<0>] ptrace_stop+0x14a/0x260
[<0>] ptrace_do_notify+0x91/0xb0
[<0>] ptrace_notify+0x4e/0x70
[<0>] do_exit+0x910/0xb70
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x124/0x800
[<0>] arch_do_signal_or_restart+0xa9/0x290
[<0>] exit_to_user_mode_prepare+0xe7/0x1a0
[<0>] syscall_exit_to_user_mode+0x18/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

izh@suse2:~> cat /proc/1891/status
Name: strace
Umask: 0022
State: t (tracing stop)
Tgid: 1891
Ngid: 0
Pid: 1891
PPid: 1890
TracerPid: 1990
Uid: 1000 1000 1000 1000
Gid: 100 100 100 100
FDSize: 256
Groups: 100
NStgid: 1891
NSpid: 1891
NSpgid: 1891
NSsid: 1891
VmPeak: 24752 kB
VmSize: 24752 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 3828 kB
VmRSS: 3828 kB
RssAnon: 520 kB
RssFile: 3308 kB
RssShmem: 0 kB
VmData: 284 kB
VmStk: 132 kB
VmExe: 1108 kB
VmLib: 2828 kB
VmPTE: 80 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 4/15639
SigPnd: 0000000000000000
ShdPnd: 0000000000014100
SigBlk: 0000000000002000
SigIgn: 0000000000300000
SigCgt: 0000000180007007
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: vulnerable
SpeculationIndirectBranch: always enabled
Cpus_allowed: 7
Cpus_allowed_list: 0-2
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 1561
nonvoluntary_ctxt_switches: 7

izh@suse2:~> cat /proc/1990/status
Name: strace
Umask: 0022
State: t (tracing stop)
Tgid: 1990
Ngid: 0
Pid: 1990
PPid: 1847
TracerPid: 1891
Uid: 1000 1000 1000 1000
Gid: 100 100 100 100
FDSize: 256
Groups: 100
NStgid: 1990
NSpid: 1990
NSpgid: 1990
NSsid: 1847
VmPeak: 24752 kB
VmSize: 24752 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 3628 kB
VmRSS: 3628 kB
RssAnon: 520 kB
RssFile: 3108 kB
RssShmem: 0 kB
VmData: 284 kB
VmStk: 132 kB
VmExe: 1108 kB
VmLib: 2828 kB
VmPTE: 88 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 4/15639
SigPnd: 0000000000000000
ShdPnd: 0000000000014100
SigBlk: 0000000000002000
SigIgn: 0000000000300000
SigCgt: 0000000180007007
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: vulnerable
SpeculationIndirectBranch: always enabled
Cpus_allowed: 7
Cpus_allowed_list: 0-2
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 180
nonvoluntary_ctxt_switches: 848

On 29.03.2021 19:49, Oleg Nesterov wrote:
> On 03/29, Igor Zhbanov wrote:
>>
>> Mutual debugging of 2 processes can stuck in unkillable stopped state
>
> can't reproduce and can't understand...
>
>> Hi!
>>
>> When one process, let's say "A", is tracing the another process "B", and the
>> process "B" is trying to attach to the process "A", then both of them are
>> getting stuck in the "t+" state. And they are ignoring all of the signals
>> including the SIGKILL,
>
> Why do you think so? What is your kernel version?
>
> "t" means TASK_TRACED, SIGKILL should wake it up and terminate.
>
>> so it is not possible to terminate them without
>> a reboot.
>>
>> To reproduce:
>> 1) Run two terminals
>> 2) Attach with "strace -p ..." from the first terminal to the shell (bash) of
>> the second terminal.
>> 3) In the second terminal run "exec strace -p ..." to attach to the PID of the
>> first strace.
>>
>> Then you'll see that the second strace is hanging without any output. And the
>> first strace will output following and hang too:
>> ptrace(PTRACE_SEIZE, 11795, NULL,
>> PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT
>>
>> (The 11795 is the PID of the first strace itself.)
>>
>> And in the process list you will see following:
>> ps awux | grep strace
>> user 11776 0.0 0.0 24752 2248 pts/3 t+ 13:53 0:00 strace -p 11795
>> user 11795 0.0 0.0 24752 3888 pts/1 t+ 13:54 0:00 strace -p 11776
>
> OK, may be they sleep in PTRACE_EVENT_EXIT? After you tried to send SIGKILL?
>
> please show us the output from "cat /proc/{11795,11776}/stack". And
> "cat /proc/{11795,11776}/status" just in case.

2021-03-29 17:40:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Mutual debugging of 2 processes can stuck in unkillable stopped state

Hi Igor,

So. As expected, they sleep in EVENT_EXIT _after_ you have already sent SIGKILL.

Oh. I can only repeat that PTRACE_EVENT_EXIT must die ;) Or at least we should
finally define its semantics.

Igor, thanks for your report, but (I think) this has nothing to do with mutual
debugging. I'll return to this problem in a couple of days, I'm a bit busy right
now.

Thanks,

Oleg.

On 03/29, Igor Zhbanov wrote:
> Hi Oleg!
>
> I've tried both 5.3.18 and 5.10.0. The behavior is the same.
> The important thing is to run "exec strace -p ..." on the second terminal
> to create the loop A->B->A.
>
> So the last line from the first strace we see is:
> ptrace(PTRACE_SEIZE, 1990, NULL, PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT
>
> I.e. it printed the syscall prior to its execution and hanged after the
> execution.
>
> izh@suse2:~> ps awux|grep strace
> izh 1891 0.0 0.0 24752 3828 pts/1 ts+ 19:52 0:00 strace -p 1990
> izh 1990 0.0 0.0 24752 3628 pts/0 t+ 19:53 0:00 strace -p 1891
>
> izh@suse2:~> kill 1990 1891
> izh@suse2:~> kill -9 1990 1891
>
> izh@suse2:~> sudo cat /proc/1891/stack
> [sudo] password for root:
> [<0>] ptrace_stop+0x14a/0x260
> [<0>] ptrace_do_notify+0x91/0xb0
> [<0>] ptrace_notify+0x4e/0x70
> [<0>] do_exit+0x910/0xb70
> [<0>] do_group_exit+0x3a/0xa0
> [<0>] get_signal+0x124/0x800
> [<0>] arch_do_signal_or_restart+0xa9/0x290
> [<0>] exit_to_user_mode_prepare+0xe7/0x1a0
> [<0>] syscall_exit_to_user_mode+0x18/0x40
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> izh@suse2:~> sudo cat /proc/1990/stack
> [<0>] ptrace_stop+0x14a/0x260
> [<0>] ptrace_do_notify+0x91/0xb0
> [<0>] ptrace_notify+0x4e/0x70
> [<0>] do_exit+0x910/0xb70
> [<0>] do_group_exit+0x3a/0xa0
> [<0>] get_signal+0x124/0x800
> [<0>] arch_do_signal_or_restart+0xa9/0x290
> [<0>] exit_to_user_mode_prepare+0xe7/0x1a0
> [<0>] syscall_exit_to_user_mode+0x18/0x40
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> izh@suse2:~> cat /proc/1891/status
> Name: strace
> Umask: 0022
> State: t (tracing stop)
> Tgid: 1891
> Ngid: 0
> Pid: 1891
> PPid: 1890
> TracerPid: 1990
> Uid: 1000 1000 1000 1000
> Gid: 100 100 100 100
> FDSize: 256
> Groups: 100
> NStgid: 1891
> NSpid: 1891
> NSpgid: 1891
> NSsid: 1891
> VmPeak: 24752 kB
> VmSize: 24752 kB
> VmLck: 0 kB
> VmPin: 0 kB
> VmHWM: 3828 kB
> VmRSS: 3828 kB
> RssAnon: 520 kB
> RssFile: 3308 kB
> RssShmem: 0 kB
> VmData: 284 kB
> VmStk: 132 kB
> VmExe: 1108 kB
> VmLib: 2828 kB
> VmPTE: 80 kB
> VmSwap: 0 kB
> HugetlbPages: 0 kB
> CoreDumping: 0
> THP_enabled: 1
> Threads: 1
> SigQ: 4/15639
> SigPnd: 0000000000000000
> ShdPnd: 0000000000014100
> SigBlk: 0000000000002000
> SigIgn: 0000000000300000
> SigCgt: 0000000180007007
> CapInh: 0000000000000000
> CapPrm: 0000000000000000
> CapEff: 0000000000000000
> CapBnd: 000001ffffffffff
> CapAmb: 0000000000000000
> NoNewPrivs: 0
> Seccomp: 0
> Seccomp_filters: 0
> Speculation_Store_Bypass: vulnerable
> SpeculationIndirectBranch: always enabled
> Cpus_allowed: 7
> Cpus_allowed_list: 0-2
> Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
> Mems_allowed_list: 0
> voluntary_ctxt_switches: 1561
> nonvoluntary_ctxt_switches: 7
>
> izh@suse2:~> cat /proc/1990/status
> Name: strace
> Umask: 0022
> State: t (tracing stop)
> Tgid: 1990
> Ngid: 0
> Pid: 1990
> PPid: 1847
> TracerPid: 1891
> Uid: 1000 1000 1000 1000
> Gid: 100 100 100 100
> FDSize: 256
> Groups: 100
> NStgid: 1990
> NSpid: 1990
> NSpgid: 1990
> NSsid: 1847
> VmPeak: 24752 kB
> VmSize: 24752 kB
> VmLck: 0 kB
> VmPin: 0 kB
> VmHWM: 3628 kB
> VmRSS: 3628 kB
> RssAnon: 520 kB
> RssFile: 3108 kB
> RssShmem: 0 kB
> VmData: 284 kB
> VmStk: 132 kB
> VmExe: 1108 kB
> VmLib: 2828 kB
> VmPTE: 88 kB
> VmSwap: 0 kB
> HugetlbPages: 0 kB
> CoreDumping: 0
> THP_enabled: 1
> Threads: 1
> SigQ: 4/15639
> SigPnd: 0000000000000000
> ShdPnd: 0000000000014100
> SigBlk: 0000000000002000
> SigIgn: 0000000000300000
> SigCgt: 0000000180007007
> CapInh: 0000000000000000
> CapPrm: 0000000000000000
> CapEff: 0000000000000000
> CapBnd: 000001ffffffffff
> CapAmb: 0000000000000000
> NoNewPrivs: 0
> Seccomp: 0
> Seccomp_filters: 0
> Speculation_Store_Bypass: vulnerable
> SpeculationIndirectBranch: always enabled
> Cpus_allowed: 7
> Cpus_allowed_list: 0-2
> Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
> Mems_allowed_list: 0
> voluntary_ctxt_switches: 180
> nonvoluntary_ctxt_switches: 848
>
> On 29.03.2021 19:49, Oleg Nesterov wrote:
> >On 03/29, Igor Zhbanov wrote:
> >>
> >>Mutual debugging of 2 processes can stuck in unkillable stopped state
> >
> >can't reproduce and can't understand...
> >
> >>Hi!
> >>
> >>When one process, let's say "A", is tracing the another process "B", and the
> >>process "B" is trying to attach to the process "A", then both of them are
> >>getting stuck in the "t+" state. And they are ignoring all of the signals
> >>including the SIGKILL,
> >
> >Why do you think so? What is your kernel version?
> >
> >"t" means TASK_TRACED, SIGKILL should wake it up and terminate.
> >
> >>so it is not possible to terminate them without
> >>a reboot.
> >>
> >>To reproduce:
> >>1) Run two terminals
> >>2) Attach with "strace -p ..." from the first terminal to the shell (bash) of
> >> the second terminal.
> >>3) In the second terminal run "exec strace -p ..." to attach to the PID of the
> >> first strace.
> >>
> >>Then you'll see that the second strace is hanging without any output. And the
> >>first strace will output following and hang too:
> >>ptrace(PTRACE_SEIZE, 11795, NULL,
> >> PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT
> >>
> >>(The 11795 is the PID of the first strace itself.)
> >>
> >>And in the process list you will see following:
> >>ps awux | grep strace
> >>user 11776 0.0 0.0 24752 2248 pts/3 t+ 13:53 0:00 strace -p 11795
> >>user 11795 0.0 0.0 24752 3888 pts/1 t+ 13:54 0:00 strace -p 11776
> >
> >OK, may be they sleep in PTRACE_EVENT_EXIT? After you tried to send SIGKILL?
> >
> >please show us the output from "cat /proc/{11795,11776}/stack". And
> >"cat /proc/{11795,11776}/status" just in case.
>

2021-03-29 17:46:34

by Igor Zhbanov

[permalink] [raw]
Subject: Re: Mutual debugging of 2 processes can stuck in unkillable stopped state

Hi Oleg,

On 29.03.2021 20:38, Oleg Nesterov wrote:
> Hi Igor,
>
> So. As expected, they sleep in EVENT_EXIT _after_ you have already
> sent SIGKILL.

Here is the processes stack before sending any signals to them:

izh@suse2:~> sudo cat /proc/1751/stack
[<0>] ptrace_stop+0x14e/0x260
[<0>] ptrace_do_notify+0x91/0xb0
[<0>] ptrace_notify+0x4e/0x70
[<0>] syscall_slow_exit_work+0x90/0x150
[<0>] do_syscall_64+0x129/0x1f0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

izh@suse2:~> sudo cat /proc/1979/stack
[<0>] ptrace_stop+0x14e/0x260
[<0>] get_signal+0x4d5/0x840
[<0>] do_signal+0x30/0x6a0
[<0>] exit_to_usermode_loop+0x8b/0x120
[<0>] do_syscall_64+0x1c3/0x1f0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

> Igor, thanks for your report, but (I think) this has nothing to do
> with mutual debugging. I'll return to this problem in a couple of
> days, I'm a bit busy right now.

Sorry for inexact description. I said it from the user perspective when
the processes are stopped and can't be killed. :-)

2021-04-12 18:54:03

by Igor Zhbanov

[permalink] [raw]
Subject: Re: Mutual debugging of 2 processes can stuck in unkillable stopped state

Hi Oleg,

So what is the cause of this problem?

Thank you.

On 29.03.2021 20:44, Igor Zhbanov wrote:
>
> Here is the processes stack before sending any signals to them:
>
> izh@suse2:~> sudo cat /proc/1751/stack
> [<0>] ptrace_stop+0x14e/0x260
> [<0>] ptrace_do_notify+0x91/0xb0
> [<0>] ptrace_notify+0x4e/0x70
> [<0>] syscall_slow_exit_work+0x90/0x150
> [<0>] do_syscall_64+0x129/0x1f0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> izh@suse2:~> sudo cat /proc/1979/stack
> [<0>] ptrace_stop+0x14e/0x260
> [<0>] get_signal+0x4d5/0x840
> [<0>] do_signal+0x30/0x6a0
> [<0>] exit_to_usermode_loop+0x8b/0x120
> [<0>] do_syscall_64+0x1c3/0x1f0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>> Igor, thanks for your report, but (I think) this has nothing to do
>> with mutual debugging. I'll return to this problem in a couple of
>> days, I'm a bit busy right now.

2021-04-13 22:06:23

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Mutual debugging of 2 processes can stuck in unkillable stopped state

Hi Igor,

sorry for delay...

On 04/12, Igor Zhbanov wrote:
>
> Hi Oleg,
>
> So what is the cause of this problem?

The cause is clear. And well known ;) And again, this has almost nothing to do
with the mutual debugging.

The tracee sleeps in ptrace_stop(). You send SIGKILL. This wakes the tracee up,
it dequeues the signal, calls do_exit(), and stops again in PTRACE_EVENT_EXIT.
With SIGKILL in signal->shared_pending. This all looks as if the tracee doesn't
react to SIGKILL.

The only problem is that any change can break something which relies on the
current behaviour :/ I'll write another email on this.

Oleg.