2021-06-11 12:49:51

by Qi Zheng

Subject: [PATCH] x86: fix get_wchan() not support the ORC unwinder

Currently, the CONFIG_UNWINDER_ORC option is enabled by default on
x86, but get_wchan() still relies on frame pointers to walk the
stack, so /proc/<pid>/wchan always returns 0, whether or not the
task <pid> is running.

Reimplement get_wchan() on top of stack_trace_save_tsk(), which works
with both the ORC and the frame pointer unwinders.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Qi Zheng <[email protected]>
---
arch/x86/kernel/process.c | 51 +++--------------------------------------------
1 file changed, 3 insertions(+), 48 deletions(-)
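For illustration, here is a userspace model of the frame pointer walk that this patch removes. It is a sketch under stated assumptions, not kernel code: the stack is a plain array, frame pointers are indices into it, and SCHED_START/SCHED_END and fp_walk_wchan() are hypothetical stand-ins for the scheduler text section that in_sched_functions() checks and for the removed loop. It shows why the walk ends up returning 0 when the kernel is built without frame pointers (as in the default ORC configuration): the slot the loop treats as a saved frame pointer holds some unrelated value, the bounds check usually fails, and the loop gives up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler text section; the kernel's
 * in_sched_functions() checks the real section bounds instead. */
#define SCHED_START 0x1000UL
#define SCHED_END   0x2000UL

static int in_sched(unsigned long ip)
{
	return ip >= SCHED_START && ip < SCHED_END;
}

/*
 * Model of the removed frame pointer walk. stack[] stands in for the
 * task's stack page and frame pointers are indices into it: each frame
 * keeps the caller's frame pointer at stack[fp] and the return address
 * at stack[fp + 1]. Returns the first return address outside the
 * scheduler, or 0 if the chain leaves the stack or is too deep.
 */
unsigned long fp_walk_wchan(const unsigned long *stack, size_t nwords,
			    size_t fp)
{
	size_t top;
	int count = 0;

	if (nwords < 2)
		return 0;
	assert(stack != NULL);
	top = nwords - 2;	/* leave room to read both FP and IP */

	do {
		unsigned long ip;

		if (fp > top)		/* bogus frame pointer: give up */
			return 0;
		ip = stack[fp + 1];
		if (!in_sched(ip))
			return ip;	/* first caller outside the scheduler */
		fp = (size_t)stack[fp];	/* follow the saved frame pointer */
	} while (count++ < 16);		/* same depth limit as the old code */

	return 0;
}
```

For example, with unsigned long good[] = {2, 0x1100, 4, 0x1200, 0, 0x5555}, fp_walk_wchan(good, 6, 0) steps over two scheduler frames and returns 0x5555; replacing good[0] with an out-of-range value such as 0xdeadbeef (what a non-frame-pointer build leaves in that slot, in this model) makes it return 0. The model leaves out the p->state != TASK_RUNNING recheck of the real loop.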

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5e1f38179f49..976a36918ed7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -928,58 +928,13 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
  */
 unsigned long get_wchan(struct task_struct *p)
 {
-	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
-	int count = 0;
+	unsigned long entry = 0;

 	if (p == current || p->state == TASK_RUNNING)
 		return 0;
 
-	if (!try_get_task_stack(p))
-		return 0;
-
-	start = (unsigned long)task_stack_page(p);
-	if (!start)
-		goto out;
-
-	/*
-	 * Layout of the stack page:
-	 *
-	 * ----------- topmax = start + THREAD_SIZE - sizeof(unsigned long)
-	 * PADDING
-	 * ----------- top = topmax - TOP_OF_KERNEL_STACK_PADDING
-	 * stack
-	 * ----------- bottom = start
-	 *
-	 * The tasks stack pointer points at the location where the
-	 * framepointer is stored. The data on the stack is:
-	 * ... IP FP ... IP FP
-	 *
-	 * We need to read FP and IP, so we need to adjust the upper
-	 * bound by another unsigned long.
-	 */
-	top = start + THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;
-	top -= 2 * sizeof(unsigned long);
-	bottom = start;
-
-	sp = READ_ONCE(p->thread.sp);
-	if (sp < bottom || sp > top)
-		goto out;
-
-	fp = READ_ONCE_NOCHECK(((struct inactive_task_frame *)sp)->bp);
-	do {
-		if (fp < bottom || fp > top)
-			goto out;
-		ip = READ_ONCE_NOCHECK(*(unsigned long *)(fp + sizeof(unsigned long)));
-		if (!in_sched_functions(ip)) {
-			ret = ip;
-			goto out;
-		}
-		fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
-	} while (count++ < 16 && p->state != TASK_RUNNING);
-
-out:
-	put_task_stack(p);
-	return ret;
+	stack_trace_save_tsk(p, &entry, 1, 0);
+	return entry;
 }

 long do_arch_prctl_common(struct task_struct *task, int option,
--
2.11.0
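The new two-line body relies on a detail of stack_trace_save_tsk(): if I read kernel/stacktrace.c correctly (the CONFIG_ARCH_STACKWALK variant, which x86 uses), it feeds each unwound address through stack_trace_consume_entry_nosched(), which drops any address for which in_sched_functions() is true before the skip count or the store is applied. Asking for a single entry with skipnr = 0 should therefore yield the first caller outside the scheduler, which is what wchan reports. The sketch below models that filtering in userspace; frames[] stands in for the unwinder output, and save_nosched(), model_get_wchan() and the SCHED_* range are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler text section. */
#define SCHED_START 0x1000UL
#define SCHED_END   0x2000UL

static int in_sched(unsigned long ip)
{
	return ip >= SCHED_START && ip < SCHED_END;
}

/*
 * Model of stack_trace_save_tsk() with a "nosched" consumer.
 * frames[] is the raw unwinder output, innermost frame first.
 * Scheduler addresses are dropped before skip is applied, then up
 * to size addresses are stored. Returns the number stored.
 */
unsigned int save_nosched(const unsigned long *frames, size_t nframes,
			  unsigned long *store, unsigned int size,
			  unsigned int skip)
{
	unsigned int len = 0;

	assert(store != NULL || size == 0);
	for (size_t i = 0; i < nframes && len < size; i++) {
		if (in_sched(frames[i]))
			continue;	/* plays the role of in_sched_functions() */
		if (skip > 0) {
			skip--;
			continue;
		}
		store[len++] = frames[i];
	}
	return len;
}

/* Mirrors the patched get_wchan(): one entry, no extra skip. */
unsigned long model_get_wchan(const unsigned long *frames, size_t nframes)
{
	unsigned long entry = 0;

	save_nosched(frames, nframes, &entry, 1, 0);
	return entry;
}
```

With frames = {0x1100, 0x1200, 0x5555, 0x6666}, model_get_wchan() returns 0x5555; if every frame lies inside the scheduler range, entry keeps its initial value and the function returns 0, just as the old code did when it found nothing.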


2021-06-14 16:24:59

by Andy Lutomirski

Subject: Re: [PATCH] x86: fix get_wchan() not support the ORC unwinder

On 6/11/21 5:46 AM, Qi Zheng wrote:
> Currently, the CONFIG_UNWINDER_ORC option is enabled by default on
> x86, but get_wchan() still relies on frame pointers to walk the
> stack, so /proc/<pid>/wchan always returns 0, whether or not the
> task <pid> is running.
>
> Reimplement get_wchan() on top of stack_trace_save_tsk(), which works
> with both the ORC and the frame pointer unwinders.

How much slower does this make ps?

--Andy

2021-06-16 07:35:49

by Qi Zheng

Subject: Re: [External] Re: [PATCH] x86: fix get_wchan() not support the ORC unwinder

On Tue, Jun 15, 2021 at 12:21 AM Andy Lutomirski <[email protected]> wrote:
>
> On 6/11/21 5:46 AM, Qi Zheng wrote:
> > Currently, the CONFIG_UNWINDER_ORC option is enabled by default on
> > x86, but get_wchan() still relies on frame pointers to walk the
> > stack, so /proc/<pid>/wchan always returns 0, whether or not the
> > task <pid> is running.
> >
> > Reimplement get_wchan() on top of stack_trace_save_tsk(), which works
> > with both the ORC and the frame pointer unwinders.
>
> How much slower does this make ps?

I used bpftrace to measure the running time of get_wchan() with the ORC
and the frame pointer unwinders, before and after applying this patch.
The test script and the results are as follows:

The test script:

  bpftrace -e 'kprobe:get_wchan { @start[tid] = nsecs; }
    kretprobe:get_wchan /@start[tid]/ {
      @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }'

The results (time per get_wchan() call, in ns):
1) ORC unwinder (before applying this patch)

@ns[ps]:
[512, 1K)     4609 |@@@@@@@@@@@@                                        |
[1K, 2K)     18599 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)      1848 |@@@@@                                               |
[4K, 8K)       307 |                                                    |
[8K, 16K)       74 |                                                    |
[16K, 32K)      12 |                                                    |

73% of the cases are in the [1K, 2K) range.
Note that in this case get_wchan() always returns 0, which is wrong.

2) ORC unwinder (after applying this patch)

@ns[ps]:
[512, 1K)      536 |@                                                   |
[1K, 2K)     19945 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)      5604 |@@@@@@@@@@@@@@                                      |
[4K, 8K)       246 |                                                    |
[8K, 16K)      154 |                                                    |
[16K, 32K)      18 |                                                    |

75% of the cases are in the [1K, 2K) range.

3) frame pointer unwinder (before applying this patch)

@ns[ps]:
[512, 1K)      245 |                                                    |
[1K, 2K)     16577 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)      2788 |@@@@@@@@                                            |
[4K, 8K)       190 |                                                    |
[8K, 16K)       74 |                                                    |
[16K, 32K)       9 |                                                    |

83% of the cases are in the [1K, 2K) range.

4) frame pointer unwinder (after applying this patch)

@ns[ps]:
[512, 1K)       85 |                                                    |
[1K, 2K)     12023 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)      7418 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
[4K, 8K)       232 |@                                                   |
[8K, 16K)      104 |                                                    |
[16K, 32K)      18 |                                                    |

60% of the cases are in the [1K, 2K) range.
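As a quick check on these percentages, and to put a rough number on the slowdown, the sketch below recomputes bucket shares from the histogram counts. The counts are copied from the four results above; range_pct() and the array names are hypothetical helpers. Because hist() only records power-of-two buckets, this compares distributions rather than exact latencies.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket counts from the four histograms, in the order
 * [512,1K) [1K,2K) [2K,4K) [4K,8K) [8K,16K) [16K,32K); units are ns. */
#define NBUCKETS 6

static const unsigned long orc_before[NBUCKETS] = {4609, 18599, 1848, 307, 74, 12};
static const unsigned long orc_after[NBUCKETS]  = {536, 19945, 5604, 246, 154, 18};
static const unsigned long fp_before[NBUCKETS]  = {245, 16577, 2788, 190, 74, 9};
static const unsigned long fp_after[NBUCKETS]   = {85, 12023, 7418, 232, 104, 18};

/* Percentage (rounded down) of all samples that fall in buckets
 * first..last inclusive. */
unsigned long range_pct(const unsigned long b[NBUCKETS], size_t first,
			size_t last)
{
	unsigned long total = 0, part = 0;

	for (size_t i = 0; i < NBUCKETS; i++) {
		total += b[i];
		if (i >= first && i <= last)
			part += b[i];
	}
	assert(total > 0);
	return 100 * part / total;
}
```

range_pct(b, 1, 1) reproduces the 73%, 75%, 83% and 60% figures for the [1K, 2K) bucket. range_pct(b, 2, 5), the share of calls taking 2K ns or more, goes from 8% to 22% with ORC (where the "before" numbers reflect the broken early exit) and from 15% to 39% with frame pointers.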

In summary, get_wchan() is somewhat slower after applying this patch: more
calls land in the [2K, 4K) bucket (1848 -> 5604 with ORC, 2788 -> 7418 with
frame pointers). However, get_wchan() is not a hot path, and the patch fixes a
bug in the default ORC configuration, so I think the extra cost is acceptable.

In addition, this issue has existed for nearly 4 years without anyone
fixing it. If nobody cares about the return value of get_wchan(),
maybe we could simply return 0 or remove the function altogether.
What do you think?

Best regards,
Qi Zheng

>
> --Andy