2021-10-11 16:24:50

by Yang Jihong

[permalink] [raw]
Subject: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization

commit 85f726a35e504418 use strncpy instead of memcpy when copying comm,
on ARM64 machine, this commit causes performance degradation.

For the task that already exists in savecmd, it is unnecessary to call
set_cmdline to execute strncpy once, run set_cmdline only if the task does
not exist in savecmd.

I have written an example (which is an extreme case) in which trace sched switch
is invoked for 1000 times, as shown in the following:

for (int i = 0; i < 1000; i++) {
trace_sched_switch(true, current, current);
}

On ARM64 machine, compare the data before and after the optimization:
+---------------------+------------------------------+------------------------+
| | Total number of instructions | Total number of cycles |
+---------------------+------------------------------+------------------------+
| Before optimization | 1107367 | 658491 |
+---------------------+------------------------------+------------------------+
| After optimization | 869367 | 520171 |
+---------------------+------------------------------+------------------------+
As shown above, there is nearly 26% performance

Signed-off-by: Yang Jihong <[email protected]>
---
kernel/trace/trace.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7896d30d90f7..a795610a3b37 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
savedcmd->cmdline_idx = idx;
}

- savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
- set_cmdline(idx, tsk->comm);
+ /* save cmdline only when task does not exist in savecmd */
+ if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
+ savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
+ set_cmdline(idx, tsk->comm);
+ }

arch_spin_unlock(&trace_cmdline_lock);

--
2.30.GIT


2021-10-14 03:05:20

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization

On Mon, 11 Oct 2021 19:50:18 +0800
Yang Jihong <[email protected]> wrote:

> commit 85f726a35e504418 use strncpy instead of memcpy when copying comm,
> on ARM64 machine, this commit causes performance degradation.
>
> For the task that already exists in savecmd, it is unnecessary to call
> set_cmdline to execute strncpy once, run set_cmdline only if the task does
> not exist in savecmd.
>
> I have written an example (which is an extreme case) in which trace sched switch
> is invoked for 1000 times, as shown in the following:
>
> for (int i = 0; i < 1000; i++) {
> trace_sched_switch(true, current, current);
> }

Well that's a pretty non realistic benchmark.

>
> On ARM64 machine, compare the data before and after the optimization:
> +---------------------+------------------------------+------------------------+
> | | Total number of instructions | Total number of cycles |
> +---------------------+------------------------------+------------------------+
> | Before optimization | 1107367 | 658491 |
> +---------------------+------------------------------+------------------------+
> | After optimization | 869367 | 520171 |
> +---------------------+------------------------------+------------------------+
> As shown above, there is nearly 26% performance

I'd prefer to see a more realistic benchmark.

>
> Signed-off-by: Yang Jihong <[email protected]>
> ---
> kernel/trace/trace.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 7896d30d90f7..a795610a3b37 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
> savedcmd->cmdline_idx = idx;
> }
>
> - savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> - set_cmdline(idx, tsk->comm);
> + /* save cmdline only when task does not exist in savecmd */
> + if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
> + savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> + set_cmdline(idx, tsk->comm);
> + }

I'm not against adding this. Just for kicks I ran the following before
and after this patch:

# trace-cmd start -e sched
# perf stat -r 100 hackbench 50

Before:

Performance counter stats for '/work/c/hackbench 50' (100 runs):

6,261.26 msec task-clock # 6.126 CPUs utilized ( +- 0.12% )
93,519 context-switches # 14.936 K/sec ( +- 1.12% )
13,725 cpu-migrations # 2.192 K/sec ( +- 1.16% )
47,266 page-faults # 7.549 K/sec ( +- 0.54% )
22,911,885,026 cycles # 3.659 GHz ( +- 0.11% )
15,171,250,777 stalled-cycles-frontend # 66.22% frontend cycles idle ( +- 0.13% )
18,330,841,604 instructions # 0.80 insn per cycle
# 0.83 stalled cycles per insn ( +- 0.11% )
4,027,904,559 branches # 643.306 M/sec ( +- 0.11% )
31,327,782 branch-misses # 0.78% of all branches ( +- 0.20% )

1.02201 +- 0.00158 seconds time elapsed ( +- 0.15% )
After:

Performance counter stats for '/work/c/hackbench 50' (100 runs):

6,216.47 msec task-clock # 6.124 CPUs utilized ( +- 0.10% )
93,311 context-switches # 15.010 K/sec ( +- 0.91% )
13,719 cpu-migrations # 2.207 K/sec ( +- 1.09% )
47,085 page-faults # 7.574 K/sec ( +- 0.49% )
22,746,703,318 cycles # 3.659 GHz ( +- 0.09% )
15,012,911,121 stalled-cycles-frontend # 66.00% frontend cycles idle ( +- 0.11% )
18,275,147,949 instructions # 0.80 insn per cycle
# 0.82 stalled cycles per insn ( +- 0.08% )
4,017,673,788 branches # 646.295 M/sec ( +- 0.08% )
31,313,459 branch-misses # 0.78% of all branches ( +- 0.17% )

1.01506 +- 0.00150 seconds time elapsed ( +- 0.15% )

Really it's all in the noise, so adding this doesn't seem to hurt.

-- Steve



>
> arch_spin_unlock(&trace_cmdline_lock);
>

2021-10-14 08:12:14

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization

Hi Steven,

On 2021/10/14 11:02, Steven Rostedt wrote:
> On Mon, 11 Oct 2021 19:50:18 +0800
> Yang Jihong <[email protected]> wrote:
>
>> commit 85f726a35e504418 use strncpy instead of memcpy when copying comm,
>> on ARM64 machine, this commit causes performance degradation.
>>
>> For the task that already exists in savecmd, it is unnecessary to call
>> set_cmdline to execute strncpy once, run set_cmdline only if the task does
>> not exist in savecmd.
>>
>> I have written an example (which is an extreme case) in which trace sched switch
>> is invoked for 1000 times, as shown in the following:
>>
>> for (int i = 0; i < 1000; i++) {
>> trace_sched_switch(true, current, current);
>> }
>
> Well that's a pretty non realistic benchmark.
>
>>
>> On ARM64 machine, compare the data before and after the optimization:
>> +---------------------+------------------------------+------------------------+
>> | | Total number of instructions | Total number of cycles |
>> +---------------------+------------------------------+------------------------+
>> | Before optimization | 1107367 | 658491 |
>> +---------------------+------------------------------+------------------------+
>> | After optimization | 869367 | 520171 |
>> +---------------------+------------------------------+------------------------+
>> As shown above, there is nearly 26% performance
>
> I'd prefer to see a more realistic benchmark.
>
>>
>> Signed-off-by: Yang Jihong <[email protected]>
>> ---
>> kernel/trace/trace.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 7896d30d90f7..a795610a3b37 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
>> savedcmd->cmdline_idx = idx;
>> }
>>
>> - savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> - set_cmdline(idx, tsk->comm);
>> + /* save cmdline only when task does not exist in savecmd */
>> + if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
>> + savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> + set_cmdline(idx, tsk->comm);
>> + }
>
> I'm not against adding this. Just for kicks I ran the following before
> and after this patch:
>
> # trace-cmd start -e sched
> # perf stat -r 100 hackbench 50
>
> Before:
>
> Performance counter stats for '/work/c/hackbench 50' (100 runs):
>
> 6,261.26 msec task-clock # 6.126 CPUs utilized ( +- 0.12% )
> 93,519 context-switches # 14.936 K/sec ( +- 1.12% )
> 13,725 cpu-migrations # 2.192 K/sec ( +- 1.16% )
> 47,266 page-faults # 7.549 K/sec ( +- 0.54% )
> 22,911,885,026 cycles # 3.659 GHz ( +- 0.11% )
> 15,171,250,777 stalled-cycles-frontend # 66.22% frontend cycles idle ( +- 0.13% )
> 18,330,841,604 instructions # 0.80 insn per cycle
> # 0.83 stalled cycles per insn ( +- 0.11% )
> 4,027,904,559 branches # 643.306 M/sec ( +- 0.11% )
> 31,327,782 branch-misses # 0.78% of all branches ( +- 0.20% )
>
> 1.02201 +- 0.00158 seconds time elapsed ( +- 0.15% )
> After:
>
> Performance counter stats for '/work/c/hackbench 50' (100 runs):
>
> 6,216.47 msec task-clock # 6.124 CPUs utilized ( +- 0.10% )
> 93,311 context-switches # 15.010 K/sec ( +- 0.91% )
> 13,719 cpu-migrations # 2.207 K/sec ( +- 1.09% )
> 47,085 page-faults # 7.574 K/sec ( +- 0.49% )
> 22,746,703,318 cycles # 3.659 GHz ( +- 0.09% )
> 15,012,911,121 stalled-cycles-frontend # 66.00% frontend cycles idle ( +- 0.11% )
> 18,275,147,949 instructions # 0.80 insn per cycle
> # 0.82 stalled cycles per insn ( +- 0.08% )
> 4,017,673,788 branches # 646.295 M/sec ( +- 0.08% )
> 31,313,459 branch-misses # 0.78% of all branches ( +- 0.17% )
>
> 1.01506 +- 0.00150 seconds time elapsed ( +- 0.15% )
>
> Really it's all in the noise, so adding this doesn't seem to hurt.
>
Thanks very much for benchmark test data. :)
Indeed, the effect of this modification is not obvious in scenarios
where tasks are repeatedly created, but only in scenarios where tasks
are repeatedly scheduled between a limited number of tasks.

Thanks,
Jihong
> -- Steve
>
>
>
>>
>> arch_spin_unlock(&trace_cmdline_lock);
>>
>
> .
>

2021-10-14 14:58:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization

On Mon, 11 Oct 2021 19:50:18 +0800
Yang Jihong <[email protected]> wrote:

> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 7896d30d90f7..a795610a3b37 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
> savedcmd->cmdline_idx = idx;
> }
>
> - savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> - set_cmdline(idx, tsk->comm);
> + /* save cmdline only when task does not exist in savecmd */
> + if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
> + savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> + set_cmdline(idx, tsk->comm);
> + }
>

I now remember why I never did it this way. This breaks saving the command
line when we do an exec.

That is, just because mapped_pid == tsk->pid does not mean that the comm is
the same.

-- Steve

2021-10-15 10:22:15

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization

Hi Steven,

On 2021/10/14 22:32, Steven Rostedt wrote:
> On Mon, 11 Oct 2021 19:50:18 +0800
> Yang Jihong <[email protected]> wrote:
>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 7896d30d90f7..a795610a3b37 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
>> savedcmd->cmdline_idx = idx;
>> }
>>
>> - savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> - set_cmdline(idx, tsk->comm);
>> + /* save cmdline only when task does not exist in savecmd */
>> + if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
>> + savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> + set_cmdline(idx, tsk->comm);
>> + }
>>
>
> I now remember why I never did it this way. This breaks saving the command
> line when we do an exec.
>
> That is, just because mapped_pid == tsk->pid does not mean that the comm is
> the same.
>
If do an exec, the original process is replaced with a new binary.
Therefore, the command changes but the tsk->pid does not change.
Therefore, we need to savecmd here again?

Okay, I see. Thank you for the detailed explanation. :)

Thanks,
Jihong
> -- Steve
> .
>