2024-02-05 10:52:21

by Yang Jihong

Subject: [PATCH 0/5] perf sched: Minor optimizations for resource initialization

start_work_mutex, work_done_wait_mutex, curr_thread, curr_pid and
cpu_last_switched are all initialized together in cmd_sched(), but not
every perf sched subcommand needs all of them; perf sched record in
particular needs none.
This series moves each initialization into the subcommands that actually
use it.
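
As a rough stand-alone sketch of the idea (hypothetical names below, not
the actual perf code; the real changes are in the individual patches):
each subcommand sets up and frees only what it uses, and a subcommand
like 'record' allocates nothing:

#include <stdio.h>
#include <stdlib.h>

struct tool {
        int *curr_pid;  /* only latency/map/replay-style subcommands need this */
};

static int setup_pids(struct tool *t)
{
        t->curr_pid = calloc(16, sizeof(*t->curr_pid));
        return t->curr_pid ? 0 : -1;
}

static void free_pids(struct tool *t)
{
        free(t->curr_pid);
        t->curr_pid = NULL;
}

static int cmd_latency(struct tool *t)
{
        int rc = -1;

        if (setup_pids(t))      /* allocated here, not in the top-level dispatcher */
                return rc;

        puts("latency report"); /* stands in for the real work */
        rc = 0;

        free_pids(t);
        return rc;
}

static int cmd_record(struct tool *t)
{
        (void)t;                /* 'record' needs none of the buffers */
        puts("record");
        return 0;
}

int main(int argc, char **argv)
{
        struct tool t = { 0 };

        if (argc > 1 && argv[1][0] == 'r')
                return cmd_record(&t);
        return cmd_latency(&t);
}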

Simple functional testing:

# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.204 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 14.868 MB perf.data (133947 samples) ]

# perf sched latency

-------------------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end |
-------------------------------------------------------------------------------------------------------------------------------------------
qemu-system-x86:(3) | 2.753 ms | 10 | avg: 0.963 ms | max: 8.526 ms | max start: 457541.695704 s | max end: 457541.704230 s
sched-messaging:(401) | 2995.481 ms | 36312 | avg: 0.944 ms | max: 111.551 ms | max start: 457541.645349 s | max end: 457541.756900 s
rcu_sched:14 | 0.402 ms | 32 | avg: 0.266 ms | max: 7.996 ms | max start: 457541.581363 s | max end: 457541.589359 s
sshd:48125 | 0.461 ms | 2 | avg: 0.033 ms | max: 0.036 ms | max start: 457541.584374 s | max end: 457541.584410 s
perf:112408 | 2.321 ms | 2 | avg: 0.031 ms | max: 0.032 ms | max start: 457541.846874 s | max end: 457541.846906 s
<SNIP>
ksoftirqd/1:22 | 0.704 ms | 3 | avg: 0.005 ms | max: 0.005 ms | max start: 457541.845388 s | max end: 457541.845393 s
kworker/15:0-mm:61886 | 0.227 ms | 21 | avg: 0.005 ms | max: 0.006 ms | max start: 457541.739187 s | max end: 457541.739193 s
kworker/13:1-ev:92643 | 0.249 ms | 22 | avg: 0.005 ms | max: 0.006 ms | max start: 457541.768695 s | max end: 457541.768701 s
kworker/12:1-ev:61202 | 0.418 ms | 40 | avg: 0.005 ms | max: 0.007 ms | max start: 457541.739679 s | max end: 457541.739687 s
-----------------------------------------------------------------------------------------------------------------
TOTAL: | 3007.424 ms | 36776 |
---------------------------------------------------

# echo $?
0

# perf sched map
*A0 457541.580856 secs A0 => migration/0:15
*. 457541.580886 secs . => swapper:0
. *B0 457541.581018 secs B0 => migration/1:21
. *. 457541.581050 secs
. . *C0 457541.581147 secs C0 => migration/2:27
. . *. 457541.581174 secs
. . . *D0 457541.581259 secs D0 => migration/3:33
<SNIP>
E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 *E7 . . . . 457541.847783 secs
E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 *E7 . . . 457541.847873 secs
E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 *E7 . . 457541.847954 secs
E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 *E7 . 457541.848034 secs
E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 *E7 457541.848108 secs
# echo $?
0

# perf sched replay
run measurement overhead: 108 nsecs
sleep measurement overhead: 65244 nsecs
the run test took 1000002 nsecs
the sleep test took 1079677 nsecs
nr_run_events: 40700
nr_sleep_events: 41415
nr_wakeup_events: 31601
target-less wakeups: 15
multi-target wakeups: 805
task 0 ( swapper: 0), nr_events: 7307
task 1 ( swapper: 1), nr_events: 1
task 2 ( swapper: 2), nr_events: 1
<SNIP>
task 713 ( sched-messaging: 112809), nr_events: 987
task 714 ( sched-messaging: 112810), nr_events: 2706
------------------------------------------------------------
#1 : 1298.443, ravg: 1298.44, cpu: 8281.74 / 8281.74
#2 : 1316.426, ravg: 1300.24, cpu: 7577.61 / 8211.32
#3 : 1323.864, ravg: 1302.60, cpu: 7932.48 / 8183.44
#4 : 1329.423, ravg: 1305.29, cpu: 7646.17 / 8129.71
#5 : 1321.419, ravg: 1306.90, cpu: 7669.90 / 8083.73
#6 : 1322.082, ravg: 1308.42, cpu: 7755.66 / 8050.92
#7 : 1324.330, ravg: 1310.01, cpu: 7361.51 / 7981.98
#8 : 1312.489, ravg: 1310.26, cpu: 7170.11 / 7900.80
#9 : 1312.002, ravg: 1310.43, cpu: 7176.40 / 7828.36
#10 : 1311.737, ravg: 1310.56, cpu: 7121.14 / 7757.63
# echo $?
0

# perf sched script
perf 112408 [000] 457541.580826: sched:sched_stat_runtime: comm=perf pid=112408 runtime=53050 [ns] vruntime=621244222333 [ns]
perf 112408 [000] 457541.580831: sched:sched_waking: comm=migration/0 pid=15 prio=0 target_cpu=000
perf 112408 [000] 457541.580853: sched:sched_stat_runtime: comm=perf pid=112408 runtime=24379 [ns] vruntime=621244246712 [ns]
perf 112408 [000] 457541.580856: sched:sched_switch: prev_comm=perf prev_pid=112408 prev_prio=120 prev_state=R+ ==> next_comm=migration/0 next_pid=15 next_prio=0
<SNIP>
swapper 0 [012] 457541.847873: sched:sched_switch: prev_comm=swapper/12 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
swapper 0 [013] 457541.847954: sched:sched_switch: prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
swapper 0 [014] 457541.848034: sched:sched_switch: prev_comm=swapper/14 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
swapper 0 [015] 457541.848108: sched:sched_switch: prev_comm=swapper/15 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
# echo $?
0

# perf sched timehist
Samples do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
457541.580856 [0000] perf[112408] 0.000 0.000 0.000
457541.580886 [0000] migration/0[15] 0.000 0.024 0.029
457541.581018 [0001] perf[112408] 0.000 0.000 0.000
457541.581050 [0001] migration/1[21] 0.000 0.006 0.032
457541.581147 [0002] perf[112408] 0.000 0.000 0.000
457541.581174 [0002] migration/2[27] 0.000 0.005 0.026
<SNIP>
457541.847623 [0009] <idle> 0.010 0.000 1.489
457541.847704 [0010] <idle> 0.012 0.000 1.777
457541.847783 [0011] <idle> 0.233 0.000 1.213
457541.847873 [0012] <idle> 0.008 0.000 24.188
457541.847954 [0013] <idle> 0.009 0.000 1.734
457541.848034 [0014] <idle> 0.009 0.000 2.969
457541.848108 [0015] <idle> 0.220 0.000 1.205
# echo $?
0

Yang Jihong (5):
perf sched: Move start_work_mutex and work_done_wait_mutex
initialization to perf_sched__replay()
perf sched: Fix memory leak in perf_sched__map()
perf sched: Move curr_thread initialization to perf_sched__map()
perf sched: Move curr_pid and cpu_last_switched initialization to
perf_sched__{lat|map|replay}()
perf thread_map: Free strlist on normal path in
thread_map__new_by_tid_str()

tools/perf/builtin-sched.c | 163 ++++++++++++++++++++++-------------
tools/perf/util/thread_map.c | 2 +-
2 files changed, 105 insertions(+), 60 deletions(-)

--
2.34.1



2024-02-05 10:52:27

by Yang Jihong

Subject: [PATCH 5/5] perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()

slist needs to be freed on both the error path and the normal path in
thread_map__new_by_tid_str().
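
For reference, a stand-alone sketch of the resulting shape, with
hypothetical names rather than the perf code (the actual one-line change
is in the diff below): the temporary object is released at the common
'out' label, so it is freed on the normal path as well as on the error
path:

#include <stdlib.h>
#include <string.h>

/* 'copy' plays the role of the temporary 'slist'; 'ids' is what the caller keeps. */
static int *parse_ids(const char *spec, int n)
{
        char *copy = strdup(spec);
        int *ids = NULL;
        int i;

        if (!copy)
                return NULL;

        ids = calloc(n, sizeof(*ids));
        if (!ids)
                goto out;

        for (i = 0; i < n; i++) {
                if (copy[i] == '\0')
                        goto out_free_ids;      /* error: spec shorter than expected */
                ids[i] = copy[i] - '0';
        }

out:
        free(copy);                             /* common cleanup: normal and error path */
        return ids;

out_free_ids:
        free(ids);
        ids = NULL;
        goto out;
}

int main(void)
{
        int *ids = parse_ids("123", 3);

        free(ids);
        return 0;
}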

Signed-off-by: Yang Jihong <[email protected]>
---
tools/perf/util/thread_map.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index e848579e61a8..ea3b431b9783 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -280,13 +280,13 @@ struct perf_thread_map *thread_map__new_by_tid_str(const char *tid_str)
threads->nr = ntasks;
}
out:
+ strlist__delete(slist);
if (threads)
refcount_set(&threads->refcnt, 1);
return threads;

out_free_threads:
zfree(&threads);
- strlist__delete(slist);
goto out;
}

--
2.34.1


2024-02-05 10:52:50

by Yang Jihong

Subject: [PATCH 4/5] perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}()

curr_pid and cpu_last_switched are used only by
'perf sched replay/latency/map'. Move their initialization into
perf_sched__{lat|map|replay}() to avoid unnecessary work in the other
subcommands.
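
Where a subcommand ends up holding more than one resource (as 'map' does
in the diff below), the cleanup is unwound with stacked labels in reverse
order of acquisition. A stand-alone sketch of that idiom, again with
hypothetical names rather than the actual perf_sched__map() code:

#include <stdlib.h>

static int do_work(const int *threads, const int *pids)
{
        return threads[0] != pids[0];   /* stands in for the real processing */
}

static int run_map(void)
{
        int rc = -1;
        int *threads, *pids;

        threads = calloc(4, sizeof(*threads));  /* first resource, like curr_thread */
        if (!threads)
                return rc;

        pids = calloc(4, sizeof(*pids));        /* second resource, like curr_pid */
        if (!pids)
                goto out_free_threads;

        if (do_work(threads, pids))
                goto out_free_pids;

        rc = 0;

out_free_pids:
        free(pids);                     /* cleanup runs in reverse order of setup */
out_free_threads:
        free(threads);
        return rc;
}

int main(void)
{
        return run_map() ? EXIT_FAILURE : EXIT_SUCCESS;
}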

Simple functional testing:

# perf sched record perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.209 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]

# perf sched lat

-------------------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end |
-------------------------------------------------------------------------------------------------------------------------------------------
sched-messaging:(401) | 2990.699 ms | 38705 | avg: 0.661 ms | max: 67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
qemu-system-x86:(7) | 179.764 ms | 2191 | avg: 0.152 ms | max: 21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
sshd:48125 | 0.522 ms | 2 | avg: 0.037 ms | max: 0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
<SNIP>
ksoftirqd/11:82 | 0.063 ms | 1 | avg: 0.005 ms | max: 0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
kworker/9:0-mm_:34624 | 0.233 ms | 20 | avg: 0.004 ms | max: 0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
migration/13:93 | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
-----------------------------------------------------------------------------------------------------------------
TOTAL: | 3180.750 ms | 41368 |
---------------------------------------------------

# echo $?
0

# perf sched map
*A0 456532.510141 secs A0 => migration/0:15
*. 456532.510171 secs . => swapper:0
. *B0 456532.510261 secs B0 => migration/1:21
. *. 456532.510279 secs
<SNIP>
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . . 456532.785979 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . 456532.786054 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . 456532.786127 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . 456532.786197 secs
L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 456532.786270 secs
# echo $?
0

# perf sched replay
run measurement overhead: 108 nsecs
sleep measurement overhead: 66473 nsecs
the run test took 1000002 nsecs
the sleep test took 1082686 nsecs
nr_run_events: 49334
nr_sleep_events: 50054
nr_wakeup_events: 34701
target-less wakeups: 165
multi-target wakeups: 766
task 0 ( swapper: 0), nr_events: 15419
task 1 ( swapper: 1), nr_events: 1
task 2 ( swapper: 2), nr_events: 1
<SNIP>
task 715 ( sched-messaging: 110248), nr_events: 1438
task 716 ( sched-messaging: 110249), nr_events: 512
task 717 ( sched-messaging: 110250), nr_events: 500
task 718 ( sched-messaging: 110251), nr_events: 537
task 719 ( sched-messaging: 110252), nr_events: 823
------------------------------------------------------------
#1 : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35
#2 : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56
#3 : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39
#4 : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86
#5 : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34
#6 : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39
#7 : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09
#8 : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57
#9 : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90
#10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07
# echo $?
0

Signed-off-by: Yang Jihong <[email protected]>
---
tools/perf/builtin-sched.c | 94 +++++++++++++++++++++++++-------------
1 file changed, 61 insertions(+), 33 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 54d79e560617..af76366ec798 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -3167,14 +3167,44 @@ static void perf_sched__merge_lat(struct perf_sched *sched)
}
}

+static int setup_cpus_switch_event(struct perf_sched *sched)
+{
+ unsigned int i;
+
+ sched->cpu_last_switched = calloc(MAX_CPUS, sizeof(*(sched->cpu_last_switched)));
+ if (!sched->cpu_last_switched)
+ return -1;
+
+ sched->curr_pid = malloc(MAX_CPUS * sizeof(*(sched->curr_pid)));
+ if (!sched->curr_pid) {
+ free(sched->cpu_last_switched);
+ return -1;
+ }
+
+ for (i = 0; i < MAX_CPUS; i++)
+ sched->curr_pid[i] = -1;
+
+ return 0;
+}
+
+static void free_cpus_switch_event(struct perf_sched *sched)
+{
+ free(sched->curr_pid);
+ free(sched->cpu_last_switched);
+}
+
static int perf_sched__lat(struct perf_sched *sched)
{
+ int rc = -1;
struct rb_node *next;

setup_pager();

+ if (setup_cpus_switch_event(sched))
+ return rc;
+
if (perf_sched__read_events(sched))
- return -1;
+ goto out_free_cpus_switch_event;

perf_sched__merge_lat(sched);
perf_sched__sort_lat(sched);
@@ -3203,7 +3233,11 @@ static int perf_sched__lat(struct perf_sched *sched)
print_bad_events(sched);
printf("\n");

- return 0;
+ rc = 0;
+
+out_free_cpus_switch_event:
+ free_cpus_switch_event(sched);
+ return rc;
}

static int setup_map_cpus(struct perf_sched *sched)
@@ -3270,9 +3304,12 @@ static int perf_sched__map(struct perf_sched *sched)
if (!sched->curr_thread)
return rc;

- if (setup_map_cpus(sched))
+ if (setup_cpus_switch_event(sched))
goto out_free_curr_thread;

+ if (setup_map_cpus(sched))
+ goto out_free_cpus_switch_event;
+
if (setup_color_pids(sched))
goto out_free_map_cpus;

@@ -3296,6 +3333,9 @@ static int perf_sched__map(struct perf_sched *sched)
free(sched->map.comp_cpus);
perf_cpu_map__put(sched->map.cpus);

+out_free_cpus_switch_event:
+ free_cpus_switch_event(sched);
+
out_free_curr_thread:
free(sched->curr_thread);
return rc;
@@ -3309,6 +3349,10 @@ static int perf_sched__replay(struct perf_sched *sched)
mutex_init(&sched->start_work_mutex);
mutex_init(&sched->work_done_wait_mutex);

+ ret = setup_cpus_switch_event(sched);
+ if (ret)
+ goto out_mutex_destroy;
+
calibrate_run_measurement_overhead(sched);
calibrate_sleep_measurement_overhead(sched);

@@ -3316,7 +3360,7 @@ static int perf_sched__replay(struct perf_sched *sched)

ret = perf_sched__read_events(sched);
if (ret)
- goto out_mutex_destroy;
+ goto out_free_cpus_switch_event;

printf("nr_run_events: %ld\n", sched->nr_run_events);
printf("nr_sleep_events: %ld\n", sched->nr_sleep_events);
@@ -3342,6 +3386,9 @@ static int perf_sched__replay(struct perf_sched *sched)
sched->thread_funcs_exit = true;
destroy_tasks(sched);

+out_free_cpus_switch_event:
+ free_cpus_switch_event(sched);
+
out_mutex_destroy:
mutex_destroy(&sched->start_work_mutex);
mutex_destroy(&sched->work_done_wait_mutex);
@@ -3580,21 +3627,7 @@ int cmd_sched(int argc, const char **argv)
.switch_event = replay_switch_event,
.fork_event = replay_fork_event,
};
- unsigned int i;
- int ret = 0;
-
- sched.cpu_last_switched = calloc(MAX_CPUS, sizeof(*sched.cpu_last_switched));
- if (!sched.cpu_last_switched) {
- ret = -ENOMEM;
- goto out;
- }
- sched.curr_pid = malloc(MAX_CPUS * sizeof(*sched.curr_pid));
- if (!sched.curr_pid) {
- ret = -ENOMEM;
- goto out;
- }
- for (i = 0; i < MAX_CPUS; i++)
- sched.curr_pid[i] = -1;
+ int ret;

argc = parse_options_subcommand(argc, argv, sched_options, sched_subcommands,
sched_usage, PARSE_OPT_STOP_AT_NON_OPTION);
@@ -3605,9 +3638,9 @@ int cmd_sched(int argc, const char **argv)
* Aliased to 'perf script' for now:
*/
if (!strcmp(argv[0], "script")) {
- ret = cmd_script(argc, argv);
+ return cmd_script(argc, argv);
} else if (strlen(argv[0]) > 2 && strstarts("record", argv[0])) {
- ret = __cmd_record(argc, argv);
+ return __cmd_record(argc, argv);
} else if (strlen(argv[0]) > 2 && strstarts("latency", argv[0])) {
sched.tp_handler = &lat_ops;
if (argc > 1) {
@@ -3616,7 +3649,7 @@ int cmd_sched(int argc, const char **argv)
usage_with_options(latency_usage, latency_options);
}
setup_sorting(&sched, latency_options, latency_usage);
- ret = perf_sched__lat(&sched);
+ return perf_sched__lat(&sched);
} else if (!strcmp(argv[0], "map")) {
if (argc) {
argc = parse_options(argc, argv, map_options, map_usage, 0);
@@ -3625,7 +3658,7 @@ int cmd_sched(int argc, const char **argv)
}
sched.tp_handler = &map_ops;
setup_sorting(&sched, latency_options, latency_usage);
- ret = perf_sched__map(&sched);
+ return perf_sched__map(&sched);
} else if (strlen(argv[0]) > 2 && strstarts("replay", argv[0])) {
sched.tp_handler = &replay_ops;
if (argc) {
@@ -3633,7 +3666,7 @@ int cmd_sched(int argc, const char **argv)
if (argc)
usage_with_options(replay_usage, replay_options);
}
- ret = perf_sched__replay(&sched);
+ return perf_sched__replay(&sched);
} else if (!strcmp(argv[0], "timehist")) {
if (argc) {
argc = parse_options(argc, argv, timehist_options,
@@ -3649,21 +3682,16 @@ int cmd_sched(int argc, const char **argv)
parse_options_usage(NULL, timehist_options, "w", true);
if (sched.show_next)
parse_options_usage(NULL, timehist_options, "n", true);
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
ret = symbol__validate_sym_arguments();
if (ret)
- goto out;
+ return ret;

- ret = perf_sched__timehist(&sched);
+ return perf_sched__timehist(&sched);
} else {
usage_with_options(sched_usage, sched_options);
}

-out:
- free(sched.curr_pid);
- free(sched.cpu_last_switched);
-
- return ret;
+ return 0;
}
--
2.34.1


2024-02-05 19:22:52

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 4/5] perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}()

On Mon, Feb 05, 2024 at 10:46:15AM +0000, Yang Jihong wrote:
> +static int setup_cpus_switch_event(struct perf_sched *sched)
> +{
> + unsigned int i;
> +
> + sched->cpu_last_switched = calloc(MAX_CPUS, sizeof(*(sched->cpu_last_switched)));
> + if (!sched->cpu_last_switched)
> + return -1;
> +
> + sched->curr_pid = malloc(MAX_CPUS * sizeof(*(sched->curr_pid)));
> + if (!sched->curr_pid) {
> + free(sched->cpu_last_switched);

zfree(&sched->cpu_last_switched);

> + return -1;
> + }
> +
> + for (i = 0; i < MAX_CPUS; i++)
> + sched->curr_pid[i] = -1;
> +
> + return 0;
> +}
> +
> +static void free_cpus_switch_event(struct perf_sched *sched)
> +{
> + free(sched->curr_pid);
> + free(sched->cpu_last_switched);

zfree(&sched->curr_pid);
zfree(&sched->cpu_last_switched);

> +}
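
For illustration, a stand-alone sketch of what zfree() buys here, using a
simplified stand-in macro rather than the actual tools/perf helper: the
pointer is set to NULL after freeing, so a stale sched->curr_pid or
sched->cpu_last_switched cannot be double-freed or used by accident:

#include <stdlib.h>

/* simplified stand-in for the tools/perf zfree() helper */
#define ZFREE(pp) do { free(*(pp)); *(pp) = NULL; } while (0)

struct sched_bufs {
        int *curr_pid;
        unsigned long *cpu_last_switched;
};

static void free_bufs(struct sched_bufs *s)
{
        ZFREE(&s->curr_pid);
        ZFREE(&s->cpu_last_switched);
}

int main(void)
{
        struct sched_bufs s = {
                .curr_pid = calloc(16, sizeof(int)),
                .cpu_last_switched = calloc(16, sizeof(unsigned long)),
        };

        free_bufs(&s);
        free_bufs(&s);  /* second call is harmless: free(NULL) is a no-op */
        return 0;
}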
> +
> static int perf_sched__lat(struct perf_sched *sched)
> {
> + int rc = -1;
> struct rb_node *next;
>
> setup_pager();
>
> + if (setup_cpus_switch_event(sched))
> + return rc;
> +
> if (perf_sched__read_events(sched))
> - return -1;
> + goto out_free_cpus_switch_event;
>
> perf_sched__merge_lat(sched);
> perf_sched__sort_lat(sched);
> @@ -3203,7 +3233,11 @@ static int perf_sched__lat(struct perf_sched *sched)
> print_bad_events(sched);
> printf("\n");
>
> - return 0;
> + rc = 0;
> +
> +out_free_cpus_switch_event:
> + free_cpus_switch_event(sched);
> + return rc;
> }
>
> static int setup_map_cpus(struct perf_sched *sched)
> @@ -3270,9 +3304,12 @@ static int perf_sched__map(struct perf_sched *sched)
> if (!sched->curr_thread)
> return rc;
>
> - if (setup_map_cpus(sched))
> + if (setup_cpus_switch_event(sched))
> goto out_free_curr_thread;
>
> + if (setup_map_cpus(sched))
> + goto out_free_cpus_switch_event;
> +
> if (setup_color_pids(sched))
> goto out_free_map_cpus;
>
> @@ -3296,6 +3333,9 @@ static int perf_sched__map(struct perf_sched *sched)
> free(sched->map.comp_cpus);
> perf_cpu_map__put(sched->map.cpus);
>
> +out_free_cpus_switch_event:
> + free_cpus_switch_event(sched);
> +
> out_free_curr_thread:
> free(sched->curr_thread);
> return rc;
> @@ -3309,6 +3349,10 @@ static int perf_sched__replay(struct perf_sched *sched)
> mutex_init(&sched->start_work_mutex);
> mutex_init(&sched->work_done_wait_mutex);
>
> + ret = setup_cpus_switch_event(sched);
> + if (ret)
> + goto out_mutex_destroy;
> +
> calibrate_run_measurement_overhead(sched);
> calibrate_sleep_measurement_overhead(sched);
>
> @@ -3316,7 +3360,7 @@ static int perf_sched__replay(struct perf_sched *sched)
>
> ret = perf_sched__read_events(sched);
> if (ret)
> - goto out_mutex_destroy;
> + goto out_free_cpus_switch_event;
>
> printf("nr_run_events: %ld\n", sched->nr_run_events);
> printf("nr_sleep_events: %ld\n", sched->nr_sleep_events);
> @@ -3342,6 +3386,9 @@ static int perf_sched__replay(struct perf_sched *sched)
> sched->thread_funcs_exit = true;
> destroy_tasks(sched);
>
> +out_free_cpus_switch_event:
> + free_cpus_switch_event(sched);
> +
> out_mutex_destroy:
> mutex_destroy(&sched->start_work_mutex);
> mutex_destroy(&sched->work_done_wait_mutex);
> @@ -3580,21 +3627,7 @@ int cmd_sched(int argc, const char **argv)
> .switch_event = replay_switch_event,
> .fork_event = replay_fork_event,
> };
> - unsigned int i;
> - int ret = 0;
> -
> - sched.cpu_last_switched = calloc(MAX_CPUS, sizeof(*sched.cpu_last_switched));
> - if (!sched.cpu_last_switched) {
> - ret = -ENOMEM;
> - goto out;
> - }
> - sched.curr_pid = malloc(MAX_CPUS * sizeof(*sched.curr_pid));
> - if (!sched.curr_pid) {
> - ret = -ENOMEM;
> - goto out;
> - }
> - for (i = 0; i < MAX_CPUS; i++)
> - sched.curr_pid[i] = -1;
> + int ret;
>
> argc = parse_options_subcommand(argc, argv, sched_options, sched_subcommands,
> sched_usage, PARSE_OPT_STOP_AT_NON_OPTION);
> @@ -3605,9 +3638,9 @@ int cmd_sched(int argc, const char **argv)
> * Aliased to 'perf script' for now:
> */
> if (!strcmp(argv[0], "script")) {
> - ret = cmd_script(argc, argv);
> + return cmd_script(argc, argv);
> } else if (strlen(argv[0]) > 2 && strstarts("record", argv[0])) {
> - ret = __cmd_record(argc, argv);
> + return __cmd_record(argc, argv);
> } else if (strlen(argv[0]) > 2 && strstarts("latency", argv[0])) {
> sched.tp_handler = &lat_ops;
> if (argc > 1) {
> @@ -3616,7 +3649,7 @@ int cmd_sched(int argc, const char **argv)
> usage_with_options(latency_usage, latency_options);
> }
> setup_sorting(&sched, latency_options, latency_usage);
> - ret = perf_sched__lat(&sched);
> + return perf_sched__lat(&sched);
> } else if (!strcmp(argv[0], "map")) {
> if (argc) {
> argc = parse_options(argc, argv, map_options, map_usage, 0);
> @@ -3625,7 +3658,7 @@ int cmd_sched(int argc, const char **argv)
> }
> sched.tp_handler = &map_ops;
> setup_sorting(&sched, latency_options, latency_usage);
> - ret = perf_sched__map(&sched);
> + return perf_sched__map(&sched);
> } else if (strlen(argv[0]) > 2 && strstarts("replay", argv[0])) {
> sched.tp_handler = &replay_ops;
> if (argc) {
> @@ -3633,7 +3666,7 @@ int cmd_sched(int argc, const char **argv)
> if (argc)
> usage_with_options(replay_usage, replay_options);
> }
> - ret = perf_sched__replay(&sched);
> + return perf_sched__replay(&sched);
> } else if (!strcmp(argv[0], "timehist")) {
> if (argc) {
> argc = parse_options(argc, argv, timehist_options,
> @@ -3649,21 +3682,16 @@ int cmd_sched(int argc, const char **argv)
> parse_options_usage(NULL, timehist_options, "w", true);
> if (sched.show_next)
> parse_options_usage(NULL, timehist_options, "n", true);
> - ret = -EINVAL;
> - goto out;
> + return -EINVAL;
> }
> ret = symbol__validate_sym_arguments();
> if (ret)
> - goto out;
> + return ret;
>
> - ret = perf_sched__timehist(&sched);
> + return perf_sched__timehist(&sched);
> } else {
> usage_with_options(sched_usage, sched_options);
> }
>
> -out:
> - free(sched.curr_pid);
> - free(sched.cpu_last_switched);
> -
> - return ret;
> + return 0;
> }
> --
> 2.34.1

2024-02-05 21:12:50

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 5/5] perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()

On Mon, Feb 05, 2024 at 10:46:16AM +0000, Yang Jihong wrote:
> slist needs to be freed on both the error path and the normal path in
> thread_map__new_by_tid_str().

Please add:

Fixes: b52956c961be3a04 ("perf tools: Allow multiple threads or processes in record, stat, top")

Reviewed-by: Arnaldo Carvalho de Melo <[email protected]>

- Arnaldo

> Signed-off-by: Yang Jihong <[email protected]>
> ---
> tools/perf/util/thread_map.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
> index e848579e61a8..ea3b431b9783 100644
> --- a/tools/perf/util/thread_map.c
> +++ b/tools/perf/util/thread_map.c
> @@ -280,13 +280,13 @@ struct perf_thread_map *thread_map__new_by_tid_str(const char *tid_str)
> threads->nr = ntasks;
> }
> out:
> + strlist__delete(slist);
> if (threads)
> refcount_set(&threads->refcnt, 1);
> return threads;
>
> out_free_threads:
> zfree(&threads);
> - strlist__delete(slist);
> goto out;
> }
>
> --
> 2.34.1

2024-02-06 07:10:22

by Yang Jihong

Subject: Re: [PATCH 5/5] perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()

Hello,

On 2024/2/6 3:09, Arnaldo Carvalho de Melo wrote:
> On Mon, Feb 05, 2024 at 10:46:16AM +0000, Yang Jihong wrote:
>> slist needs to be freed on both the error path and the normal path in
>> thread_map__new_by_tid_str().
>
> Please add:
>
> Fixes: b52956c961be3a04 ("perf tools: Allow multiple threads or processes in record, stat, top")
>
OK, will add the Fixes tag in v2.

> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]>
>
Thanks for the Reviewed-by tag.

> - Arnaldo

Thanks,
Yang