2020-06-17 08:34:57

by Alexey Budankov

Subject: [PATCH v8 00/13] perf: support enable and disable commands in stat and record modes


Changes in v8:
- avoided moving of fds at fdarray__filter() call
- skipped counting of fds with zeroed revents at fdarray__filter() call
- converted explicit --ctl-fd[-ack] into --control fd:ctl-fd[,ack-fd] option
- updated docs to accommodate --control fd:ctl-fd[,ack-fd] option

v7: https://lore.kernel.org/lkml/[email protected]/

Changes in v7:
- added missing perf-record.txt changes
- adjusted docs wording for --ctl-fd,ctl-fd-ack options
to additionally mention --delay=-1 effect

v6: https://lore.kernel.org/lkml/[email protected]/

Changes in v6:
- split re-factoring of events handling loops for stat mode
into smaller incremental parts
- added parts missing at v5
- corrected v5 runtime issues

v5: https://lore.kernel.org/lkml/[email protected]/

Changes in v5:
- split re-factoring of events handling loops for stat mode
into smaller incremental parts

v4: https://lore.kernel.org/lkml/[email protected]/

Changes in v4:
- made checking of ctlfd state unconditional in record trace streaming loop
- introduced static poll fds to keep evlist__filter_pollfd() unaffected
- handled ret code of evlist__initialize_ctlfd() where need
- renamed and structured handle_events() function
- applied anonymous structs where needed

v3: https://lore.kernel.org/lkml/[email protected]/

Changes in v3:
- renamed functions and types from perf_evlist_ to evlist_ to avoid
clash with libperf code;
- extended commands to be strings of variable length consisting of
command name and also possibly including command specific data;
- merged docs update with the code changes;
- updated docs for -D,--delay=-1 option for stat and record modes;

v2: https://lore.kernel.org/lkml/[email protected]/

Changes in v2:
- renamed resume and pause commands to enable and disable ones, renamed
CTL_CMD_RESUME and CTL_CMD_PAUSE to CTL_CMD_ENABLE and CTL_CMD_DISABLE
to fit to the appropriate ioctls and avoid mixing up with PAUSE_OUTPUT
ioctl;
- factored out event handling loop into a handle_events() for stat mode;
- separated -D,--delay=-1 into separate patches for stat and record modes;

v1: https://lore.kernel.org/lkml/[email protected]/

repo: tip of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core

The patch set implements handling of 'start disabled', 'enable' and 'disable'
external control commands which an external controlling process can provide
to the stat and record modes of the tool. The 'start disabled' command can be
used to postpone enabling of events at the beginning of a monitoring session.
The 'enable' and 'disable' commands can be used to enable and disable events,
respectively, at any time after the start of the session.

The 'start disabled', 'enable' and 'disable' external control commands can be
used to focus measurement on specifically selected time intervals of workload
execution. Focused measurement reduces the tool's intrusion into and influence
on workload behavior, reduces distortion and the amount of collected and
stored data, and mitigates data accuracy loss, because measurement and data
capturing happen only during the intervals of interest.

A controlling process can be a bash shell script [1], a native executable or
a program in any other language that can directly work with file descriptors,
e.g. pipes [2], and spawn a process, in particular the tool process.

The -D,--delay <val> option is extended with a -1 value to skip enabling of
events at the beginning of a monitoring session ('start disabled' command).
The --control fd:ctl-fd[,ack-fd] command line option is introduced to provide
the tool with a pair of file descriptors: one to listen on for control
commands and one to reply to the controlling process on completion of the
received commands.

The tool reads a control command message from the ctl-fd descriptor, handles
the command and optionally replies with an acknowledgement message to the
ack-fd descriptor, if it is specified on the command line. The 'enable'
command is recognized as the string message 'enable' and the 'disable'
command as the string message 'disable', both received from the ctl-fd
descriptor. The completion message is 'ack\n' and is sent to the ack-fd
descriptor.
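
For the pipe(2) case mentioned above, a minimal native controlling process
could look like this (a sketch: it assumes 'perf' is found in PATH and uses
the option and message format described above; error handling is mostly
omitted). An equivalent FIFO-based bash script follows below.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
	int ctl[2], ack[2];	/* ctl: controller -> perf, ack: perf -> controller */
	char fdopt[64], buf[8];
	pid_t pid;

	if (pipe(ctl) || pipe(ack))
		return 1;

	pid = fork();
	if (pid == 0) {
		/* perf reads commands from ctl[0] and writes 'ack\n' to ack[1] */
		snprintf(fdopt, sizeof(fdopt), "fd:%d,%d", ctl[0], ack[1]);
		close(ctl[1]);
		close(ack[0]);
		execlp("perf", "perf", "stat", "-D", "-1", "-e", "cpu-cycles",
		       "-a", "-I", "1000", "--control", fdopt,
		       "--", "sleep", "30", (char *)NULL);
		_exit(127);
	}

	close(ctl[0]);
	close(ack[1]);

	sleep(5);
	if (write(ctl[1], "enable\n", 7) < 0 ||
	    read(ack[0], buf, sizeof(buf)) < 0)	/* wait for 'ack\n' */
		return 1;

	sleep(10);
	if (write(ctl[1], "disable\n", 8) < 0 ||
	    read(ack[0], buf, sizeof(buf)) < 0)
		return 1;

	close(ctl[1]);
	close(ack[0]);
	waitpid(pid, NULL, 0);
	return 0;
}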

An example bash script demonstrating a simple use case follows:

#!/bin/bash

ctl_dir=/tmp/

ctl_fifo=${ctl_dir}perf_ctl.fifo
test -p ${ctl_fifo} && unlink ${ctl_fifo}
mkfifo ${ctl_fifo} && exec {ctl_fd}<>${ctl_fifo}

ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
mkfifo ${ctl_ack_fifo} && exec {ctl_fd_ack}<>${ctl_ack_fifo}

perf stat -D -1 -e cpu-cycles -a -I 1000 \
--control fd:${ctl_fd},${ctl_fd_ack} \
-- sleep 40 &
perf_pid=$!

sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e2 && echo "enabled(${e2})"
sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d2 && echo "disabled(${d2})"

exec {ctl_fd_ack}>&- && unlink ${ctl_ack_fifo}
exec {ctl_fd}>&- && unlink ${ctl_fifo}

wait -n ${perf_pid}
exit $?


Script output:

[root@host dir] example
Events disabled
# time counts unit events
1.001101062 <not counted> cpu-cycles
2.002994944 <not counted> cpu-cycles
3.004864340 <not counted> cpu-cycles
4.006727177 <not counted> cpu-cycles
Events enabled
enabled(ack)
4.993808464 3,124,246 cpu-cycles
5.008597004 3,325,624 cpu-cycles
6.010387483 83,472,992 cpu-cycles
7.012266598 55,877,621 cpu-cycles
8.014175695 97,892,729 cpu-cycles
9.016056093 68,461,242 cpu-cycles
10.017937507 55,449,643 cpu-cycles
11.019830154 68,938,167 cpu-cycles
12.021719952 55,164,101 cpu-cycles
13.023627550 70,535,720 cpu-cycles
14.025580995 53,240,125 cpu-cycles
disabled(ack)
14.997518260 53,558,068 cpu-cycles
Events disabled
15.027216416 <not counted> cpu-cycles
16.029052729 <not counted> cpu-cycles
17.030904762 <not counted> cpu-cycles
18.032073424 <not counted> cpu-cycles
19.033805074 <not counted> cpu-cycles
Events enabled
enabled(ack)
20.001279097 3,021,022 cpu-cycles
20.035044381 6,434,367 cpu-cycles
21.036923813 89,358,251 cpu-cycles
22.038825169 72,516,351 cpu-cycles
# time counts unit events
23.040715596 55,046,157 cpu-cycles
24.042643757 78,128,649 cpu-cycles
25.044558535 61,052,428 cpu-cycles
26.046452785 62,142,806 cpu-cycles
27.048353021 74,477,971 cpu-cycles
28.050241286 61,001,623 cpu-cycles
29.052149961 61,653,502 cpu-cycles
disabled(ack)
30.004980264 82,729,640 cpu-cycles
Events disabled
30.053516176 <not counted> cpu-cycles
31.055348366 <not counted> cpu-cycles
32.057202097 <not counted> cpu-cycles
33.059040702 <not counted> cpu-cycles
34.060843288 <not counted> cpu-cycles
35.000888624 <not counted> cpu-cycles
[root@host dir]#

[1] http://man7.org/linux/man-pages/man1/bash.1.html
[2] http://man7.org/linux/man-pages/man2/pipe.2.html

---
Alexey Budankov (13):
tools/libperf: avoid moving of fds at fdarray__filter() call
perf evlist: introduce control file descriptors
perf evlist: implement control command handling functions
perf stat: factor out body of event handling loop for system wide
perf stat: move target check to loop control statement
perf stat: factor out body of event handling loop for fork case
perf stat: factor out event handling loop into dispatch_events()
perf stat: extend -D,--delay option with -1 value
perf stat: implement control commands handling
perf stat: introduce --control fd:ctl-fd[,ack-fd] options
perf record: extend -D,--delay option with -1 value
perf record: implement control commands handling
perf record: introduce --control fd:ctl-fd[,ack-fd] options

tools/lib/api/fd/array.c | 11 +-
tools/perf/Documentation/perf-record.txt | 44 +++++-
tools/perf/Documentation/perf-stat.txt | 44 +++++-
tools/perf/builtin-record.c | 65 ++++++++-
tools/perf/builtin-stat.c | 176 ++++++++++++++++++-----
tools/perf/builtin-trace.c | 2 +-
tools/perf/tests/fdarray.c | 20 +--
tools/perf/util/evlist.c | 136 ++++++++++++++++++
tools/perf/util/evlist.h | 25 ++++
tools/perf/util/record.h | 4 +-
tools/perf/util/stat.h | 4 +-
11 files changed, 458 insertions(+), 73 deletions(-)

--
2.24.1


2020-06-17 08:37:20

by Alexey Budankov

Subject: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call


Skip fds with a zeroed revents field from the count and avoid moving fds at
the fdarray__filter() call, so that the fd indices returned by the
fdarray__add() call stay the same and can be used for direct access and
processing of the fd revents status field in the entries array of the
struct fdarray object.
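
For illustration, a minimal usage sketch of the stable-index behavior,
assuming the existing fdarray__add(fda, fd, revents) and fdarray__poll()
API (the helper below is hypothetical):

#include <poll.h>
#include <api/fd/array.h>

/* Sketch: the position returned by fdarray__add() stays valid across
 * fdarray__filter() calls, so it can be used for direct revents access. */
static int ctl_fd_poll_example(struct fdarray *fda, int ctl_fd)
{
	int pos = fdarray__add(fda, ctl_fd, POLLIN);

	if (pos < 0)
		return -1;

	fdarray__poll(fda, 1000);

	if (fda->entries[pos].revents & POLLIN) {
		/* read and handle a control command from ctl_fd here */
	}

	/*
	 * Filtering hung up fds no longer moves the remaining entries,
	 * so 'pos' still indexes the same entry afterwards.
	 */
	fdarray__filter(fda, POLLHUP | POLLERR, NULL, NULL);

	return 0;
}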

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/lib/api/fd/array.c | 11 +++++------
tools/perf/tests/fdarray.c | 20 ++------------------
2 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
index 58d44d5eee31..97843a837370 100644
--- a/tools/lib/api/fd/array.c
+++ b/tools/lib/api/fd/array.c
@@ -93,22 +93,21 @@ int fdarray__filter(struct fdarray *fda, short revents,
return 0;

for (fd = 0; fd < fda->nr; ++fd) {
+ if (!fda->entries[fd].revents)
+ continue;
+
if (fda->entries[fd].revents & revents) {
if (entry_destructor)
entry_destructor(fda, fd, arg);

+ fda->entries[fd].revents = fda->entries[fd].events = 0;
continue;
}

- if (fd != nr) {
- fda->entries[nr] = fda->entries[fd];
- fda->priv[nr] = fda->priv[fd];
- }
-
++nr;
}

- return fda->nr = nr;
+ return nr;
}

int fdarray__poll(struct fdarray *fda, int timeout)
diff --git a/tools/perf/tests/fdarray.c b/tools/perf/tests/fdarray.c
index c7c81c4a5b2b..d0c8a05aab2f 100644
--- a/tools/perf/tests/fdarray.c
+++ b/tools/perf/tests/fdarray.c
@@ -12,6 +12,7 @@ static void fdarray__init_revents(struct fdarray *fda, short revents)

for (fd = 0; fd < fda->nr; ++fd) {
fda->entries[fd].fd = fda->nr - fd;
+ fda->entries[fd].events = revents;
fda->entries[fd].revents = revents;
}
}
@@ -29,7 +30,7 @@ static int fdarray__fprintf_prefix(struct fdarray *fda, const char *prefix, FILE

int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_unused)
{
- int nr_fds, expected_fd[2], fd, err = TEST_FAIL;
+ int nr_fds, err = TEST_FAIL;
struct fdarray *fda = fdarray__new(5, 5);

if (fda == NULL) {
@@ -55,7 +56,6 @@ int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_

fdarray__init_revents(fda, POLLHUP);
fda->entries[2].revents = POLLIN;
- expected_fd[0] = fda->entries[2].fd;

pr_debug("\nfiltering all but fda->entries[2]:");
fdarray__fprintf_prefix(fda, "before", stderr);
@@ -66,17 +66,9 @@ int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_
goto out_delete;
}

- if (fda->entries[0].fd != expected_fd[0]) {
- pr_debug("\nfda->entries[0].fd=%d != %d\n",
- fda->entries[0].fd, expected_fd[0]);
- goto out_delete;
- }
-
fdarray__init_revents(fda, POLLHUP);
fda->entries[0].revents = POLLIN;
- expected_fd[0] = fda->entries[0].fd;
fda->entries[3].revents = POLLIN;
- expected_fd[1] = fda->entries[3].fd;

pr_debug("\nfiltering all but (fda->entries[0], fda->entries[3]):");
fdarray__fprintf_prefix(fda, "before", stderr);
@@ -88,14 +80,6 @@ int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_
goto out_delete;
}

- for (fd = 0; fd < 2; ++fd) {
- if (fda->entries[fd].fd != expected_fd[fd]) {
- pr_debug("\nfda->entries[%d].fd=%d != %d\n", fd,
- fda->entries[fd].fd, expected_fd[fd]);
- goto out_delete;
- }
- }
-
pr_debug("\n");

err = 0;
--
2.24.1

2020-06-17 08:38:05

by Alexey Budankov

Subject: [PATCH v8 02/13] perf evlist: introduce control file descriptors


Define and initialize control file descriptors.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/util/evlist.c | 3 +++
tools/perf/util/evlist.h | 5 +++++
2 files changed, 8 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 173b4f0e0e6e..47541b5cab46 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -63,6 +63,9 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_map *cpus,
perf_evlist__set_maps(&evlist->core, cpus, threads);
evlist->workload.pid = -1;
evlist->bkw_mmap_state = BKW_MMAP_NOTREADY;
+ evlist->ctl_fd.fd = -1;
+ evlist->ctl_fd.ack = -1;
+ evlist->ctl_fd.pos = -1;
}

struct evlist *evlist__new(void)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b6f325dfb4d2..0d8b361f1c8e 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -74,6 +74,11 @@ struct evlist {
pthread_t th;
volatile int done;
} thread;
+ struct {
+ int fd;
+ int ack;
+ int pos;
+ } ctl_fd;
};

struct evsel_str_handler {
--
2.24.1


2020-06-17 08:38:57

by Alexey Budankov

Subject: [PATCH v8 03/13] perf evlist: implement control command handling functions


Implement functions for initialization, finalization and processing of
control command messages coming from the control file descriptors.
The control file descriptor is allocated as an entry in the struct pollfd
object of evsel_list so that a single poll() operation covers it atomically
together with the event descriptors.
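
The intended call sequence is roughly as follows (a sketch; the loop bound
is a placeholder for the tool's own termination logic, see the following
patches for the real call sites):

#include "util/evlist.h"

static int ctlfd_usage_example(struct evlist *evlist, int ctl_fd, int ctl_fd_ack)
{
	enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
	int i;

	if (evlist__initialize_ctlfd(evlist, ctl_fd, ctl_fd_ack))
		return -1;

	for (i = 0; i < 10; i++) {	/* stand-in for the tool's event loop */
		/* one poll() covers both the event fds and the control fd */
		if (evlist__poll(evlist, 1000) > 0)
			evlist__ctlfd_process(evlist, &cmd);
	}

	return evlist__finalize_ctlfd(evlist);
}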

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/util/evlist.c | 133 +++++++++++++++++++++++++++++++++++++++
tools/perf/util/evlist.h | 17 +++++
2 files changed, 150 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 47541b5cab46..11280ff559cd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1718,3 +1718,136 @@ struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list,
}
return leader;
}
+
+int evlist__initialize_ctlfd(struct evlist *evlist, int fd, int ack)
+{
+ if (fd == -1) {
+ pr_debug("Control descriptor is not initialized\n");
+ return 0;
+ }
+
+ evlist->ctl_fd.pos = perf_evlist__add_pollfd(&evlist->core, fd, NULL, POLLIN);
+ if (evlist->ctl_fd.pos < 0) {
+ evlist->ctl_fd.pos = -1;
+ pr_err("Failed to add ctl fd entry: %m\n");
+ return -1;
+ }
+
+ evlist->ctl_fd.fd = fd;
+ evlist->ctl_fd.ack = ack;
+
+ return 0;
+}
+
+int evlist__finalize_ctlfd(struct evlist *evlist)
+{
+ struct pollfd *entries = evlist->core.pollfd.entries;
+
+ if (evlist->ctl_fd.pos == -1)
+ return 0;
+
+ entries[evlist->ctl_fd.pos].fd = -1;
+ entries[evlist->ctl_fd.pos].events = 0;
+ entries[evlist->ctl_fd.pos].revents = 0;
+
+ evlist->ctl_fd.pos = -1;
+ evlist->ctl_fd.ack = -1;
+ evlist->ctl_fd.fd = -1;
+
+ return 0;
+}
+
+static int evlist__ctlfd_recv(struct evlist *evlist, enum evlist_ctl_cmd *cmd,
+ char *cmd_data, size_t data_size)
+{
+ int err;
+ char c;
+ size_t bytes_read = 0;
+
+ memset(cmd_data, 0, data_size--);
+
+ do {
+ err = read(evlist->ctl_fd.fd, &c, 1);
+ if (err > 0) {
+ if (c == '\n' || c == '\0')
+ break;
+ cmd_data[bytes_read++] = c;
+ if (bytes_read == data_size)
+ break;
+ } else {
+ if (err == -1)
+ pr_err("Failed to read from ctlfd %d: %m\n", evlist->ctl_fd.fd);
+ break;
+ }
+ } while (1);
+
+ pr_debug("Message from ctl_fd: \"%s%s\"\n", cmd_data,
+ bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
+
+ if (err > 0) {
+ if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
+ strlen(EVLIST_CTL_CMD_ENABLE_TAG))) {
+ *cmd = EVLIST_CTL_CMD_ENABLE;
+ } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
+ strlen(EVLIST_CTL_CMD_DISABLE_TAG))) {
+ *cmd = EVLIST_CTL_CMD_DISABLE;
+ }
+ }
+
+ return err;
+}
+
+static int evlist__ctlfd_ack(struct evlist *evlist)
+{
+ int err;
+
+ if (evlist->ctl_fd.ack == -1)
+ return 0;
+
+ err = write(evlist->ctl_fd.ack, EVLIST_CTL_CMD_ACK_TAG,
+ sizeof(EVLIST_CTL_CMD_ACK_TAG));
+ if (err == -1)
+ pr_err("failed to write to ctl_ack_fd %d: %m\n", evlist->ctl_fd.ack);
+
+ return err;
+}
+
+int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
+{
+ int err = 0;
+ char cmd_data[EVLIST_CTL_CMD_MAX_LEN];
+ int ctlfd_pos = evlist->ctl_fd.pos;
+ struct pollfd *entries = evlist->core.pollfd.entries;
+
+ if (ctlfd_pos == -1 || !entries[ctlfd_pos].revents)
+ return 0;
+
+ if (entries[ctlfd_pos].revents & POLLIN) {
+ err = evlist__ctlfd_recv(evlist, cmd, cmd_data,
+ EVLIST_CTL_CMD_MAX_LEN);
+ if (err > 0) {
+ switch (*cmd) {
+ case EVLIST_CTL_CMD_ENABLE:
+ evlist__enable(evlist);
+ break;
+ case EVLIST_CTL_CMD_DISABLE:
+ evlist__disable(evlist);
+ break;
+ case EVLIST_CTL_CMD_ACK:
+ case EVLIST_CTL_CMD_UNSUPPORTED:
+ default:
+ pr_debug("ctlfd: unsupported %d\n", *cmd);
+ break;
+ }
+ if (!(*cmd == EVLIST_CTL_CMD_ACK || *cmd == EVLIST_CTL_CMD_UNSUPPORTED))
+ evlist__ctlfd_ack(evlist);
+ }
+ }
+
+ if (entries[ctlfd_pos].revents & (POLLHUP | POLLERR))
+ evlist__finalize_ctlfd(evlist);
+ else
+ entries[ctlfd_pos].revents = 0;
+
+ return err;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 0d8b361f1c8e..bccf0a970371 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -360,4 +360,21 @@ void perf_evlist__force_leader(struct evlist *evlist);
struct evsel *perf_evlist__reset_weak_group(struct evlist *evlist,
struct evsel *evsel,
bool close);
+#define EVLIST_CTL_CMD_ENABLE_TAG "enable"
+#define EVLIST_CTL_CMD_DISABLE_TAG "disable"
+#define EVLIST_CTL_CMD_ACK_TAG "ack\n"
+
+#define EVLIST_CTL_CMD_MAX_LEN 64
+
+enum evlist_ctl_cmd {
+ EVLIST_CTL_CMD_UNSUPPORTED = 0,
+ EVLIST_CTL_CMD_ENABLE,
+ EVLIST_CTL_CMD_DISABLE,
+ EVLIST_CTL_CMD_ACK
+};
+
+int evlist__initialize_ctlfd(struct evlist *evlist, int ctl_fd, int ctl_fd_ack);
+int evlist__finalize_ctlfd(struct evlist *evlist);
+int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd);
+
#endif /* __PERF_EVLIST_H */
--
2.24.1


2020-06-17 08:39:48

by Alexey Budankov

Subject: [PATCH v8 04/13] perf stat: factor out body of event handling loop for system wide


Introduce process_timeout() and print_interval() helper functions that
factor out the body of the event handling loop for the attach and system
wide monitoring use cases.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/builtin-stat.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9be020e0098a..31f7ccf9537b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -475,6 +475,23 @@ static void process_interval(void)
print_counters(&rs, 0, NULL);
}

+static bool print_interval(unsigned int interval, int *times)
+{
+ if (interval) {
+ process_interval();
+ if (interval_count && !(--(*times)))
+ return true;
+ }
+ return false;
+}
+
+static bool process_timeout(int timeout, unsigned int interval, int *times)
+{
+ if (timeout)
+ return true;
+ return print_interval(interval, times);
+}
+
static void enable_counters(void)
{
if (stat_config.initial_delay)
@@ -611,6 +628,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
struct affinity affinity;
int i, cpu;
bool second_pass = false;
+ bool stop = false;

if (interval) {
ts.tv_sec = interval / USEC_PER_MSEC;
@@ -805,17 +823,11 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
psignal(WTERMSIG(status), argv[0]);
} else {
enable_counters();
- while (!done) {
+ while (!done && !stop) {
nanosleep(&ts, NULL);
if (!is_target_alive(&target, evsel_list->core.threads))
break;
- if (timeout)
- break;
- if (interval) {
- process_interval();
- if (interval_count && !(--times))
- break;
- }
+ stop = process_timeout(timeout, interval, &times);
}
}

--
2.24.1


2020-06-17 08:41:28

by Alexey Budankov

Subject: [PATCH v8 06/13] perf stat: factor out body of event handling loop for fork case


Factor out the body of the event handling loop for the fork case, reusing
the process_timeout() and process_interval() functions.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/builtin-stat.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 62bad2df13ba..3bc538576607 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -798,13 +798,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
enable_counters();

if (interval || timeout) {
- while (!waitpid(child_pid, &status, WNOHANG)) {
+ while (!stop && !waitpid(child_pid, &status, WNOHANG)) {
nanosleep(&ts, NULL);
- if (timeout)
- break;
- process_interval();
- if (interval_count && !(--times))
- break;
+ stop = process_timeout(timeout, interval, &times);
}
}
if (child_pid != -1) {
--
2.24.1


2020-06-17 08:42:18

by Alexey Budankov

Subject: [PATCH v8 07/13] perf stat: factor out event handling loop into dispatch_events()


Consolidate the event dispatching loops for the fork, attach and system
wide monitoring use cases into a common dispatch_events() function.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/builtin-stat.c | 35 ++++++++++++++++++++++++-----------
1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3bc538576607..39749c290508 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -557,6 +557,27 @@ static bool is_target_alive(struct target *_target,
return false;
}

+static int dispatch_events(bool forks, int timeout, int interval, int *times, struct timespec *ts)
+{
+ bool stop = false;
+ int child = 0, status = 0;
+
+ while (1) {
+ if (forks)
+ child = waitpid(child_pid, &status, WNOHANG);
+ else
+ child = !is_target_alive(&target, evsel_list->core.threads) ? 1 : 0;
+
+ if (done || stop || child)
+ break;
+
+ nanosleep(ts, NULL);
+ stop = process_timeout(timeout, interval, times);
+ }
+
+ return status;
+}
+
enum counter_recovery {
COUNTER_SKIP,
COUNTER_RETRY,
@@ -628,7 +649,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
struct affinity affinity;
int i, cpu;
bool second_pass = false;
- bool stop = false;

if (interval) {
ts.tv_sec = interval / USEC_PER_MSEC;
@@ -797,12 +817,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
perf_evlist__start_workload(evsel_list);
enable_counters();

- if (interval || timeout) {
- while (!stop && !waitpid(child_pid, &status, WNOHANG)) {
- nanosleep(&ts, NULL);
- stop = process_timeout(timeout, interval, &times);
- }
- }
+ if (interval || timeout)
+ status = dispatch_events(forks, timeout, interval, &times, &ts);
if (child_pid != -1) {
if (timeout)
kill(child_pid, SIGTERM);
@@ -819,10 +835,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
psignal(WTERMSIG(status), argv[0]);
} else {
enable_counters();
- while (!done && !stop && is_target_alive(&target, evsel_list->core.threads)) {
- nanosleep(&ts, NULL);
- stop = process_timeout(timeout, interval, &times);
- }
+ dispatch_events(forks, timeout, interval, &times, &ts);
}

disable_counters();
--
2.24.1


2020-06-17 08:42:25

by Alexey Budankov

Subject: [PATCH v8 05/13] perf stat: move target check to loop control statement


Check for target existence in the loop control statement, jointly with the
'stop' indicator based on command line values and the external asynchronous
'done' signal.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/builtin-stat.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 31f7ccf9537b..62bad2df13ba 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -823,10 +823,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
psignal(WTERMSIG(status), argv[0]);
} else {
enable_counters();
- while (!done && !stop) {
+ while (!done && !stop && is_target_alive(&target, evsel_list->core.threads)) {
nanosleep(&ts, NULL);
- if (!is_target_alive(&target, evsel_list->core.threads))
- break;
stop = process_timeout(timeout, interval, &times);
}
}
--
2.24.1


2020-06-17 08:43:05

by Alexey Budankov

Subject: [PATCH v8 08/13] perf stat: extend -D,--delay option with -1 value


Extend the -D,--delay option with a -1 value to start monitoring with
events disabled, to be enabled later by an 'enable' command provided via
the control file descriptor.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 5 +++--
tools/perf/builtin-stat.c | 18 ++++++++++++++----
tools/perf/util/evlist.h | 3 +++
tools/perf/util/stat.h | 2 +-
4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index b029ee728a0b..9f32f6cd558d 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -238,8 +238,9 @@ mode, use --per-node in addition to -a. (system-wide).

-D msecs::
--delay msecs::
-After starting the program, wait msecs before measuring. This is useful to
-filter out the startup phase of the program, which is often very different.
+After starting the program, wait msecs before measuring (-1: start with events
+disabled). This is useful to filter out the startup phase of the program,
+which is often very different.

-T::
--transaction::
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 39749c290508..f88d5ee55022 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -494,16 +494,26 @@ static bool process_timeout(int timeout, unsigned int interval, int *times)

static void enable_counters(void)
{
- if (stat_config.initial_delay)
+ if (stat_config.initial_delay < 0) {
+ pr_info(EVLIST_DISABLED_MSG);
+ return;
+ }
+
+ if (stat_config.initial_delay > 0) {
+ pr_info(EVLIST_DISABLED_MSG);
usleep(stat_config.initial_delay * USEC_PER_MSEC);
+ }

/*
* We need to enable counters only if:
* - we don't have tracee (attaching to task or cpu)
* - we have initial delay configured
*/
- if (!target__none(&target) || stat_config.initial_delay)
+ if (!target__none(&target) || stat_config.initial_delay) {
evlist__enable(evsel_list);
+ if (stat_config.initial_delay > 0)
+ pr_info(EVLIST_ENABLED_MSG);
+ }
}

static void disable_counters(void)
@@ -1060,8 +1070,8 @@ static struct option stat_options[] = {
"aggregate counts per thread", AGGR_THREAD),
OPT_SET_UINT(0, "per-node", &stat_config.aggr_mode,
"aggregate counts per numa node", AGGR_NODE),
- OPT_UINTEGER('D', "delay", &stat_config.initial_delay,
- "ms to wait before starting measurement after program start"),
+ OPT_INTEGER('D', "delay", &stat_config.initial_delay,
+ "ms to wait before starting measurement after program start (-1: start with events disabled)"),
OPT_CALLBACK_NOOPT(0, "metric-only", &stat_config.metric_only, NULL,
"Only print computed metrics. No raw values", enable_metric_only),
OPT_BOOLEAN(0, "metric-no-group", &stat_config.metric_no_group,
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index bccf0a970371..7c3726a685f5 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -377,4 +377,7 @@ int evlist__initialize_ctlfd(struct evlist *evlist, int ctl_fd, int ctl_fd_ack);
int evlist__finalize_ctlfd(struct evlist *evlist);
int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd);

+#define EVLIST_ENABLED_MSG "Events enabled\n"
+#define EVLIST_DISABLED_MSG "Events disabled\n"
+
#endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index f75ae679eb28..626421ef35c2 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -116,7 +116,7 @@ struct perf_stat_config {
FILE *output;
unsigned int interval;
unsigned int timeout;
- unsigned int initial_delay;
+ int initial_delay;
unsigned int unit_width;
unsigned int metric_only_len;
int times;
--
2.24.1


2020-06-17 08:45:09

by Alexey Budankov

Subject: [PATCH v8 10/13] perf stat: introduce --control fd:ctl-fd[,ack-fd] options


Introduce --control fd:ctl-fd[,ack-fd] options to pass open file
descriptor numbers from the command line. Extend the perf-stat.txt file
with a description of the --control fd:ctl-fd[,ack-fd] options. Document
a possible usage model introduced by the --control fd:ctl-fd[,ack-fd]
options by providing an example bash shell script.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 39 ++++++++++++++++++++++++++
tools/perf/builtin-stat.c | 38 +++++++++++++++++++++++++
tools/perf/util/stat.h | 2 ++
3 files changed, 79 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 9f32f6cd558d..c9bfefc051fb 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -176,6 +176,45 @@ with it. --append may be used here. Examples:
3>results perf stat --log-fd 3 -- $cmd
3>>results perf stat --log-fd 3 --append -- $cmd

+--control fd:ctl-fd[,ack-fd]
+Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
+'disable': disable events). Measurements can be started with events disabled using
+--delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
+to synchronize with the controlling process. Example of bash shell script to enable and
+disable events during measurements:
+
+#!/bin/bash
+
+ctl_dir=/tmp/
+
+ctl_fifo=${ctl_dir}perf_ctl.fifo
+test -p ${ctl_fifo} && unlink ${ctl_fifo}
+mkfifo ${ctl_fifo}
+exec {ctl_fd}<>${ctl_fifo}
+
+ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
+test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
+mkfifo ${ctl_ack_fifo}
+exec {ctl_fd_ack}<>${ctl_ack_fifo}
+
+perf stat -D -1 -e cpu-cycles -a -I 1000 \
+ --control fd:${ctl_fd},${ctl_fd_ack} \
+ -- sleep 30 &
+perf_pid=$!
+
+sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
+sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
+
+exec {ctl_fd_ack}>&-
+unlink ${ctl_ack_fifo}
+
+exec {ctl_fd}>&-
+unlink ${ctl_fifo}
+
+wait -n ${perf_pid}
+exit $?
+
+
--pre::
--post::
Pre and post measurement hooks, e.g.:
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index cc56d71a3ed5..4cb29b40b860 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -188,6 +188,8 @@ static struct perf_stat_config stat_config = {
.metric_only_len = METRIC_ONLY_LEN,
.walltime_nsecs_stats = &walltime_nsecs_stats,
.big_num = true,
+ .ctl_fd = -1,
+ .ctl_fd_ack = -1
};

static bool cpus_map_matched(struct evsel *a, struct evsel *b)
@@ -1032,6 +1034,33 @@ static int parse_metric_groups(const struct option *opt,
&stat_config.metric_events);
}

+static int parse_control_option(const struct option *opt,
+ const char *str,
+ int unset __maybe_unused)
+{
+ char *comma = NULL, *endptr = NULL;
+ struct perf_stat_config *config = (struct perf_stat_config *)opt->value;
+
+ if (strncmp(str, "fd:", 3))
+ return -EINVAL;
+
+ config->ctl_fd = strtoul(&str[3], &endptr, 0);
+ if (endptr == &str[3])
+ return -EINVAL;
+
+ comma = strchr(str, ',');
+ if (comma) {
+ if (endptr != comma)
+ return -EINVAL;
+
+ config->ctl_fd_ack = strtoul(comma + 1, &endptr, 0);
+ if (endptr == comma + 1 || *endptr != '\0')
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", &transaction_run,
"hardware transaction statistics"),
@@ -1133,6 +1162,10 @@ static struct option stat_options[] = {
"libpfm4 event selector. use 'perf list' to list available events",
parse_libpfm_events_option),
#endif
+ OPT_CALLBACK(0, "control", &stat_config, "fd:ctl-fd[,ack-fd]",
+ "Listen on ctl-fd descriptor for command to control measurement ('enable': enable events, 'disable': disable events).\n"
+ "\t\t\t Optionally send control command completion ('ack\\n') to ack-fd descriptor.",
+ parse_control_option),
OPT_END()
};

@@ -2304,6 +2337,9 @@ int cmd_stat(int argc, const char **argv)
signal(SIGALRM, skip_signal);
signal(SIGABRT, skip_signal);

+ if (evlist__initialize_ctlfd(evsel_list, stat_config.ctl_fd, stat_config.ctl_fd_ack))
+ goto out;
+
status = 0;
for (run_idx = 0; forever || run_idx < stat_config.run_count; run_idx++) {
if (stat_config.run_count != 1 && verbose > 0)
@@ -2323,6 +2359,8 @@ int cmd_stat(int argc, const char **argv)
if (!forever && status != -1 && (!interval || stat_config.summary))
print_counters(NULL, argc, argv);

+ evlist__finalize_ctlfd(evsel_list);
+
if (STAT_RECORD) {
/*
* We synthesize the kernel mmap record just so that older tools
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 626421ef35c2..06f0baabe775 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -133,6 +133,8 @@ struct perf_stat_config {
struct perf_cpu_map *cpus_aggr_map;
u64 *walltime_run;
struct rblist metric_events;
+ int ctl_fd;
+ int ctl_fd_ack;
};

void perf_stat__set_big_num(int set);
--
2.24.1


2020-06-17 08:46:03

by Alexey Budankov

Subject: [PATCH v8 09/13] perf stat: implement control commands handling


Implement handling of the 'enable' and 'disable' control commands coming
from the control file descriptor. The process_evlist() function checks for
events on the control fds and performs the required operations. If a poll
event splits the initiated timeout interval, the remainder is calculated
and waited for in the following poll() syscall. For example, with -I 1000,
if a control command arrives 300 ms into an interval, the next poll() waits
only the remaining 700 ms.
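
A condensed sketch of that bookkeeping, assuming perf's diff_timespec()
helper and the MSEC_PER_SEC/NSEC_PER_MSEC constants used in the hunk below
(includes omitted):

/* Subtract the time spent handling a control command from the time
 * still to be slept before the current interval/timeout expires. */
static int remaining_sleep_example(struct evlist *evlist, int sleep_time)
{
	struct timespec start, stop, diff;
	int time_to_sleep = sleep_time;

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (evlist__poll(evlist, time_to_sleep) > 0) {
		/* a control command arrived before the timeout expired */
		clock_gettime(CLOCK_MONOTONIC, &stop);
		diff_timespec(&diff, &stop, &start);
		time_to_sleep -= diff.tv_sec * MSEC_PER_SEC +
				 diff.tv_nsec / NSEC_PER_MSEC;
	} else {
		/* the full interval elapsed: start over for the next one */
		time_to_sleep = sleep_time;
	}

	return time_to_sleep;
}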

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/builtin-stat.c | 67 +++++++++++++++++++++++++++++----------
1 file changed, 50 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f88d5ee55022..cc56d71a3ed5 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -492,6 +492,31 @@ static bool process_timeout(int timeout, unsigned int interval, int *times)
return print_interval(interval, times);
}

+static bool process_evlist(struct evlist *evlist, unsigned int interval, int *times)
+{
+ bool stop = false;
+ enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
+
+ if (evlist__ctlfd_process(evlist, &cmd) > 0) {
+ switch (cmd) {
+ case EVLIST_CTL_CMD_ENABLE:
+ pr_info(EVLIST_ENABLED_MSG);
+ stop = print_interval(interval, times);
+ break;
+ case EVLIST_CTL_CMD_DISABLE:
+ stop = print_interval(interval, times);
+ pr_info(EVLIST_DISABLED_MSG);
+ break;
+ case EVLIST_CTL_CMD_ACK:
+ case EVLIST_CTL_CMD_UNSUPPORTED:
+ default:
+ break;
+ }
+ }
+
+ return stop;
+}
+
static void enable_counters(void)
{
if (stat_config.initial_delay < 0) {
@@ -567,10 +592,21 @@ static bool is_target_alive(struct target *_target,
return false;
}

-static int dispatch_events(bool forks, int timeout, int interval, int *times, struct timespec *ts)
+static int dispatch_events(bool forks, int timeout, int interval, int *times)
{
bool stop = false;
int child = 0, status = 0;
+ int time_to_sleep, sleep_time;
+ struct timespec time_start, time_stop, time_diff;
+
+ if (interval)
+ sleep_time = interval;
+ else if (timeout)
+ sleep_time = timeout;
+ else
+ sleep_time = 1000;
+
+ time_to_sleep = sleep_time;

while (1) {
if (forks)
@@ -581,8 +617,17 @@ static int dispatch_events(bool forks, int timeout, int interval, int *times, st
if (done || stop || child)
break;

- nanosleep(ts, NULL);
- stop = process_timeout(timeout, interval, times);
+ clock_gettime(CLOCK_MONOTONIC, &time_start);
+ if (!(evlist__poll(evsel_list, time_to_sleep) > 0)) { /* poll timeout or EINTR */
+ stop = process_timeout(timeout, interval, times);
+ time_to_sleep = sleep_time;
+ } else { /* fd revent */
+ stop = process_evlist(evsel_list, interval, times);
+ clock_gettime(CLOCK_MONOTONIC, &time_stop);
+ diff_timespec(&time_diff, &time_stop, &time_start);
+ time_to_sleep -= time_diff.tv_sec * MSEC_PER_SEC +
+ time_diff.tv_nsec / NSEC_PER_MSEC;
+ }
}

return status;
@@ -651,7 +696,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
char msg[BUFSIZ];
unsigned long long t0, t1;
struct evsel *counter;
- struct timespec ts;
size_t l;
int status = 0;
const bool forks = (argc > 0);
@@ -660,17 +704,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
int i, cpu;
bool second_pass = false;

- if (interval) {
- ts.tv_sec = interval / USEC_PER_MSEC;
- ts.tv_nsec = (interval % USEC_PER_MSEC) * NSEC_PER_MSEC;
- } else if (timeout) {
- ts.tv_sec = timeout / USEC_PER_MSEC;
- ts.tv_nsec = (timeout % USEC_PER_MSEC) * NSEC_PER_MSEC;
- } else {
- ts.tv_sec = 1;
- ts.tv_nsec = 0;
- }
-
if (forks) {
if (perf_evlist__prepare_workload(evsel_list, &target, argv, is_pipe,
workload_exec_failed_signal) < 0) {
@@ -828,7 +861,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
enable_counters();

if (interval || timeout)
- status = dispatch_events(forks, timeout, interval, &times, &ts);
+ status = dispatch_events(forks, timeout, interval, &times);
if (child_pid != -1) {
if (timeout)
kill(child_pid, SIGTERM);
@@ -845,7 +878,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
psignal(WTERMSIG(status), argv[0]);
} else {
enable_counters();
- dispatch_events(forks, timeout, interval, &times, &ts);
+ dispatch_events(forks, timeout, interval, &times);
}

disable_counters();
--
2.24.1


2020-06-17 08:47:07

by Alexey Budankov

Subject: [PATCH v8 13/13] perf record: introduce --control fd:ctl-fd[,ack-fd] options


Introduce --control fd:ctl-fd[,ack-fd] options to pass open file
descriptor numbers from the command line. Extend the perf-record.txt file
with a description of the --control fd:ctl-fd[,ack-fd] options. Document
a possible usage model introduced by the --control fd:ctl-fd[,ack-fd]
options by providing an example bash shell script.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 39 ++++++++++++++++++++++++
tools/perf/builtin-record.c | 37 ++++++++++++++++++++++
tools/perf/util/record.h | 2 ++
3 files changed, 78 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index a84376605805..3f72d8e261f3 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -627,6 +627,45 @@ option. The -e option and this one can be mixed and matched. Events
can be grouped using the {} notation.
endif::HAVE_LIBPFM[]

+--control fd:ctl-fd[,ack-fd]
+Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
+'disable': disable events). Measurements can be started with events disabled using
+--delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
+to synchronize with the controlling process. Example of bash shell script to enable and
+disable events during measurements:
+
+#!/bin/bash
+
+ctl_dir=/tmp/
+
+ctl_fifo=${ctl_dir}perf_ctl.fifo
+test -p ${ctl_fifo} && unlink ${ctl_fifo}
+mkfifo ${ctl_fifo}
+exec {ctl_fd}<>${ctl_fifo}
+
+ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
+test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
+mkfifo ${ctl_ack_fifo}
+exec {ctl_fd_ack}<>${ctl_ack_fifo}
+
+perf record -D -1 -e cpu-cycles -a \
+ --control fd:${ctl_fd},${ctl_fd_ack} \
+ -- sleep 30 &
+perf_pid=$!
+
+sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
+sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
+
+exec {ctl_fd_ack}>&-
+unlink ${ctl_ack_fifo}
+
+exec {ctl_fd}>&-
+unlink ${ctl_fifo}
+
+wait -n ${perf_pid}
+exit $?
+
+
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0394e068dde8..5492c7dd3ac5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1749,6 +1749,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
perf_evlist__start_workload(rec->evlist);
}

+ if (evlist__initialize_ctlfd(rec->evlist, opts->ctl_fd, opts->ctl_fd_ack))
+ goto out_child;
+
if (opts->initial_delay) {
pr_info(EVLIST_DISABLED_MSG);
if (opts->initial_delay > 0) {
@@ -1895,6 +1898,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
record__synthesize_workload(rec, true);

out_child:
+ evlist__finalize_ctlfd(rec->evlist);
record__mmap_read_all(rec, true);
record__aio_mmap_read_sync(rec);

@@ -2244,6 +2248,33 @@ static int record__parse_mmap_pages(const struct option *opt,
return ret;
}

+static int parse_control_option(const struct option *opt,
+ const char *str,
+ int unset __maybe_unused)
+{
+ char *comma = NULL, *endptr = NULL;
+ struct record_opts *config = (struct record_opts *)opt->value;
+
+ if (strncmp(str, "fd:", 3))
+ return -EINVAL;
+
+ config->ctl_fd = strtoul(&str[3], &endptr, 0);
+ if (endptr == &str[3])
+ return -EINVAL;
+
+ comma = strchr(str, ',');
+ if (comma) {
+ if (endptr != comma)
+ return -EINVAL;
+
+ config->ctl_fd_ack = strtoul(comma + 1, &endptr, 0);
+ if (endptr == comma + 1 || *endptr != '\0')
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static void switch_output_size_warn(struct record *rec)
{
u64 wakeup_size = evlist__mmap_size(rec->opts.mmap_pages);
@@ -2380,6 +2411,8 @@ static struct record record = {
},
.mmap_flush = MMAP_FLUSH_DEFAULT,
.nr_threads_synthesize = 1,
+ .ctl_fd = -1,
+ .ctl_fd_ack = -1,
},
.tool = {
.sample = process_sample_event,
@@ -2581,6 +2614,10 @@ static struct option __record_options[] = {
"libpfm4 event selector. use 'perf list' to list available events",
parse_libpfm_events_option),
#endif
+ OPT_CALLBACK(0, "control", &record.opts, "fd:ctl-fd[,ack-fd]",
+ "Listen on ctl-fd descriptor for command to control measurement ('enable': enable events, 'disable': disable events).\n"
+ "\t\t\t Optionally send control command completion ('ack\\n') to ack-fd descriptor.",
+ parse_control_option),
OPT_END()
};

diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index da138dcb4d34..4cb72a478af1 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -70,6 +70,8 @@ struct record_opts {
int mmap_flush;
unsigned int comp_level;
unsigned int nr_threads_synthesize;
+ int ctl_fd;
+ int ctl_fd_ack;
};

extern const char * const *record_usage;
--
2.24.1


2020-06-17 08:47:26

by Alexey Budankov

Subject: [PATCH v8 11/13] perf record: extend -D,--delay option with -1 value


Extend the -D,--delay option with -1 to start collection with events
disabled, to be enabled later by an 'enable' command provided via the
control file descriptor.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 5 +++--
tools/perf/builtin-record.c | 12 ++++++++----
tools/perf/builtin-trace.c | 2 +-
tools/perf/util/record.h | 2 +-
4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index fa8a5fcd27ab..a84376605805 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -407,8 +407,9 @@ if combined with -a or -C options.

-D::
--delay=::
-After starting the program, wait msecs before measuring. This is useful to
-filter out the startup phase of the program, which is often very different.
+After starting the program, wait msecs before measuring (-1: start with events
+disabled). This is useful to filter out the startup phase of the program, which
+is often very different.

-I::
--intr-regs::
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e108d90ae2ed..d0b29a1070a0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1749,8 +1749,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}

if (opts->initial_delay) {
- usleep(opts->initial_delay * USEC_PER_MSEC);
- evlist__enable(rec->evlist);
+ pr_info(EVLIST_DISABLED_MSG);
+ if (opts->initial_delay > 0) {
+ usleep(opts->initial_delay * USEC_PER_MSEC);
+ evlist__enable(rec->evlist);
+ pr_info(EVLIST_ENABLED_MSG);
+ }
}

trigger_ready(&auxtrace_snapshot_trigger);
@@ -2462,8 +2466,8 @@ static struct option __record_options[] = {
OPT_CALLBACK('G', "cgroup", &record.evlist, "name",
"monitor event in cgroup name only",
parse_cgroups),
- OPT_UINTEGER('D', "delay", &record.opts.initial_delay,
- "ms to wait before starting measurement after program start"),
+ OPT_INTEGER('D', "delay", &record.opts.initial_delay,
+ "ms to wait before starting measurement after program start (-1: start with events disabled)"),
OPT_BOOLEAN(0, "kcore", &record.opts.kcore, "copy /proc/kcore"),
OPT_STRING('u', "uid", &record.opts.target.uid_str, "user",
"user to profile"),
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 4cbb64edc998..290149b1b3b5 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -4813,7 +4813,7 @@ int cmd_trace(int argc, const char **argv)
"per thread proc mmap processing timeout in ms"),
OPT_CALLBACK('G', "cgroup", &trace, "name", "monitor event in cgroup name only",
trace__parse_cgroups),
- OPT_UINTEGER('D', "delay", &trace.opts.initial_delay,
+ OPT_INTEGER('D', "delay", &trace.opts.initial_delay,
"ms to wait before starting measurement after program "
"start"),
OPTS_EVSWITCH(&trace.evswitch),
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 39d1de4b2a36..da138dcb4d34 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -61,7 +61,7 @@ struct record_opts {
const char *auxtrace_snapshot_opts;
const char *auxtrace_sample_opts;
bool sample_transaction;
- unsigned initial_delay;
+ int initial_delay;
bool use_clockid;
clockid_t clockid;
u64 clockid_res_ns;
--
2.24.1


2020-06-17 08:48:11

by Alexey Budankov

Subject: [PATCH v8 12/13] perf record: implement control commands handling


Implement handling of the 'enable' and 'disable' control commands coming
from the control file descriptor.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/builtin-record.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d0b29a1070a0..0394e068dde8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1527,6 +1527,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
bool disabled = false, draining = false;
int fd;
float ratio = 0;
+ enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;

atexit(record__sig_exit);
signal(SIGCHLD, sig_handler);
@@ -1830,6 +1831,21 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
alarm(rec->switch_output.time);
}

+ if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
+ switch (cmd) {
+ case EVLIST_CTL_CMD_ENABLE:
+ pr_info(EVLIST_ENABLED_MSG);
+ break;
+ case EVLIST_CTL_CMD_DISABLE:
+ pr_info(EVLIST_DISABLED_MSG);
+ break;
+ case EVLIST_CTL_CMD_ACK:
+ case EVLIST_CTL_CMD_UNSUPPORTED:
+ default:
+ break;
+ }
+ }
+
if (hits == rec->samples) {
if (done || draining)
break;
--
2.24.1


2020-06-22 03:43:32

by Alexey Budankov

Subject: Re: [PATCH v8 00/13] perf: support enable and disable commands in stat and record modes


Hi,

On 17.06.2020 11:30, Alexey Budankov wrote:
>
> Changes in v8:
> - avoided moving of fds at fdarray__filter() call
> - skipped counting of fds with zeroed revents at fdarray__filter() call
> - converted explicit --ctl-fd[-ack] into --control fd:ctl-fd[,ack-fd] option
> - updated docs to accommodate --control fd:ctl-fd[,ack-fd] option

Are there any questions or thoughts so far?

Thanks,
Alexey

>
> v7: https://lore.kernel.org/lkml/[email protected]/
>
> Changes in v7:
> - added missing perf-record.txt changes
> - adjusted docs wording for --ctl-fd,ctl-fd-ack options
> to additionally mention --delay=-1 effect
>
> v6: https://lore.kernel.org/lkml/[email protected]/
>
> Changes in v6:
> - split re-factoring of events handling loops for stat mode
> into smaller incremental parts
> - added parts missing at v5
> - corrected v5 runtime issues
>
> v5: https://lore.kernel.org/lkml/[email protected]/
>
> Changes in v5:
> - split re-factoring of events handling loops for stat mode
> into smaller incremental parts
>
> v4: https://lore.kernel.org/lkml/[email protected]/
>
> Changes in v4:
> - made checking of ctlfd state unconditional in record trace streaming loop
> - introduced static poll fds to keep evlist__filter_pollfd() unaffected
> - handled ret code of evlist__initialize_ctlfd() where need
> - renamed and structured handle_events() function
> - applied anonymous structs where needed
>
> v3: https://lore.kernel.org/lkml/[email protected]/
>
> Changes in v3:
> - renamed functions and types from perf_evlist_ to evlist_ to avoid
> clash with libperf code;
> - extended commands to be strings of variable length consisting of
> command name and also possibly including command specific data;
> - merged docs update with the code changes;
> - updated docs for -D,--delay=-1 option for stat and record modes;
>
> v2: https://lore.kernel.org/lkml/[email protected]/
>
> Changes in v2:
> - renamed resume and pause commands to enable and disable ones, renamed
> CTL_CMD_RESUME and CTL_CMD_PAUSE to CTL_CMD_ENABLE and CTL_CMD_DISABLE
> to fit to the appropriate ioctls and avoid mixing up with PAUSE_OUTPUT
> ioctl;
> - factored out event handling loop into a handle_events() for stat mode;
> - separated -D,--delay=-1 into separate patches for stat and record modes;
>
> v1: https://lore.kernel.org/lkml/[email protected]/
>
> repo: tip of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core
>
> The patch set implements handling of 'start disabled', 'enable' and 'disable'
> external control commands which can be provided for stat and record modes
> of the tool from an external controlling process. 'start disabled' command
> can be used to postpone enabling of events in the beginning of a monitoring
> session. 'enable' and 'disable' commands can be used to enable and disable
> events correspondingly any time after the start of the session.
>
> The 'start disabled', 'enable' and 'disable' external control commands can be
> used to focus measurement on specially selected time intervals of workload
> execution. Focused measurement reduces tool intrusion and influence on
> workload behavior, reduces distortion and amount of collected and stored
> data, mitigates data accuracy loss because measurement and data capturing
> happen only during intervals of interest.
>
> A controlling process can be a bash shell script [1], native executable or
> any other language program that can directly work with file descriptors,
> e.g. pipes [2], and spawn a process, specially the tool one.
>
> -D,--delay <val> option is extended with -1 value to skip events enabling
> in the beginning of a monitoring session ('start disabled' command).
> --control fd:ctl-fd[,ack-fd] command line option is introduced to provide the
> tool with a pair of file descriptors to listen to control commands and reply
> to the controlling process on the completion of received commands.
>
> The tool reads control command message from ctl-fd descriptor, handles the
> command and optionally replies acknowledgement message to ack-fd descriptor,
> if it is specified on the command line. 'enable' command is recognized as
> 'enable' string message and 'disable' command is recognized as 'disable'
> string message both received from ctl-fd descriptor. Completion message is
> 'ack\n' and sent to ack-fd descriptor.
>
> Example bash script demonstrating simple use case follows:
>
> #!/bin/bash
>
> ctl_dir=/tmp/
>
> ctl_fifo=${ctl_dir}perf_ctl.fifo
> test -p ${ctl_fifo} && unlink ${ctl_fifo}
> mkfifo ${ctl_fifo} && exec {ctl_fd}<>${ctl_fifo}
>
> ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
> test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
> mkfifo ${ctl_ack_fifo} && exec {ctl_fd_ack}<>${ctl_ack_fifo}
>
> perf stat -D -1 -e cpu-cycles -a -I 1000 \
> --control fd:${ctl_fd},${ctl_fd_ack} \
> -- sleep 40 &
> perf_pid=$!
>
> sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
> sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
> sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e2 && echo "enabled(${e2})"
> sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d2 && echo "disabled(${d2})"
>
> exec {ctl_fd_ack}>&- && unlink ${ctl_ack_fifo}
> exec {ctl_fd}>&- && unlink ${ctl_fifo}
>
> wait -n ${perf_pid}
> exit $?
>
>
> Script output:
>
> [root@host dir] example
> Events disabled
> # time counts unit events
> 1.001101062 <not counted> cpu-cycles
> 2.002994944 <not counted> cpu-cycles
> 3.004864340 <not counted> cpu-cycles
> 4.006727177 <not counted> cpu-cycles
> Events enabled
> enabled(ack)
> 4.993808464 3,124,246 cpu-cycles
> 5.008597004 3,325,624 cpu-cycles
> 6.010387483 83,472,992 cpu-cycles
> 7.012266598 55,877,621 cpu-cycles
> 8.014175695 97,892,729 cpu-cycles
> 9.016056093 68,461,242 cpu-cycles
> 10.017937507 55,449,643 cpu-cycles
> 11.019830154 68,938,167 cpu-cycles
> 12.021719952 55,164,101 cpu-cycles
> 13.023627550 70,535,720 cpu-cycles
> 14.025580995 53,240,125 cpu-cycles
> disabled(ack)
> 14.997518260 53,558,068 cpu-cycles
> Events disabled
> 15.027216416 <not counted> cpu-cycles
> 16.029052729 <not counted> cpu-cycles
> 17.030904762 <not counted> cpu-cycles
> 18.032073424 <not counted> cpu-cycles
> 19.033805074 <not counted> cpu-cycles
> Events enabled
> enabled(ack)
> 20.001279097 3,021,022 cpu-cycles
> 20.035044381 6,434,367 cpu-cycles
> 21.036923813 89,358,251 cpu-cycles
> 22.038825169 72,516,351 cpu-cycles
> # time counts unit events
> 23.040715596 55,046,157 cpu-cycles
> 24.042643757 78,128,649 cpu-cycles
> 25.044558535 61,052,428 cpu-cycles
> 26.046452785 62,142,806 cpu-cycles
> 27.048353021 74,477,971 cpu-cycles
> 28.050241286 61,001,623 cpu-cycles
> 29.052149961 61,653,502 cpu-cycles
> disabled(ack)
> 30.004980264 82,729,640 cpu-cycles
> Events disabled
> 30.053516176 <not counted> cpu-cycles
> 31.055348366 <not counted> cpu-cycles
> 32.057202097 <not counted> cpu-cycles
> 33.059040702 <not counted> cpu-cycles
> 34.060843288 <not counted> cpu-cycles
> 35.000888624 <not counted> cpu-cycles
> [root@host dir]#
>
> [1] http://man7.org/linux/man-pages/man1/bash.1.html
> [2] http://man7.org/linux/man-pages/man2/pipe.2.html
>
> ---
> Alexey Budankov (13):
> tools/libperf: avoid moving of fds at fdarray__filter() call
> perf evlist: introduce control file descriptors
> perf evlist: implement control command handling functions
> perf stat: factor out body of event handling loop for system wide
> perf stat: move target check to loop control statement
> perf stat: factor out body of event handling loop for fork case
> perf stat: factor out event handling loop into dispatch_events()
> perf stat: extend -D,--delay option with -1 value
> perf stat: implement control commands handling
> perf stat: introduce --control fd:ctl-fd[,ack-fd] options
> perf record: extend -D,--delay option with -1 value
> perf record: implement control commands handling
> perf record: introduce --control fd:ctl-fd[,ack-fd] options
>
> tools/lib/api/fd/array.c | 11 +-
> tools/perf/Documentation/perf-record.txt | 44 +++++-
> tools/perf/Documentation/perf-stat.txt | 44 +++++-
> tools/perf/builtin-record.c | 65 ++++++++-
> tools/perf/builtin-stat.c | 176 ++++++++++++++++++-----
> tools/perf/builtin-trace.c | 2 +-
> tools/perf/tests/fdarray.c | 20 +--
> tools/perf/util/evlist.c | 136 ++++++++++++++++++
> tools/perf/util/evlist.h | 25 ++++
> tools/perf/util/record.h | 4 +-
> tools/perf/util/stat.h | 4 +-
> 11 files changed, 458 insertions(+), 73 deletions(-)
>

2020-06-22 09:00:31

by Jiri Olsa

Subject: Re: [PATCH v8 00/13] perf: support enable and disable commands in stat and record modes

On Mon, Jun 22, 2020 at 06:38:34AM +0300, Alexey Budankov wrote:
>
> Hi,
>
> On 17.06.2020 11:30, Alexey Budankov wrote:
> >
> > Changes in v8:
> > - avoided moving of fds at fdarray__filter() call
> > - skipped counting of fds with zeroed revents at fdarray__filter() call
> > - converted explicit --ctl-fd[-ack] into --control fd:ctl-fd[,ack-fd] option
> > - updated docs to accommodate --control fd:ctl-fd[,ack-fd] option
>
> Are there any questions or thoughts so far?

hi,
I'm waiting on some comment for:
https://lore.kernel.org/lkml/20200615165802.GD2088119@krava/

jirka

>
> Thanks,
> Alexey
>
> >
> > v7: https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes in v7:
> > - added missing perf-record.txt changes
> > - adjusted docs wording for --ctl-fd,ctl-fd-ack options
> > to additionally mention --delay=-1 effect
> >
> > v6: https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes in v6:
> > - split re-factoring of events handling loops for stat mode
> > into smaller incremental parts
> > - added parts missing at v5
> > - corrected v5 runtime issues
> >
> > v5: https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes in v5:
> > - split re-factoring of events handling loops for stat mode
> > into smaller incremental parts
> >
> > v4: https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes in v4:
> > - made checking of ctlfd state unconditional in record trace streaming loop
> > - introduced static poll fds to keep evlist__filter_pollfd() unaffected
> > - handled ret code of evlist__initialize_ctlfd() where need
> > - renamed and structured handle_events() function
> > - applied anonymous structs where needed
> >
> > v3: https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes in v3:
> > - renamed functions and types from perf_evlist_ to evlist_ to avoid
> > clash with libperf code;
> > - extended commands to be strings of variable length consisting of
> > command name and also possibly including command specific data;
> > - merged docs update with the code changes;
> > - updated docs for -D,--delay=-1 option for stat and record modes;
> >
> > v2: https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes in v2:
> > - renamed resume and pause commands to enable and disable ones, renamed
> > CTL_CMD_RESUME and CTL_CMD_PAUSE to CTL_CMD_ENABLE and CTL_CMD_DISABLE
> > to fit to the appropriate ioctls and avoid mixing up with PAUSE_OUTPUT
> > ioctl;
> > - factored out event handling loop into a handle_events() for stat mode;
> > - separated -D,--delay=-1 into separate patches for stat and record modes;
> >
> > v1: https://lore.kernel.org/lkml/[email protected]/
> >
> > repo: tip of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core
> >
> > The patch set implements handling of 'start disabled', 'enable' and 'disable'
> > external control commands which can be provided for stat and record modes
> > of the tool from an external controlling process. 'start disabled' command
> > can be used to postpone enabling of events in the beginning of a monitoring
> > session. 'enable' and 'disable' commands can be used to enable and disable
> > events correspondingly any time after the start of the session.
> >
> > The 'start disabled', 'enable' and 'disable' external control commands can be
> > used to focus measurement on specially selected time intervals of workload
> > execution. Focused measurement reduces tool intrusion and influence on
> > workload behavior, reduces distortion and amount of collected and stored
> > data, mitigates data accuracy loss because measurement and data capturing
> > happen only during intervals of interest.
> >
> > A controlling process can be a bash shell script [1], native executable or
> > any other language program that can directly work with file descriptors,
> > e.g. pipes [2], and spawn a process, specially the tool one.
> >
> > -D,--delay <val> option is extended with -1 value to skip events enabling
> > in the beginning of a monitoring session ('start disabled' command).
> > --control fd:ctl-fd[,ack-fd] command line option is introduced to provide the
> > tool with a pair of file descriptors to listen to control commands and reply
> > to the controlling process on the completion of received commands.
> >
> > The tool reads control command message from ctl-fd descriptor, handles the
> > command and optionally replies acknowledgement message to ack-fd descriptor,
> > if it is specified on the command line. 'enable' command is recognized as
> > 'enable' string message and 'disable' command is recognized as 'disable'
> > string message both received from ctl-fd descriptor. Completion message is
> > 'ack\n' and sent to ack-fd descriptor.
> >
> > Example bash script demonstrating simple use case follows:
> >
> > #!/bin/bash
> >
> > ctl_dir=/tmp/
> >
> > ctl_fifo=${ctl_dir}perf_ctl.fifo
> > test -p ${ctl_fifo} && unlink ${ctl_fifo}
> > mkfifo ${ctl_fifo} && exec {ctl_fd}<>${ctl_fifo}
> >
> > ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
> > test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
> > mkfifo ${ctl_ack_fifo} && exec {ctl_fd_ack}<>${ctl_ack_fifo}
> >
> > perf stat -D -1 -e cpu-cycles -a -I 1000 \
> > --control fd:${ctl_fd},${ctl_fd_ack} \
> > -- sleep 40 &
> > perf_pid=$!
> >
> > sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
> > sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
> > sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e2 && echo "enabled(${e2})"
> > sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d2 && echo "disabled(${d2})"
> >
> > exec {ctl_fd_ack}>&- && unlink ${ctl_ack_fifo}
> > exec {ctl_fd}>&- && unlink ${ctl_fifo}
> >
> > wait -n ${perf_pid}
> > exit $?
> >
> >
> > Script output:
> >
> > [root@host dir] example
> > Events disabled
> > # time counts unit events
> > 1.001101062 <not counted> cpu-cycles
> > 2.002994944 <not counted> cpu-cycles
> > 3.004864340 <not counted> cpu-cycles
> > 4.006727177 <not counted> cpu-cycles
> > Events enabled
> > enabled(ack)
> > 4.993808464 3,124,246 cpu-cycles
> > 5.008597004 3,325,624 cpu-cycles
> > 6.010387483 83,472,992 cpu-cycles
> > 7.012266598 55,877,621 cpu-cycles
> > 8.014175695 97,892,729 cpu-cycles
> > 9.016056093 68,461,242 cpu-cycles
> > 10.017937507 55,449,643 cpu-cycles
> > 11.019830154 68,938,167 cpu-cycles
> > 12.021719952 55,164,101 cpu-cycles
> > 13.023627550 70,535,720 cpu-cycles
> > 14.025580995 53,240,125 cpu-cycles
> > disabled(ack)
> > 14.997518260 53,558,068 cpu-cycles
> > Events disabled
> > 15.027216416 <not counted> cpu-cycles
> > 16.029052729 <not counted> cpu-cycles
> > 17.030904762 <not counted> cpu-cycles
> > 18.032073424 <not counted> cpu-cycles
> > 19.033805074 <not counted> cpu-cycles
> > Events enabled
> > enabled(ack)
> > 20.001279097 3,021,022 cpu-cycles
> > 20.035044381 6,434,367 cpu-cycles
> > 21.036923813 89,358,251 cpu-cycles
> > 22.038825169 72,516,351 cpu-cycles
> > # time counts unit events
> > 23.040715596 55,046,157 cpu-cycles
> > 24.042643757 78,128,649 cpu-cycles
> > 25.044558535 61,052,428 cpu-cycles
> > 26.046452785 62,142,806 cpu-cycles
> > 27.048353021 74,477,971 cpu-cycles
> > 28.050241286 61,001,623 cpu-cycles
> > 29.052149961 61,653,502 cpu-cycles
> > disabled(ack)
> > 30.004980264 82,729,640 cpu-cycles
> > Events disabled
> > 30.053516176 <not counted> cpu-cycles
> > 31.055348366 <not counted> cpu-cycles
> > 32.057202097 <not counted> cpu-cycles
> > 33.059040702 <not counted> cpu-cycles
> > 34.060843288 <not counted> cpu-cycles
> > 35.000888624 <not counted> cpu-cycles
> > [root@host dir]#
> >
> > [1] http://man7.org/linux/man-pages/man1/bash.1.html
> > [2] http://man7.org/linux/man-pages/man2/pipe.2.html
> >
> > ---
> > Alexey Budankov (13):
> > tools/libperf: avoid moving of fds at fdarray__filter() call
> > perf evlist: introduce control file descriptors
> > perf evlist: implement control command handling functions
> > perf stat: factor out body of event handling loop for system wide
> > perf stat: move target check to loop control statement
> > perf stat: factor out body of event handling loop for fork case
> > perf stat: factor out event handling loop into dispatch_events()
> > perf stat: extend -D,--delay option with -1 value
> > perf stat: implement control commands handling
> > perf stat: introduce --control fd:ctl-fd[,ack-fd] options
> > perf record: extend -D,--delay option with -1 value
> > perf record: implement control commands handling
> > perf record: introduce --control fd:ctl-fd[,ack-fd] options
> >
> > tools/lib/api/fd/array.c | 11 +-
> > tools/perf/Documentation/perf-record.txt | 44 +++++-
> > tools/perf/Documentation/perf-stat.txt | 44 +++++-
> > tools/perf/builtin-record.c | 65 ++++++++-
> > tools/perf/builtin-stat.c | 176 ++++++++++++++++++-----
> > tools/perf/builtin-trace.c | 2 +-
> > tools/perf/tests/fdarray.c | 20 +--
> > tools/perf/util/evlist.c | 136 ++++++++++++++++++
> > tools/perf/util/evlist.h | 25 ++++
> > tools/perf/util/record.h | 4 +-
> > tools/perf/util/stat.h | 4 +-
> > 11 files changed, 458 insertions(+), 73 deletions(-)
> >
>

2020-06-23 14:58:39

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling

On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:
>
> Implement handling of 'enable' and 'disable' control commands
> coming from control file descriptor. process_evlist() function
> checks for events on the control fds and performs the required operations.
> If a poll event splits the initiated timeout interval, the remainder
> is calculated and waited for in the following poll() syscall.
>
> Signed-off-by: Alexey Budankov <[email protected]>
> ---
> tools/perf/builtin-stat.c | 67 +++++++++++++++++++++++++++++----------
> 1 file changed, 50 insertions(+), 17 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index f88d5ee55022..cc56d71a3ed5 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -492,6 +492,31 @@ static bool process_timeout(int timeout, unsigned int interval, int *times)
> return print_interval(interval, times);
> }
>
> +static bool process_evlist(struct evlist *evlist, unsigned int interval, int *times)
> +{
> + bool stop = false;
> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
> +
> + if (evlist__ctlfd_process(evlist, &cmd) > 0) {
> + switch (cmd) {
> + case EVLIST_CTL_CMD_ENABLE:
> + pr_info(EVLIST_ENABLED_MSG);
> + stop = print_interval(interval, times);

why is interval printed in here?

> + break;
> + case EVLIST_CTL_CMD_DISABLE:
> + stop = print_interval(interval, times);

and here?

it should be called from the main loop when the interval time has elapsed, no?

jirka

2020-06-23 14:58:47

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 03/13] perf evlist: implement control command handling functions

On Wed, Jun 17, 2020 at 11:36:46AM +0300, Alexey Budankov wrote:

SNIP

> + memset(cmd_data, 0, data_size--);
> +
> + do {
> + err = read(evlist->ctl_fd.fd, &c, 1);
> + if (err > 0) {
> + if (c == '\n' || c == '\0')
> + break;
> + cmd_data[bytes_read++] = c;
> + if (bytes_read == data_size)
> + break;
> + } else {
> + if (err == -1)
> + pr_err("Failed to read from ctlfd %d: %m\n", evlist->ctl_fd.fd);
> + break;
> + }
> + } while (1);
> +
> + pr_debug("Message from ctl_fd: \"%s%s\"\n", cmd_data,
> + bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
> +
> + if (err > 0) {
> + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
> + strlen(EVLIST_CTL_CMD_ENABLE_TAG))) {

you could use sizeof(EVLIST_CTL_CMD_ENABLE_TAG) instead, no function call

> + *cmd = EVLIST_CTL_CMD_ENABLE;
> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
> + strlen(EVLIST_CTL_CMD_DISABLE_TAG))) {

ditto

jirka

2020-06-23 14:58:54

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 03/13] perf evlist: implement control command handling functions

On Wed, Jun 17, 2020 at 11:36:46AM +0300, Alexey Budankov wrote:

SNIP

> +static int evlist__ctlfd_recv(struct evlist *evlist, enum evlist_ctl_cmd *cmd,
> + char *cmd_data, size_t data_size)
> +{
> + int err;
> + char c;
> + size_t bytes_read = 0;
> +
> + memset(cmd_data, 0, data_size--);
> +
> + do {
> + err = read(evlist->ctl_fd.fd, &c, 1);
> + if (err > 0) {
> + if (c == '\n' || c == '\0')
> + break;
> + cmd_data[bytes_read++] = c;
> + if (bytes_read == data_size)
> + break;
> + } else {
> + if (err == -1)
> + pr_err("Failed to read from ctlfd %d: %m\n", evlist->ctl_fd.fd);
> + break;
> + }
> + } while (1);
> +
> + pr_debug("Message from ctl_fd: \"%s%s\"\n", cmd_data,
> + bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");

do you want to display the message above only if (err > 0) ?

> +
> + if (err > 0) {
> + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
> + strlen(EVLIST_CTL_CMD_ENABLE_TAG))) {
> + *cmd = EVLIST_CTL_CMD_ENABLE;
> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
> + strlen(EVLIST_CTL_CMD_DISABLE_TAG))) {
> + *cmd = EVLIST_CTL_CMD_DISABLE;
> + }
> + }
> +
> + return err;
> +}

SNIP

> + int ctlfd_pos = evlist->ctl_fd.pos;
> + struct pollfd *entries = evlist->core.pollfd.entries;
> +
> + if (ctlfd_pos == -1 || !entries[ctlfd_pos].revents)
> + return 0;
> +
> + if (entries[ctlfd_pos].revents & POLLIN) {
> + err = evlist__ctlfd_recv(evlist, cmd, cmd_data,
> + EVLIST_CTL_CMD_MAX_LEN);
> + if (err > 0) {
> + switch (*cmd) {
> + case EVLIST_CTL_CMD_ENABLE:
> + evlist__enable(evlist);
> + break;
> + case EVLIST_CTL_CMD_DISABLE:
> + evlist__disable(evlist);
> + break;
> + case EVLIST_CTL_CMD_ACK:
> + case EVLIST_CTL_CMD_UNSUPPORTED:
> + default:
> + pr_debug("ctlfd: unsupported %d\n", *cmd);
> + break;
> + }
> + if (!(*cmd == EVLIST_CTL_CMD_ACK || *cmd == EVLIST_CTL_CMD_UNSUPPORTED))
> + evlist__ctlfd_ack(evlist);
> + }
> + }
> +
> + if (entries[ctlfd_pos].revents & (POLLHUP | POLLERR))
> + evlist__finalize_ctlfd(evlist);

should this error be propagated via err?

jirka

2020-06-23 14:59:03

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 07/13] perf stat: factor out event handling loop into dispatch_events()

On Wed, Jun 17, 2020 at 11:40:03AM +0300, Alexey Budankov wrote:
>
> Consolidate event dispatching loops for fork, attach and system
> wide monitoring use cases into common dispatch_events() function.
>
> Signed-off-by: Alexey Budankov <[email protected]>
> ---
> tools/perf/builtin-stat.c | 35 ++++++++++++++++++++++++-----------
> 1 file changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 3bc538576607..39749c290508 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -557,6 +557,27 @@ static bool is_target_alive(struct target *_target,
> return false;
> }
>
> +static int dispatch_events(bool forks, int timeout, int interval, int *times, struct timespec *ts)
> +{
> + bool stop = false;
> + int child = 0, status = 0;
> +
> + while (1) {
> + if (forks)
> + child = waitpid(child_pid, &status, WNOHANG);
> + else
> + child = !is_target_alive(&target, evsel_list->core.threads) ? 1 : 0;

please rename child to something more accurate, so the condition
below makes more sense, like child_stopped or such

jirka

> +
> + if (done || stop || child)
> + break;
> +
> + nanosleep(ts, NULL);
> + stop = process_timeout(timeout, interval, times);
> + }
> +
> + return status;
> +}
> +

SNIP

2020-06-23 14:59:04

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 12/13] perf record: implement control commands handling

On Wed, Jun 17, 2020 at 11:43:58AM +0300, Alexey Budankov wrote:
>
> Implement handling of 'enable' and 'disable' control commands
> coming from control file descriptor.
>
> Signed-off-by: Alexey Budankov <[email protected]>
> ---
> tools/perf/builtin-record.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index d0b29a1070a0..0394e068dde8 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1527,6 +1527,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> bool disabled = false, draining = false;
> int fd;
> float ratio = 0;
> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
>
> atexit(record__sig_exit);
> signal(SIGCHLD, sig_handler);
> @@ -1830,6 +1831,21 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> alarm(rec->switch_output.time);
> }
>
> + if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
> + switch (cmd) {
> + case EVLIST_CTL_CMD_ENABLE:
> + pr_info(EVLIST_ENABLED_MSG);
> + break;
> + case EVLIST_CTL_CMD_DISABLE:
> + pr_info(EVLIST_DISABLED_MSG);
> + break;
> + case EVLIST_CTL_CMD_ACK:
> + case EVLIST_CTL_CMD_UNSUPPORTED:
> + default:
> + break;
> + }
> + }

so there's still the filter call like:

if (evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
draining = true;

it will never be 0 if the control fds are still alive, no?

jirka

2020-06-23 14:59:15

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling

On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:

SNIP

>
> while (1) {
> if (forks)
> @@ -581,8 +617,17 @@ static int dispatch_events(bool forks, int timeout, int interval, int *times, st
> if (done || stop || child)
> break;
>
> - nanosleep(ts, NULL);
> - stop = process_timeout(timeout, interval, times);
> + clock_gettime(CLOCK_MONOTONIC, &time_start);
> + if (!(evlist__poll(evsel_list, time_to_sleep) > 0)) { /* poll timeout or EINTR */
> + stop = process_timeout(timeout, interval, times);
> + time_to_sleep = sleep_time;
> + } else { /* fd revent */
> + stop = process_evlist(evsel_list, interval, times);
> + clock_gettime(CLOCK_MONOTONIC, &time_stop);
> + diff_timespec(&time_diff, &time_stop, &time_start);
> + time_to_sleep -= time_diff.tv_sec * MSEC_PER_SEC +
> + time_diff.tv_nsec / NSEC_PER_MSEC;

should we check time_to_sleep > time_diff first?

jirka

2020-06-23 15:01:46

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 04/13] perf stat: factor out body of event handling loop for system wide

On Wed, Jun 17, 2020 at 11:37:43AM +0300, Alexey Budankov wrote:
>
> Introduce process_timeout() and process_interval() functions that
> factor out body of event handling loop for attach and system wide
> monitoring use cases.
>
> Signed-off-by: Alexey Budankov <[email protected]>
> ---
> tools/perf/builtin-stat.c | 28 ++++++++++++++++++++--------
> 1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 9be020e0098a..31f7ccf9537b 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -475,6 +475,23 @@ static void process_interval(void)
> print_counters(&rs, 0, NULL);
> }
>
> +static bool print_interval(unsigned int interval, int *times)
> +{
> + if (interval) {
> + process_interval();
> + if (interval_count && !(--(*times)))
> + return true;
> + }
> + return false;
> +}
> +
> +static bool process_timeout(int timeout, unsigned int interval, int *times)
> +{
> + if (timeout)
> + return true;
> + return print_interval(interval, times);
> +}

I think it's confusing to keep this together, that
process_timeout triggers also interval processing

I think you can keep the timeout separated from interval
processing and rename the print_interval to process_interval
and process_interval to __process_interval

jirka

> +
> static void enable_counters(void)
> {
> if (stat_config.initial_delay)
> @@ -611,6 +628,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> struct affinity affinity;
> int i, cpu;
> bool second_pass = false;
> + bool stop = false;
>
> if (interval) {
> ts.tv_sec = interval / USEC_PER_MSEC;
> @@ -805,17 +823,11 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> psignal(WTERMSIG(status), argv[0]);
> } else {
> enable_counters();
> - while (!done) {
> + while (!done && !stop) {
> nanosleep(&ts, NULL);
> if (!is_target_alive(&target, evsel_list->core.threads))
> break;
> - if (timeout)
> - break;
> - if (interval) {
> - process_interval();
> - if (interval_count && !(--times))
> - break;
> - }
> + stop = process_timeout(timeout, interval, &times);
> }
> }
>
> --
> 2.24.1
>
>

2020-06-23 15:06:09

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v8 03/13] perf evlist: implement control command handling functions

Em Tue, Jun 23, 2020 at 04:56:01PM +0200, Jiri Olsa escreveu:
> On Wed, Jun 17, 2020 at 11:36:46AM +0300, Alexey Budankov wrote:
>
> SNIP
>
> > + memset(cmd_data, 0, data_size--);
> > +
> > + do {
> > + err = read(evlist->ctl_fd.fd, &c, 1);
> > + if (err > 0) {
> > + if (c == '\n' || c == '\0')
> > + break;
> > + cmd_data[bytes_read++] = c;
> > + if (bytes_read == data_size)
> > + break;
> > + } else {
> > + if (err == -1)
> > + pr_err("Failed to read from ctlfd %d: %m\n", evlist->ctl_fd.fd);
> > + break;
> > + }
> > + } while (1);
> > +
> > + pr_debug("Message from ctl_fd: \"%s%s\"\n", cmd_data,
> > + bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
> > +
> > + if (err > 0) {
> > + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
> > + strlen(EVLIST_CTL_CMD_ENABLE_TAG))) {
>
> you could use sizeof(EVLIST_CTL_CMD_ENABLE_TAG) instead, no function call

-1, as sizeof will get the \0, right?

>
> > + *cmd = EVLIST_CTL_CMD_ENABLE;
> > + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
> > + strlen(EVLIST_CTL_CMD_DISABLE_TAG))) {
>
> ditto
>
> jirka
>

--

- Arnaldo

2020-06-23 15:17:29

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 03/13] perf evlist: implement control command handling functions

On Tue, Jun 23, 2020 at 12:03:45PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jun 23, 2020 at 04:56:01PM +0200, Jiri Olsa escreveu:
> > On Wed, Jun 17, 2020 at 11:36:46AM +0300, Alexey Budankov wrote:
> >
> > SNIP
> >
> > > + memset(cmd_data, 0, data_size--);
> > > +
> > > + do {
> > > + err = read(evlist->ctl_fd.fd, &c, 1);
> > > + if (err > 0) {
> > > + if (c == '\n' || c == '\0')
> > > + break;
> > > + cmd_data[bytes_read++] = c;
> > > + if (bytes_read == data_size)
> > > + break;
> > > + } else {
> > > + if (err == -1)
> > > + pr_err("Failed to read from ctlfd %d: %m\n", evlist->ctl_fd.fd);
> > > + break;
> > > + }
> > > + } while (1);
> > > +
> > > + pr_debug("Message from ctl_fd: \"%s%s\"\n", cmd_data,
> > > + bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
> > > +
> > > + if (err > 0) {
> > > + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
> > > + strlen(EVLIST_CTL_CMD_ENABLE_TAG))) {
> >
> > you could use sizeof(EVLIST_CTL_CMD_ENABLE_TAG) instead, no function call
>
> -1, as sizeof will get the \0, right?

yep, I think that's right

jirka

>
> >
> > > + *cmd = EVLIST_CTL_CMD_ENABLE;
> > > + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
> > > + strlen(EVLIST_CTL_CMD_DISABLE_TAG))) {
> >
> > ditto
> >
> > jirka
> >
>
> --
>
> - Arnaldo
>

2020-06-24 13:23:33

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 03/13] perf evlist: implement control command handling functions


On 23.06.2020 17:56, Jiri Olsa wrote:
> On Wed, Jun 17, 2020 at 11:36:46AM +0300, Alexey Budankov wrote:
>
> SNIP
>
>> +static int evlist__ctlfd_recv(struct evlist *evlist, enum evlist_ctl_cmd *cmd,
>> + char *cmd_data, size_t data_size)
>> +{
>> + int err;
>> + char c;
>> + size_t bytes_read = 0;
>> +
>> + memset(cmd_data, 0, data_size--);
>> +
>> + do {
>> + err = read(evlist->ctl_fd.fd, &c, 1);
>> + if (err > 0) {
>> + if (c == '\n' || c == '\0')
>> + break;
>> + cmd_data[bytes_read++] = c;
>> + if (bytes_read == data_size)
>> + break;
>> + } else {
>> + if (err == -1)
>> + pr_err("Failed to read from ctlfd %d: %m\n", evlist->ctl_fd.fd);
>> + break;
>> + }
>> + } while (1);
>> +
>> + pr_debug("Message from ctl_fd: \"%s%s\"\n", cmd_data,
>> + bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
>
> do you want to display the message above only if (err > 0) ?

I do, only for err >= 0. For err == -1 there is already the error message.

>
>> +
>> + if (err > 0) {
>> + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
>> + strlen(EVLIST_CTL_CMD_ENABLE_TAG))) {
>> + *cmd = EVLIST_CTL_CMD_ENABLE;
>> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
>> + strlen(EVLIST_CTL_CMD_DISABLE_TAG))) {
>> + *cmd = EVLIST_CTL_CMD_DISABLE;
>> + }
>> + }
>> +
>> + return err;
>> +}
>
> SNIP
>
>> + int ctlfd_pos = evlist->ctl_fd.pos;
>> + struct pollfd *entries = evlist->core.pollfd.entries;
>> +
>> + if (ctlfd_pos == -1 || !entries[ctlfd_pos].revents)
>> + return 0;
>> +
>> + if (entries[ctlfd_pos].revents & POLLIN) {
>> + err = evlist__ctlfd_recv(evlist, cmd, cmd_data,
>> + EVLIST_CTL_CMD_MAX_LEN);
>> + if (err > 0) {
>> + switch (*cmd) {
>> + case EVLIST_CTL_CMD_ENABLE:
>> + evlist__enable(evlist);
>> + break;
>> + case EVLIST_CTL_CMD_DISABLE:
>> + evlist__disable(evlist);
>> + break;
>> + case EVLIST_CTL_CMD_ACK:
>> + case EVLIST_CTL_CMD_UNSUPPORTED:
>> + default:
>> + pr_debug("ctlfd: unsupported %d\n", *cmd);
>> + break;
>> + }
>> + if (!(*cmd == EVLIST_CTL_CMD_ACK || *cmd == EVLIST_CTL_CMD_UNSUPPORTED))
>> + evlist__ctlfd_ack(evlist);
>> + }
>> + }
>> +
>> + if (entries[ctlfd_pos].revents & (POLLHUP | POLLERR))
>> + evlist__finalize_ctlfd(evlist);
>
> should this error be propagated via err?

I don't think so. Currently it only returns 0, and there is no need to
stop collection or skip processing in the caller in this case.

~Alexey

>
> jirka
>

2020-06-24 13:31:07

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 07/13] perf stat: factor out event handling loop into dispatch_events()


On 23.06.2020 17:56, Jiri Olsa wrote:
> On Wed, Jun 17, 2020 at 11:40:03AM +0300, Alexey Budankov wrote:
>>
>> Consolidate event dispatching loops for fork, attach and system
>> wide monitoring use cases into common dispatch_events() function.
>>
>> Signed-off-by: Alexey Budankov <[email protected]>
>> ---
>> tools/perf/builtin-stat.c | 35 ++++++++++++++++++++++++-----------
>> 1 file changed, 24 insertions(+), 11 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 3bc538576607..39749c290508 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -557,6 +557,27 @@ static bool is_target_alive(struct target *_target,
>> return false;
>> }
>>
>> +static int dispatch_events(bool forks, int timeout, int interval, int *times, struct timespec *ts)
>> +{
>> + bool stop = false;
>> + int child = 0, status = 0;
>> +
>> + while (1) {
>> + if (forks)
>> + child = waitpid(child_pid, &status, WNOHANG);
>> + else
>> + child = !is_target_alive(&target, evsel_list->core.threads) ? 1 : 0;
>
> please rename child to something more accurate, so the condition
> below makes more sense, like child_stopped or such

Well, let's have it named like child_stopped.

~Alexey

>
> jirka
>
>> +
>> + if (done || stop || child)
>> + break;
>> +
>> + nanosleep(ts, NULL);
>> + stop = process_timeout(timeout, interval, times);
>> + }
>> +
>> + return status;
>> +}
>> +
>
> SNIP
>

2020-06-24 13:38:54

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 03/13] perf evlist: implement control command handling functions



On 23.06.2020 18:14, Jiri Olsa wrote:
> On Tue, Jun 23, 2020 at 12:03:45PM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Tue, Jun 23, 2020 at 04:56:01PM +0200, Jiri Olsa escreveu:
>>> On Wed, Jun 17, 2020 at 11:36:46AM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>> + memset(cmd_data, 0, data_size--);
>>>> +
>>>> + do {
>>>> + err = read(evlist->ctl_fd.fd, &c, 1);
>>>> + if (err > 0) {
>>>> + if (c == '\n' || c == '\0')
>>>> + break;
>>>> + cmd_data[bytes_read++] = c;
>>>> + if (bytes_read == data_size)
>>>> + break;
>>>> + } else {
>>>> + if (err == -1)
>>>> + pr_err("Failed to read from ctlfd %d: %m\n", evlist->ctl_fd.fd);
>>>> + break;
>>>> + }
>>>> + } while (1);
>>>> +
>>>> + pr_debug("Message from ctl_fd: \"%s%s\"\n", cmd_data,
>>>> + bytes_read == data_size ? "" : c == '\n' ? "\\n" : "\\0");
>>>> +
>>>> + if (err > 0) {
>>>> + if (!strncmp(cmd_data, EVLIST_CTL_CMD_ENABLE_TAG,
>>>> + strlen(EVLIST_CTL_CMD_ENABLE_TAG))) {
>>>
>>> you could use sizeof(EVLIST_CTL_CMD_ENABLE_TAG) instead, no function call
>>
>> -1, as sizeof will get the \0, right?
>
> yep, I think that's right

Ok, let's have it like (sizeof(EVLIST_CTL_CMD_ENABLE_TAG)-1).

~Alexey

>
> jirka
>
>>
>>>
>>>> + *cmd = EVLIST_CTL_CMD_ENABLE;
>>>> + } else if (!strncmp(cmd_data, EVLIST_CTL_CMD_DISABLE_TAG,
>>>> + strlen(EVLIST_CTL_CMD_DISABLE_TAG))) {
>>>
>>> ditto
>>>
>>> jirka
>>>
>>
>> --
>>
>> - Arnaldo
>>
>
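
As an aside on the strlen() vs. sizeof()-1 point settled above: for any string
literal TAG, sizeof(TAG) counts the terminating '\0', so sizeof(TAG) - 1 equals
strlen(TAG) but is computed at compile time. A minimal standalone illustration
(the tag and message values below are made up and only stand in for the real
EVLIST_CTL_CMD_ENABLE_TAG handling):

#include <assert.h>
#include <string.h>

#define CMD_TAG "enable"	/* hypothetical stand-in for the real tag macro */

int main(void)
{
	const char *msg = "enable-with-extra-payload";

	/* sizeof() includes the trailing '\0', hence the -1 */
	assert(sizeof(CMD_TAG) - 1 == strlen(CMD_TAG));

	/* prefix match as discussed, with no strlen() call at run time */
	return strncmp(msg, CMD_TAG, sizeof(CMD_TAG) - 1) ? 1 : 0;
}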

2020-06-24 13:43:10

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling



On 23.06.2020 17:54, Jiri Olsa wrote:
> On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:
>>
>> Implement handling of 'enable' and 'disable' control commands
>> coming from control file descriptor. process_evlist() function
>> checks for events on the control fds and performs the required operations.
>> If a poll event splits the initiated timeout interval, the remainder
>> is calculated and waited for in the following poll() syscall.
>>
>> Signed-off-by: Alexey Budankov <[email protected]>
>> ---
>> tools/perf/builtin-stat.c | 67 +++++++++++++++++++++++++++++----------
>> 1 file changed, 50 insertions(+), 17 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index f88d5ee55022..cc56d71a3ed5 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -492,6 +492,31 @@ static bool process_timeout(int timeout, unsigned int interval, int *times)
>> return print_interval(interval, times);
>> }
>>
>> +static bool process_evlist(struct evlist *evlist, unsigned int interval, int *times)
>> +{
>> + bool stop = false;
>> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
>> +
>> + if (evlist__ctlfd_process(evlist, &cmd) > 0) {
>> + switch (cmd) {
>> + case EVLIST_CTL_CMD_ENABLE:
>> + pr_info(EVLIST_ENABLED_MSG);
>> + stop = print_interval(interval, times);
>
> why is interval printed in here?
>
>> + break;
>> + case EVLIST_CTL_CMD_DISABLE:
>> + stop = print_interval(interval, times);
>
> and here?
>
> it should be called from the main loop when the interval time has elapsed, no?

It is called from the main loop too, and it is additionally called here
to provide an indication and counter values at the times commands are processed.

~Alexey

>
> jirka
>

2020-06-24 14:03:36

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 12/13] perf record: implement control commands handling


On 23.06.2020 17:54, Jiri Olsa wrote:
> On Wed, Jun 17, 2020 at 11:43:58AM +0300, Alexey Budankov wrote:
>>
>> Implement handling of 'enable' and 'disable' control commands
>> coming from control file descriptor.
>>
>> Signed-off-by: Alexey Budankov <[email protected]>
>> ---
>> tools/perf/builtin-record.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index d0b29a1070a0..0394e068dde8 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -1527,6 +1527,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>> bool disabled = false, draining = false;
>> int fd;
>> float ratio = 0;
>> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
>>
>> atexit(record__sig_exit);
>> signal(SIGCHLD, sig_handler);
>> @@ -1830,6 +1831,21 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>> alarm(rec->switch_output.time);
>> }
>>
>> + if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
>> + switch (cmd) {
>> + case EVLIST_CTL_CMD_ENABLE:
>> + pr_info(EVLIST_ENABLED_MSG);
>> + break;
>> + case EVLIST_CTL_CMD_DISABLE:
>> + pr_info(EVLIST_DISABLED_MSG);
>> + break;
>> + case EVLIST_CTL_CMD_ACK:
>> + case EVLIST_CTL_CMD_UNSUPPORTED:
>> + default:
>> + break;
>> + }
>> + }
>
> so there's still the filter call like:
>
> if (evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
> draining = true;
>
> it will never be 0 if the control fds are still alive, no?

Due to the change in filter_pollfd() and the preceding evlist__ctlfd_process() call,
the control fd is now not counted by filter_pollfd().

However, event fds with .revents == 0 are not counted either, and this breaks
the algorithm, so something more is still required to cover this gap.

~Alexey

>
> jirka
>

2020-06-24 14:13:22

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling


On 23.06.2020 17:54, Jiri Olsa wrote:
> On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:
>
> SNIP
>
>>
>> while (1) {
>> if (forks)
>> @@ -581,8 +617,17 @@ static int dispatch_events(bool forks, int timeout, int interval, int *times, st
>> if (done || stop || child)
>> break;
>>
>> - nanosleep(ts, NULL);
>> - stop = process_timeout(timeout, interval, times);
>> + clock_gettime(CLOCK_MONOTONIC, &time_start);
>> + if (!(evlist__poll(evsel_list, time_to_sleep) > 0)) { /* poll timeout or EINTR */
>> + stop = process_timeout(timeout, interval, times);
>> + time_to_sleep = sleep_time;
>> + } else { /* fd revent */
>> + stop = process_evlist(evsel_list, interval, times);
>> + clock_gettime(CLOCK_MONOTONIC, &time_stop);
>> + diff_timespec(&time_diff, &time_stop, &time_start);
>> + time_to_sleep -= time_diff.tv_sec * MSEC_PER_SEC +
>> + time_diff.tv_nsec / NSEC_PER_MSEC;
>
> should we check time_to_sleep > time_diff first?

Probably and if time_diff > time_to_sleep then time_to_sleep = 0 ?

~Alexey

>
> jirka
>

2020-06-24 14:28:49

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 04/13] perf stat: factor out body of event handling loop for system wide


On 23.06.2020 17:56, Jiri Olsa wrote:
> On Wed, Jun 17, 2020 at 11:37:43AM +0300, Alexey Budankov wrote:
>>
>> Introduce process_timeout() and process_interval() functions that
>> factor out body of event handling loop for attach and system wide
>> monitoring use cases.
>>
>> Signed-off-by: Alexey Budankov <[email protected]>
>> ---
>> tools/perf/builtin-stat.c | 28 ++++++++++++++++++++--------
>> 1 file changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 9be020e0098a..31f7ccf9537b 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -475,6 +475,23 @@ static void process_interval(void)
>> print_counters(&rs, 0, NULL);
>> }
>>
>> +static bool print_interval(unsigned int interval, int *times)
>> +{
>> + if (interval) {
>> + process_interval();
>> + if (interval_count && !(--(*times)))
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> +static bool process_timeout(int timeout, unsigned int interval, int *times)
>> +{
>> + if (timeout)
>> + return true;
>> + return print_interval(interval, times);
>> +}
>
> I think it's confusing to keep this together, that
> process_timeout triggers also interval processing
>
> I think you can keep the timeout separated from interval
> processing and rename the print_interval to process_interval
> and process_interval to __process_interval

Well, ok.

I will rename process_interval() to __process_interval() and
then print_interval() to process_interval().

Regarding timeout let's have it like this:

static bool process_timeout(int timeout)
{
return timeout ? true : false;
}

static bool process_timing_settings(int timeout, unsigned int interval, int *times)
{
bool res = process_timeout(timeout);
if (!res)
res = process_interval(interval, times);
return res;
}

Ok?

~Alexey

>
> jirka
>
>> +
>> static void enable_counters(void)
>> {
>> if (stat_config.initial_delay)
>> @@ -611,6 +628,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>> struct affinity affinity;
>> int i, cpu;
>> bool second_pass = false;
>> + bool stop = false;
>>
>> if (interval) {
>> ts.tv_sec = interval / USEC_PER_MSEC;
>> @@ -805,17 +823,11 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>> psignal(WTERMSIG(status), argv[0]);
>> } else {
>> enable_counters();
>> - while (!done) {
>> + while (!done && !stop) {
>> nanosleep(&ts, NULL);
>> if (!is_target_alive(&target, evsel_list->core.threads))
>> break;
>> - if (timeout)
>> - break;
>> - if (interval) {
>> - process_interval();
>> - if (interval_count && !(--times))
>> - break;
>> - }
>> + stop = process_timeout(timeout, interval, &times);
>> }
>> }
>>
>> --
>> 2.24.1
>>
>>
>

2020-06-24 14:57:05

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 12/13] perf record: implement control commands handling


On 24.06.2020 17:00, Alexey Budankov wrote:
>
> On 23.06.2020 17:54, Jiri Olsa wrote:
>> On Wed, Jun 17, 2020 at 11:43:58AM +0300, Alexey Budankov wrote:
>>>
>>> Implement handling of 'enable' and 'disable' control commands
>>> coming from control file descriptor.
>>>
>>> Signed-off-by: Alexey Budankov <[email protected]>
>>> ---
>>> tools/perf/builtin-record.c | 16 ++++++++++++++++
>>> 1 file changed, 16 insertions(+)
>>>
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index d0b29a1070a0..0394e068dde8 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -1527,6 +1527,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>> bool disabled = false, draining = false;
>>> int fd;
>>> float ratio = 0;
>>> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
>>>
>>> atexit(record__sig_exit);
>>> signal(SIGCHLD, sig_handler);
>>> @@ -1830,6 +1831,21 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>> alarm(rec->switch_output.time);
>>> }
>>>
>>> + if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
>>> + switch (cmd) {
>>> + case EVLIST_CTL_CMD_ENABLE:
>>> + pr_info(EVLIST_ENABLED_MSG);
>>> + break;
>>> + case EVLIST_CTL_CMD_DISABLE:
>>> + pr_info(EVLIST_DISABLED_MSG);
>>> + break;
>>> + case EVLIST_CTL_CMD_ACK:
>>> + case EVLIST_CTL_CMD_UNSUPPORTED:
>>> + default:
>>> + break;
>>> + }
>>> + }
>>
>> so there's still the filter call like:
>>
>> if (evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
>> draining = true;
>>
>> it will never be 0 if the control fds are still alive, no?
>
> Due to the change in filter_pollfd() and the preceding evlist__ctlfd_process() call,
> the control fd is now not counted by filter_pollfd().
And evlist__ctlfd_process() should still be called a second time right
after evlist__poll() but prior to evlist__filter_pollfd().

~Alexey

>
> However, event fds with .revents == 0 are not counted either, and this breaks
> the algorithm, so something more is still required to cover this gap.
>
> ~Alexey
>
>>
>> jirka
>>
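
For readability, here is a rough sketch of the call ordering Alexey describes for
the record loop, pieced together from the quoted patches. It is illustrative only:
the surrounding __cmd_record() loop is simplified, and cmd/draining are the
variables declared in the quoted hunks.

	/* per iteration of the record loop (sketch) */
	if (evlist__poll(rec->evlist, -1) > 0) {
		/*
		 * 1) consume control-fd events first, so the control fd
		 *    does not linger with non-zero revents
		 */
		evlist__ctlfd_process(rec->evlist, &cmd);

		/* 2) only then decide whether the event fds are drained */
		if (evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
			draining = true;
	}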

2020-06-24 17:22:20

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call


On 17.06.2020 11:35, Alexey Budankov wrote:
>
> Skip fds with zeroed revents field from count and avoid fds moving
> at fdarray__filter() call so fds indices returned by fdarray__add()
> call stay the same and can be used for direct access and processing
> of fd revents status field at entries array of struct fdarray object.
>
> Signed-off-by: Alexey Budankov <[email protected]>
> ---
> tools/lib/api/fd/array.c | 11 +++++------
> tools/perf/tests/fdarray.c | 20 ++------------------
> 2 files changed, 7 insertions(+), 24 deletions(-)
>
> diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
> index 58d44d5eee31..97843a837370 100644
> --- a/tools/lib/api/fd/array.c
> +++ b/tools/lib/api/fd/array.c
> @@ -93,22 +93,21 @@ int fdarray__filter(struct fdarray *fda, short revents,
> return 0;
>
> for (fd = 0; fd < fda->nr; ++fd) {
> + if (!fda->entries[fd].revents)
> + continue;
> +

So it looks like this condition also filters out non-signaling event fds, not only
the control and other fds, and this should somehow be avoided so that such event-related
fds would still be counted. Several options have been proposed so far:

1) Explicit typing of fds via API extension and filtering based on the types:
a) with separate fdarray__add_stat() call
b) with type arg of existing fdarray__add() call
c) various memory management design is possible

2) Playing tricks with fd positions inside entries and assumptions on fdarray API calls ordering
- looks more like a hack than a designed solution

3) Rewrite of fdarray class to allocate a separate object for every added fd
- can be replaced by simply not moving fds around in __filter()

4) Distinguishing between fd types at fdarray__filter() using the .revents == 0 condition
- seems to have corner cases and thus not applicable

5) Extension of fdarray__poll(, *arg_ptr, arg_size) with an arg carrying an array of external fds, to atomically poll
on the fdarray__add()-ed fds plus the external arg fds and then process the external arg fds

6) Rewrite of fdarray class on epoll() call basis
- introduces new scalability restrictions for Perf tool

7) Anything else ...

~Alexey

> if (fda->entries[fd].revents & revents) {
> if (entry_destructor)
> entry_destructor(fda, fd, arg);
>
> + fda->entries[fd].revents = fda->entries[fd].events = 0;
> continue;
> }
>
> - if (fd != nr) {
> - fda->entries[nr] = fda->entries[fd];
> - fda->priv[nr] = fda->priv[fd];
> - }
> -
> ++nr;
> }
>
> - return fda->nr = nr;
> + return nr;
> }
>
> int fdarray__poll(struct fdarray *fda, int timeout)
> diff --git a/tools/perf/tests/fdarray.c b/tools/perf/tests/fdarray.c
> index c7c81c4a5b2b..d0c8a05aab2f 100644
> --- a/tools/perf/tests/fdarray.c
> +++ b/tools/perf/tests/fdarray.c
> @@ -12,6 +12,7 @@ static void fdarray__init_revents(struct fdarray *fda, short revents)
>
> for (fd = 0; fd < fda->nr; ++fd) {
> fda->entries[fd].fd = fda->nr - fd;
> + fda->entries[fd].events = revents;
> fda->entries[fd].revents = revents;
> }
> }
> @@ -29,7 +30,7 @@ static int fdarray__fprintf_prefix(struct fdarray *fda, const char *prefix, FILE
>
> int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_unused)
> {
> - int nr_fds, expected_fd[2], fd, err = TEST_FAIL;
> + int nr_fds, err = TEST_FAIL;
> struct fdarray *fda = fdarray__new(5, 5);
>
> if (fda == NULL) {
> @@ -55,7 +56,6 @@ int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_
>
> fdarray__init_revents(fda, POLLHUP);
> fda->entries[2].revents = POLLIN;
> - expected_fd[0] = fda->entries[2].fd;
>
> pr_debug("\nfiltering all but fda->entries[2]:");
> fdarray__fprintf_prefix(fda, "before", stderr);
> @@ -66,17 +66,9 @@ int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_
> goto out_delete;
> }
>
> - if (fda->entries[0].fd != expected_fd[0]) {
> - pr_debug("\nfda->entries[0].fd=%d != %d\n",
> - fda->entries[0].fd, expected_fd[0]);
> - goto out_delete;
> - }
> -
> fdarray__init_revents(fda, POLLHUP);
> fda->entries[0].revents = POLLIN;
> - expected_fd[0] = fda->entries[0].fd;
> fda->entries[3].revents = POLLIN;
> - expected_fd[1] = fda->entries[3].fd;
>
> pr_debug("\nfiltering all but (fda->entries[0], fda->entries[3]):");
> fdarray__fprintf_prefix(fda, "before", stderr);
> @@ -88,14 +80,6 @@ int test__fdarray__filter(struct test *test __maybe_unused, int subtest __maybe_
> goto out_delete;
> }
>
> - for (fd = 0; fd < 2; ++fd) {
> - if (fda->entries[fd].fd != expected_fd[fd]) {
> - pr_debug("\nfda->entries[%d].fd=%d != %d\n", fd,
> - fda->entries[fd].fd, expected_fd[fd]);
> - goto out_delete;
> - }
> - }
> -
> pr_debug("\n");
>
> err = 0;
>
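
To make option 1b from the list above a bit more concrete, here is a tiny
standalone sketch of the idea with made-up names (fdkind, fdentry, filter_by_kind)
rather than the real libperf API: each added fd carries a type tag, and the filter
only considers event fds, so control fds neither move nor affect the count, while
non-signaling event fds are still counted.

#include <poll.h>

enum fdkind { FD_EVENT, FD_CONTROL };	/* hypothetical type tag */

struct fdentry {
	int fd;
	short revents;
	enum fdkind kind;
};

/* count event fds that have not signaled @revents; ignore control fds entirely */
static int filter_by_kind(struct fdentry *entries, int nr, short revents)
{
	int i, alive = 0;

	for (i = 0; i < nr; i++) {
		if (entries[i].kind != FD_EVENT)
			continue;		/* control fds never drain the loop */
		if (entries[i].revents & revents)
			continue;		/* e.g. POLLERR | POLLHUP */
		alive++;			/* counted even when revents == 0 */
	}
	return alive;
}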

2020-06-25 12:13:58

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling

On Wed, Jun 24, 2020 at 04:39:11PM +0300, Alexey Budankov wrote:
>
>
> On 23.06.2020 17:54, Jiri Olsa wrote:
> > On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:
> >>
> >> Implement handling of 'enable' and 'disable' control commands
> >> coming from control file descriptor. process_evlist() function
> >> checks for events on the control fds and performs the required operations.
> >> If a poll event splits the initiated timeout interval, the remainder
> >> is calculated and waited for in the following poll() syscall.
> >>
> >> Signed-off-by: Alexey Budankov <[email protected]>
> >> ---
> >> tools/perf/builtin-stat.c | 67 +++++++++++++++++++++++++++++----------
> >> 1 file changed, 50 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> >> index f88d5ee55022..cc56d71a3ed5 100644
> >> --- a/tools/perf/builtin-stat.c
> >> +++ b/tools/perf/builtin-stat.c
> >> @@ -492,6 +492,31 @@ static bool process_timeout(int timeout, unsigned int interval, int *times)
> >> return print_interval(interval, times);
> >> }
> >>
> >> +static bool process_evlist(struct evlist *evlist, unsigned int interval, int *times)
> >> +{
> >> + bool stop = false;
> >> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
> >> +
> >> + if (evlist__ctlfd_process(evlist, &cmd) > 0) {
> >> + switch (cmd) {
> >> + case EVLIST_CTL_CMD_ENABLE:
> >> + pr_info(EVLIST_ENABLED_MSG);
> >> + stop = print_interval(interval, times);
> >
> > why is interval printed in here?
> >
> >> + break;
> >> + case EVLIST_CTL_CMD_DISABLE:
> >> + stop = print_interval(interval, times);
> >
> > and here?
> >
> > it should be called from the main loop when the interval time has elapsed, no?
>
> > It is called from the main loop too, and it is additionally called here
> > to provide an indication and counter values at the times commands are processed.

so it prints interval out of order?

jirka

2020-06-25 12:17:12

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling

On Wed, Jun 24, 2020 at 05:10:10PM +0300, Alexey Budankov wrote:
>
> On 23.06.2020 17:54, Jiri Olsa wrote:
> > On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:
> >
> > SNIP
> >
> >>
> >> while (1) {
> >> if (forks)
> >> @@ -581,8 +617,17 @@ static int dispatch_events(bool forks, int timeout, int interval, int *times, st
> >> if (done || stop || child)
> >> break;
> >>
> >> - nanosleep(ts, NULL);
> >> - stop = process_timeout(timeout, interval, times);
> >> + clock_gettime(CLOCK_MONOTONIC, &time_start);
> >> + if (!(evlist__poll(evsel_list, time_to_sleep) > 0)) { /* poll timeout or EINTR */
> >> + stop = process_timeout(timeout, interval, times);
> >> + time_to_sleep = sleep_time;
> >> + } else { /* fd revent */
> >> + stop = process_evlist(evsel_list, interval, times);
> >> + clock_gettime(CLOCK_MONOTONIC, &time_stop);
> >> + diff_timespec(&time_diff, &time_stop, &time_start);
> >> + time_to_sleep -= time_diff.tv_sec * MSEC_PER_SEC +
> >> + time_diff.tv_nsec / NSEC_PER_MSEC;
> >
> > should we check time_to_sleep > time_diff first?
>
> Probably and if time_diff > time_to_sleep then time_to_sleep = 0 ?

or an extra call to process_timeout? if we don't want to call evlist__poll
with a 0 timeout

jirka

>
> ~Alexey
>
> >
> > jirka
> >
>

2020-06-25 12:18:28

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 04/13] perf stat: factor out body of event handling loop for system wide

On Wed, Jun 24, 2020 at 05:27:41PM +0300, Alexey Budankov wrote:
>
> On 23.06.2020 17:56, Jiri Olsa wrote:
> > On Wed, Jun 17, 2020 at 11:37:43AM +0300, Alexey Budankov wrote:
> >>
> >> Introduce process_timeout() and process_interval() functions that
> >> factor out body of event handling loop for attach and system wide
> >> monitoring use cases.
> >>
> >> Signed-off-by: Alexey Budankov <[email protected]>
> >> ---
> >> tools/perf/builtin-stat.c | 28 ++++++++++++++++++++--------
> >> 1 file changed, 20 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> >> index 9be020e0098a..31f7ccf9537b 100644
> >> --- a/tools/perf/builtin-stat.c
> >> +++ b/tools/perf/builtin-stat.c
> >> @@ -475,6 +475,23 @@ static void process_interval(void)
> >> print_counters(&rs, 0, NULL);
> >> }
> >>
> >> +static bool print_interval(unsigned int interval, int *times)
> >> +{
> >> + if (interval) {
> >> + process_interval();
> >> + if (interval_count && !(--(*times)))
> >> + return true;
> >> + }
> >> + return false;
> >> +}
> >> +
> >> +static bool process_timeout(int timeout, unsigned int interval, int *times)
> >> +{
> >> + if (timeout)
> >> + return true;
> >> + return print_interval(interval, times);
> >> +}
> >
> > I think it's confusing to keep this together, that
> > process_timeout triggers also interval processing
> >
> > I think you can keep the timeout separated from interval
> > processing and rename the print_interval to process_interval
> > and process_interval to __process_interval
>
> Well, ok.
>
> I will rename process_interval() to __process_interval() and
> then print_interval() to process_interval().
>
> Regarding timeout let's have it like this:
>
> static bool process_timeout(int timeout)
> {
> return timeout ? true : false;
> }

can't this just stay as value check after finished poll?

if (timeout)
break;

and then separate call to process_interval(interval, times)?

jirka

>
> static bool process_timing_settings(int timeout, unsigned int interval, int *times)
> {
> bool res = process_timeout(timeout);
> if (!res)
> res = process_interval(interval, times);
> return res;
> }
>
> Ok?
>
> ~Alexey
>
> >
> > jirka
> >
> >> +
> >> static void enable_counters(void)
> >> {
> >> if (stat_config.initial_delay)
> >> @@ -611,6 +628,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >> struct affinity affinity;
> >> int i, cpu;
> >> bool second_pass = false;
> >> + bool stop = false;
> >>
> >> if (interval) {
> >> ts.tv_sec = interval / USEC_PER_MSEC;
> >> @@ -805,17 +823,11 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >> psignal(WTERMSIG(status), argv[0]);
> >> } else {
> >> enable_counters();
> >> - while (!done) {
> >> + while (!done && !stop) {
> >> nanosleep(&ts, NULL);
> >> if (!is_target_alive(&target, evsel_list->core.threads))
> >> break;
> >> - if (timeout)
> >> - break;
> >> - if (interval) {
> >> - process_interval();
> >> - if (interval_count && !(--times))
> >> - break;
> >> - }
> >> + stop = process_timeout(timeout, interval, &times);
> >> }
> >> }
> >>
> >> --
> >> 2.24.1
> >>
> >>
> >
>

2020-06-25 14:52:57

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling

On 25.06.2020 15:12, Jiri Olsa wrote:
> On Wed, Jun 24, 2020 at 04:39:11PM +0300, Alexey Budankov wrote:
>>
>>
>> On 23.06.2020 17:54, Jiri Olsa wrote:
>>> On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:
>>>>
>>>> Implement handling of 'enable' and 'disable' control commands
>>>> coming from control file descriptor. process_evlist() function
>>>> checks for events on the control fds and performs the required operations.
>>>> If a poll event splits the initiated timeout interval, the remainder
>>>> is calculated and waited for in the following poll() syscall.
>>>>
>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>> ---
>>>> tools/perf/builtin-stat.c | 67 +++++++++++++++++++++++++++++----------
>>>> 1 file changed, 50 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>>> index f88d5ee55022..cc56d71a3ed5 100644
>>>> --- a/tools/perf/builtin-stat.c
>>>> +++ b/tools/perf/builtin-stat.c
>>>> @@ -492,6 +492,31 @@ static bool process_timeout(int timeout, unsigned int interval, int *times)
>>>> return print_interval(interval, times);
>>>> }
>>>>
>>>> +static bool process_evlist(struct evlist *evlist, unsigned int interval, int *times)
>>>> +{
>>>> + bool stop = false;
>>>> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
>>>> +
>>>> + if (evlist__ctlfd_process(evlist, &cmd) > 0) {
>>>> + switch (cmd) {
>>>> + case EVLIST_CTL_CMD_ENABLE:
>>>> + pr_info(EVLIST_ENABLED_MSG);
>>>> + stop = print_interval(interval, times);
>>>
>>> why is interval printed in here?
>>>
>>>> + break;
>>>> + case EVLIST_CTL_CMD_DISABLE:
>>>> + stop = print_interval(interval, times);
>>>
>>> and here?
>>>
>>> it should be called from the main loop when the interval time has elapsed, no?
>>
>> It is called from the main loop too, and it is additionally called here
>> to provide an indication and counter values at the times commands are processed.
>
> so it prints interval out of order?

Looks like it does. The only issue I see with it is the change in the times value.

~Alexey

>
> jirka
>

2020-06-25 15:00:16

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] perf stat: implement control commands handling


On 25.06.2020 15:14, Jiri Olsa wrote:
> On Wed, Jun 24, 2020 at 05:10:10PM +0300, Alexey Budankov wrote:
>>
>> On 23.06.2020 17:54, Jiri Olsa wrote:
>>> On Wed, Jun 17, 2020 at 11:41:30AM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>>
>>>> while (1) {
>>>> if (forks)
>>>> @@ -581,8 +617,17 @@ static int dispatch_events(bool forks, int timeout, int interval, int *times, st
>>>> if (done || stop || child)
>>>> break;
>>>>
>>>> - nanosleep(ts, NULL);
>>>> - stop = process_timeout(timeout, interval, times);
>>>> + clock_gettime(CLOCK_MONOTONIC, &time_start);
>>>> + if (!(evlist__poll(evsel_list, time_to_sleep) > 0)) { /* poll timeout or EINTR */
>>>> + stop = process_timeout(timeout, interval, times);
>>>> + time_to_sleep = sleep_time;
>>>> + } else { /* fd revent */
>>>> + stop = process_evlist(evsel_list, interval, times);
>>>> + clock_gettime(CLOCK_MONOTONIC, &time_stop);
>>>> + diff_timespec(&time_diff, &time_stop, &time_start);
>>>> + time_to_sleep -= time_diff.tv_sec * MSEC_PER_SEC +
>>>> + time_diff.tv_nsec / NSEC_PER_MSEC;
>>>
>>> should we check time_to_sleep > time_diff first?
>>
>> Probably and if time_diff > time_to_sleep then time_to_sleep = 0 ?
>
> or an extra call to process_timeout? if we don't want to call evlist__poll
> with a 0 timeout

The poll() man page says it is ok to call poll() with a 0 timeout, so
process_timeout() and the initialization of time_to_sleep will be
done in the common flow.

~Alexey
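
One way the remaining-timeout computation could avoid handing a negative value to
evlist__poll(), following the variable names from the quoted hunk; the clamp is
the only new part here, and this is a sketch rather than the final patch:

	clock_gettime(CLOCK_MONOTONIC, &time_stop);
	diff_timespec(&time_diff, &time_stop, &time_start);
	time_to_sleep -= time_diff.tv_sec * MSEC_PER_SEC +
			 time_diff.tv_nsec / NSEC_PER_MSEC;
	if (time_to_sleep < 0)
		time_to_sleep = 0;	/* next evlist__poll() then returns immediately */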

2020-06-25 18:14:18

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 04/13] perf stat: factor out body of event handling loop for system wide


On 25.06.2020 15:17, Jiri Olsa wrote:
> On Wed, Jun 24, 2020 at 05:27:41PM +0300, Alexey Budankov wrote:
>>
>> On 23.06.2020 17:56, Jiri Olsa wrote:
>>> On Wed, Jun 17, 2020 at 11:37:43AM +0300, Alexey Budankov wrote:
>>>>
>>>> Introduce process_timeout() and process_interval() functions that
>>>> factor out body of event handling loop for attach and system wide
>>>> monitoring use cases.
>>>>
>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>> ---
>>>> tools/perf/builtin-stat.c | 28 ++++++++++++++++++++--------
>>>> 1 file changed, 20 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>>> index 9be020e0098a..31f7ccf9537b 100644
>>>> --- a/tools/perf/builtin-stat.c
>>>> +++ b/tools/perf/builtin-stat.c
>>>> @@ -475,6 +475,23 @@ static void process_interval(void)
>>>> print_counters(&rs, 0, NULL);
>>>> }
>>>>
>>>> +static bool print_interval(unsigned int interval, int *times)
>>>> +{
>>>> + if (interval) {
>>>> + process_interval();
>>>> + if (interval_count && !(--(*times)))
>>>> + return true;
>>>> + }
>>>> + return false;
>>>> +}
>>>> +
>>>> +static bool process_timeout(int timeout, unsigned int interval, int *times)
>>>> +{
>>>> + if (timeout)
>>>> + return true;
>>>> + return print_interval(interval, times);
>>>> +}
>>>
>>> I think it's confusing to keep this together, that
>>> process_timeout triggers also interval processing
>>>
>>> I think you can keep the timeout separated from interval
>>> processing and rename the print_interval to process_interval
>>> and process_interval to __process_interval
>>
>> Well, ok.
>>
>> I will rename process_interval() to __process_interval() and
>> then print_interval() to process_interval().
>>
>> Regarding timeout let's have it like this:
>>
>> static bool process_timeout(int timeout)
>> {
>> return timeout ? true : false;
>> }
>
> can't this just stay as value check after finished poll?
>
> if (timeout)
> break;
>
> and then separate call to process_interval(interval, times)?

Like this? Still makes sense to have it in a single function.

static bool process_timing_settings(int timeout, unsigned int interval, int *times)
{
bool res = timeout ? true : false;
if (!res)
res = process_interval(interval, times);
return res;
}

~Alexey
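
For context, if Alexey's combined helper were used, the poll branch of
dispatch_events() quoted earlier in the thread would read roughly as below;
this is a sketch only, since the thread has not settled on this shape:

	if (!(evlist__poll(evsel_list, time_to_sleep) > 0)) {	/* poll timeout or EINTR */
		stop = process_timing_settings(timeout, interval, times);
		time_to_sleep = sleep_time;
	} else {						/* fd revent */
		stop = process_evlist(evsel_list, interval, times);
		/* remaining-timeout bookkeeping as in the quoted hunk */
	}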

2020-06-25 18:25:01

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 12/13] perf record: implement control commands handling

On Wed, Jun 24, 2020 at 05:55:02PM +0300, Alexey Budankov wrote:
>
> On 24.06.2020 17:00, Alexey Budankov wrote:
> >
> > On 23.06.2020 17:54, Jiri Olsa wrote:
> >> On Wed, Jun 17, 2020 at 11:43:58AM +0300, Alexey Budankov wrote:
> >>>
> >>> Implement handling of 'enable' and 'disable' control commands
> >>> coming from control file descriptor.
> >>>
> >>> Signed-off-by: Alexey Budankov <[email protected]>
> >>> ---
> >>> tools/perf/builtin-record.c | 16 ++++++++++++++++
> >>> 1 file changed, 16 insertions(+)
> >>>
> >>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> >>> index d0b29a1070a0..0394e068dde8 100644
> >>> --- a/tools/perf/builtin-record.c
> >>> +++ b/tools/perf/builtin-record.c
> >>> @@ -1527,6 +1527,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> >>> bool disabled = false, draining = false;
> >>> int fd;
> >>> float ratio = 0;
> >>> + enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;
> >>>
> >>> atexit(record__sig_exit);
> >>> signal(SIGCHLD, sig_handler);
> >>> @@ -1830,6 +1831,21 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> >>> alarm(rec->switch_output.time);
> >>> }
> >>>
> >>> + if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
> >>> + switch (cmd) {
> >>> + case EVLIST_CTL_CMD_ENABLE:
> >>> + pr_info(EVLIST_ENABLED_MSG);
> >>> + break;
> >>> + case EVLIST_CTL_CMD_DISABLE:
> >>> + pr_info(EVLIST_DISABLED_MSG);
> >>> + break;
> >>> + case EVLIST_CTL_CMD_ACK:
> >>> + case EVLIST_CTL_CMD_UNSUPPORTED:
> >>> + default:
> >>> + break;
> >>> + }
> >>> + }
> >>
> >> so there's still the filter call like:
> >>
> >> if (evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
> >> draining = true;
> >>
> >> it will never be 0 if the control fds are still alive, no?
> >
> > Due to the change in filter_pollfd() and the preceding evlist__ctlfd_process() call,
> > the control fd is now not counted by filter_pollfd().
> And evlist__ctlfd_process() should still be called a second time right
> after evlist__poll() but prior to filter_pollfd().

aaah it's set to zero in here:

if (entries[ctlfd_pos].revents & (POLLHUP | POLLERR))
evlist__finalize_ctlfd(evlist);
else
entries[ctlfd_pos].revents = 0;

yea, that's bad.. another reason to call it a hack

jirka

>
> ~Alexey
>
> >
> > However, event fds with .revents == 0 are not counted either, and this breaks
> > the algorithm, so something more is still required to cover this gap.
> >
> > ~Alexey
> >
> >>
> >> jirka
> >>
>
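
To make the ordering concrete, below is a small standalone poll() model (not perf code) of what is being discussed: the control fd is handled and its revents cleared before the filter pass. It also makes the gap Alexey mentions visible, since a quiet event fd (revents == 0) is not counted either, so this run reports draining even though the event pipe is still healthy. filter_pollfd() here is only a simplified stand-in for evlist__filter_pollfd()/fdarray__filter().

/*
 * Standalone model, not perf code: a control pipe fd and an "event" fd share
 * one pollfd array. The control fd is handled and its revents cleared before
 * the filter pass. Because the filter below also skips quiet fds
 * (revents == 0), this run reports draining even though the event pipe is
 * still healthy, which is exactly the gap discussed above.
 */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* simplified stand-in for evlist__filter_pollfd()/fdarray__filter() */
static int filter_pollfd(struct pollfd *fds, int nr, short revents_mask)
{
        int i, alive = 0;

        for (i = 0; i < nr; i++) {
                if (!fds[i].revents)                    /* quiet fd: skipped, not counted */
                        continue;
                if (fds[i].revents & revents_mask) {    /* hung up or error: dropped */
                        fds[i].fd = -1;
                        continue;
                }
                alive++;
        }
        return alive;
}

int main(void)
{
        int ctl[2], ev[2];
        struct pollfd fds[2];
        char cmd[16] = {0};

        pipe(ctl);                      /* stands in for --control fd:ctl-fd */
        pipe(ev);                       /* stands in for a perf event/mmap fd */
        write(ctl[1], "enable", 6);

        fds[0].fd = ev[0];  fds[0].events = POLLIN; fds[0].revents = 0;
        fds[1].fd = ctl[0]; fds[1].events = POLLIN; fds[1].revents = 0;

        poll(fds, 2, 100);

        /* 1. handle the control fd first and clear its revents ... */
        if (fds[1].revents & POLLIN) {
                read(ctl[0], cmd, sizeof(cmd) - 1);
                printf("control command: %s\n", cmd);
                fds[1].revents = 0;
        }

        /* 2. ... then filter; both fds now have revents == 0, so nothing is counted */
        if (filter_pollfd(fds, 2, POLLERR | POLLHUP) == 0)
                printf("draining: no live event fds counted\n");
        else
                printf("keep polling event fds\n");

        return 0;
}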

2020-06-25 18:25:48

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call

On Wed, Jun 24, 2020 at 08:19:32PM +0300, Alexey Budankov wrote:
>
> On 17.06.2020 11:35, Alexey Budankov wrote:
> >
> > Skip fds with zeroed revents field from count and avoid fds moving
> > at fdarray__filter() call so fds indices returned by fdarray__add()
> > call stay the same and can be used for direct access and processing
> > of fd revents status field at entries array of struct fdarray object.
> >
> > Signed-off-by: Alexey Budankov <[email protected]>
> > ---
> > tools/lib/api/fd/array.c | 11 +++++------
> > tools/perf/tests/fdarray.c | 20 ++------------------
> > 2 files changed, 7 insertions(+), 24 deletions(-)
> >
> > diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
> > index 58d44d5eee31..97843a837370 100644
> > --- a/tools/lib/api/fd/array.c
> > +++ b/tools/lib/api/fd/array.c
> > @@ -93,22 +93,21 @@ int fdarray__filter(struct fdarray *fda, short revents,
> > return 0;
> >
> > for (fd = 0; fd < fda->nr; ++fd) {
> > + if (!fda->entries[fd].revents)
> > + continue;
> > +
>
> So it looks like this condition also filters out non-signaling event fds, not only
> the control and other fds, and this should somehow be avoided so that such event-related
> fds are still counted. Several options have been proposed so far:
>
> 1) Explicit typing of fds via API extension and filtering based on the types:
> a) with a separate fdarray__add_stat() call
> b) with a type arg for the existing fdarray__add() call
> c) various memory management designs are possible
>
> 2) Playing tricks with fd positions inside entries and assumptions on fdarray API call ordering
> - looks more like a hack than a designed solution
>
> 3) Rewrite of the fdarray class to allocate a separate object for every added fd
> - can be replaced with not moving fds in __filter()
>
> 4) Distinguishing between fd types at fdarray__filter() using the .revents == 0 condition
> - seems to have corner cases and thus is not applicable
>
> 5) Extension of fdarray__poll(..., *arg_ptr, arg_size) with an arg of external fds to atomically
> poll on fdarray__add()-ed fds and the external arg fds, then process the external arg fds
>
> 6) Rewrite of the fdarray class on an epoll() call basis
> - introduces new scalability restrictions for the Perf tool

hum, how many fds for polling do you expect in your workloads?

jirka
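
For reference, a standalone reconstruction sketch of how the filter loop reads with the change from the hunk above: fds with revents == 0 are skipped in place and not counted, and entries are never moved, so positions returned by fdarray__add() stay valid. struct fdarray is trimmed to the fields used here and the destructor callback shape is assumed, so treat this as an illustration rather than the library source.

/*
 * Standalone reconstruction sketch, not the library source: fds whose revents
 * is zero are skipped in place and not counted; matching fds are reset via an
 * assumed destructor callback; entries are never moved, so positions returned
 * by fdarray__add() stay valid. struct fdarray is trimmed to what is used here.
 */
#include <poll.h>
#include <stddef.h>
#include <stdio.h>

struct fdarray {
        int nr;
        struct pollfd *entries;
};

static int fdarray__filter(struct fdarray *fda, short revents,
                           void (*entry_destructor)(struct fdarray *fda, int fd, void *arg),
                           void *arg)
{
        int fd, nr = 0;

        if (fda->nr == 0)
                return 0;

        for (fd = 0; fd < fda->nr; ++fd) {
                if (!fda->entries[fd].revents)          /* quiet fd: keep in place, do not count */
                        continue;

                if (fda->entries[fd].revents & revents) {
                        if (entry_destructor)
                                entry_destructor(fda, fd, arg);
                        fda->entries[fd].revents = fda->entries[fd].events = 0;
                        continue;                       /* matched fd: reset, do not count */
                }

                ++nr;                                   /* signaled fd that is still of interest */
        }

        return nr;                                      /* entries keep their indices */
}

int main(void)
{
        struct pollfd entries[3] = {
                { .fd = 3, .revents = 0       },        /* quiet event fd */
                { .fd = 4, .revents = POLLIN  },        /* signaled event fd */
                { .fd = 5, .revents = POLLHUP },        /* hung-up fd, gets reset */
        };
        struct fdarray fda = { .nr = 3, .entries = entries };

        printf("counted: %d\n", fdarray__filter(&fda, POLLERR | POLLHUP, NULL, NULL));
        return 0;                                       /* prints "counted: 1" */
}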

2020-06-25 18:27:26

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 04/13] perf stat: factor out body of event handling loop for system wide

On Thu, Jun 25, 2020 at 07:01:08PM +0300, Alexey Budankov wrote:

SNIP

> >>
> >> Well, ok.
> >>
> >> I will rename process_interval() to __process_interval() and
> >> then print_interval() to process_interval().
> >>
> >> Regarding timeout let's have it like this:
> >>
> >> static bool process_timeout(int timeout)
> >> {
> >> return timeout ? true : false;
> >> }
> >
> > can't this just stay as value check after finished poll?
> >
> > if (timeout)
> > break;
> >
> > and then separate call to process_interval(interval, times)?
>
> Like this? Still makes sense to have it in a single function.
>
> static bool process_timing_settings(int timeout, unsigned int interval, int *times)
> {
> bool res = timeout ? true : false;
> if (!res)
> res = process_interval(interval, times);
> return res;
> }

I don't see the connection between timeout and interval.
IMO this just complicates things; is there a problem with
keeping it separated as it is now?

jirka

>
> ~Alexey
>

2020-06-25 19:12:35

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 04/13] perf stat: factor out body of event handling loop for system wide


On 25.06.2020 20:13, Jiri Olsa wrote:
> On Thu, Jun 25, 2020 at 07:01:08PM +0300, Alexey Budankov wrote:
>
> SNIP
>
>>>>
>>>> Well, ok.
>>>>
>>>> I will rename process_interval() to __process_interval() and
>>>> then print_interval() to process_interval().
>>>>
>>>> Regarding timeout let's have it like this:
>>>>
>>>> static bool process_timeout(int timeout)
>>>> {
>>>> return timeout ? true : false;
>>>> }
>>>
>>> can't this just stay as value check after finished poll?
>>>
>>> if (timeout)
>>> break;
>>>
>>> and then separate call to process_interval(interval, times)?
>>
>> Like this? Still makes sense to have it in a single function.
>>
>> static bool process_timing_settings(int timeout, unsigned int interval, int *times)
>> {
>> bool res = timeout ? true : false;
>> if (!res)
>> res = process_interval(interval, times);
>> return res;
>> }
>
> I don't see the connection between timeout and interval.
> IMO this just complicates things; is there a problem with
> keeping it separated as it is now?

Not a problem. Can duplicate it in dispatch_events().

~Alexey

2020-06-25 19:34:15

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call


On 25.06.2020 20:14, Jiri Olsa wrote:
> On Wed, Jun 24, 2020 at 08:19:32PM +0300, Alexey Budankov wrote:
>>
>> On 17.06.2020 11:35, Alexey Budankov wrote:
>>>
>>> Skip fds with zeroed revents field from count and avoid fds moving
>>> at fdarray__filter() call so fds indices returned by fdarray__add()
>>> call stay the same and can be used for direct access and processing
>>> of fd revents status field at entries array of struct fdarray object.
>>>
>>> Signed-off-by: Alexey Budankov <[email protected]>
>>> ---
>>> tools/lib/api/fd/array.c | 11 +++++------
>>> tools/perf/tests/fdarray.c | 20 ++------------------
>>> 2 files changed, 7 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
>>> index 58d44d5eee31..97843a837370 100644
>>> --- a/tools/lib/api/fd/array.c
>>> +++ b/tools/lib/api/fd/array.c
>>> @@ -93,22 +93,21 @@ int fdarray__filter(struct fdarray *fda, short revents,
>>> return 0;
>>>
>>> for (fd = 0; fd < fda->nr; ++fd) {
>>> + if (!fda->entries[fd].revents)
>>> + continue;
>>> +
>>
>> So it looks like this condition also filters out non-signaling event fds, not only
>> the control and other fds, and this should somehow be avoided so that such event-related
>> fds are still counted. Several options have been proposed so far:
>>
>> 1) Explicit typing of fds via API extension and filtering based on the types:
>> a) with a separate fdarray__add_stat() call
>> b) with a type arg for the existing fdarray__add() call
>> c) various memory management designs are possible
>>
>> 2) Playing tricks with fd positions inside entries and assumptions on fdarray API call ordering
>> - looks more like a hack than a designed solution
>>
>> 3) Rewrite of the fdarray class to allocate a separate object for every added fd
>> - can be replaced with not moving fds in __filter()
>>
>> 4) Distinguishing between fd types at fdarray__filter() using the .revents == 0 condition
>> - seems to have corner cases and thus is not applicable
>>
>> 5) Extension of fdarray__poll(..., *arg_ptr, arg_size) with an arg of external fds to atomically
>> poll on fdarray__add()-ed fds and the external arg fds, then process the external arg fds
>>
>> 6) Rewrite of the fdarray class on an epoll() call basis
>> - introduces new scalability restrictions for the Perf tool
>
> hum, how many fds for polling do you expect in your workloads?

Currently it is several hundred, so the default of 1K is easily hit, and the
"Profile a Large Number of PMU Events on Multi-Core Systems" section [1]
recommends:

soft nofile 65535
hard nofile 65535

for /etc/security/limits.conf settings.

~Alexey

[1] https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/configuration-recipes/profiling-hardware-without-sampling-drivers.html

>
> jirka
>
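
As a side note to the nofile discussion, a minimal standalone sketch of inspecting and raising the per-process open-file limit that the limits.conf lines above are about, using plain POSIX getrlimit()/setrlimit(); this is illustrative only, not something the patch set itself does.

/*
 * Standalone sketch: print and raise the per-process open-file limit that the
 * limits.conf lines above adjust. Plain POSIX getrlimit()/setrlimit(); not
 * something the patch set itself does.
 */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl)) {
                perror("getrlimit");
                return 1;
        }

        printf("open files: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

        /*
         * Raise the soft limit up to the hard limit; going beyond the hard
         * limit requires privileges or a bigger nofile setting in limits.conf.
         */
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl)) {
                perror("setrlimit");
                return 1;
        }

        return 0;
}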

2020-06-26 11:41:18

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call

On Thu, Jun 25, 2020 at 10:32:29PM +0300, Alexey Budankov wrote:
>
> On 25.06.2020 20:14, Jiri Olsa wrote:
> > On Wed, Jun 24, 2020 at 08:19:32PM +0300, Alexey Budankov wrote:
> >>
> >> On 17.06.2020 11:35, Alexey Budankov wrote:
> >>>
> >>> Skip fds with zeroed revents field from count and avoid fds moving
> >>> at fdarray__filter() call so fds indices returned by fdarray__add()
> >>> call stay the same and can be used for direct access and processing
> >>> of fd revents status field at entries array of struct fdarray object.
> >>>
> >>> Signed-off-by: Alexey Budankov <[email protected]>
> >>> ---
> >>> tools/lib/api/fd/array.c | 11 +++++------
> >>> tools/perf/tests/fdarray.c | 20 ++------------------
> >>> 2 files changed, 7 insertions(+), 24 deletions(-)
> >>>
> >>> diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
> >>> index 58d44d5eee31..97843a837370 100644
> >>> --- a/tools/lib/api/fd/array.c
> >>> +++ b/tools/lib/api/fd/array.c
> >>> @@ -93,22 +93,21 @@ int fdarray__filter(struct fdarray *fda, short revents,
> >>> return 0;
> >>>
> >>> for (fd = 0; fd < fda->nr; ++fd) {
> >>> + if (!fda->entries[fd].revents)
> >>> + continue;
> >>> +
> >>
> >> So it looks like this condition also filters out non-signaling event fds, not only
> >> the control and other fds, and this should somehow be avoided so that such event-related
> >> fds are still counted. Several options have been proposed so far:
> >>
> >> 1) Explicit typing of fds via API extension and filtering based on the types:
> >> a) with a separate fdarray__add_stat() call
> >> b) with a type arg for the existing fdarray__add() call
> >> c) various memory management designs are possible
> >>
> >> 2) Playing tricks with fd positions inside entries and assumptions on fdarray API call ordering
> >> - looks more like a hack than a designed solution
> >>
> >> 3) Rewrite of the fdarray class to allocate a separate object for every added fd
> >> - can be replaced with not moving fds in __filter()
> >>
> >> 4) Distinguishing between fd types at fdarray__filter() using the .revents == 0 condition
> >> - seems to have corner cases and thus is not applicable
> >>
> >> 5) Extension of fdarray__poll(..., *arg_ptr, arg_size) with an arg of external fds to atomically
> >> poll on fdarray__add()-ed fds and the external arg fds, then process the external arg fds
> >>
> >> 6) Rewrite of the fdarray class on an epoll() call basis
> >> - introduces new scalability restrictions for the Perf tool
> >
> > hum, how many fds for polling do you expect in your workloads?
>
> Currently it is several hundred, so the default of 1K is easily hit, and the
> "Profile a Large Number of PMU Events on Multi-Core Systems" section [1]
> recommends:
>
> soft nofile 65535
> hard nofile 65535

I'm confused, are you talking about the file descriptor limit now?
this won't be affected by the epoll change.. what do I miss?

I thought your concern was fs.epoll.max_user_watches, which has
default value that seems to be enough:

$ cat /proc/sys/fs/epoll/max_user_watches
3169996

jirka


>
> for /etc/security/limits.conf settings.
>
> ~Alexey
>
> [1] https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/configuration-recipes/profiling-hardware-without-sampling-drivers.html
>
> >
> > jirka
> >
>
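
For completeness, a tiny standalone sketch that reads the epoll watch limit Jiri quotes above, i.e. the extra limit an epoll()-based fdarray would additionally depend on; purely illustrative.

/*
 * Standalone sketch: read fs.epoll.max_user_watches, the additional limit an
 * epoll()-based fdarray would depend on; purely illustrative.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/fs/epoll/max_user_watches", "r");
        long watches;

        if (!f) {
                perror("fopen");
                return 1;
        }
        if (fscanf(f, "%ld", &watches) != 1)
                watches = -1;
        fclose(f);

        printf("fs.epoll.max_user_watches = %ld\n", watches);
        return 0;
}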

2020-06-26 11:46:40

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call


On 26.06.2020 12:37, Jiri Olsa wrote:
> On Thu, Jun 25, 2020 at 10:32:29PM +0300, Alexey Budankov wrote:
>>
>> On 25.06.2020 20:14, Jiri Olsa wrote:
>>> On Wed, Jun 24, 2020 at 08:19:32PM +0300, Alexey Budankov wrote:
>>>>
>>>> On 17.06.2020 11:35, Alexey Budankov wrote:
>>>>>
>>>>> Skip fds with zeroed revents field from count and avoid fds moving
>>>>> at fdarray__filter() call so fds indices returned by fdarray__add()
>>>>> call stay the same and can be used for direct access and processing
>>>>> of fd revents status field at entries array of struct fdarray object.
>>>>>
>>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>>> ---
>>>>> tools/lib/api/fd/array.c | 11 +++++------
>>>>> tools/perf/tests/fdarray.c | 20 ++------------------
>>>>> 2 files changed, 7 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
>>>>> index 58d44d5eee31..97843a837370 100644
>>>>> --- a/tools/lib/api/fd/array.c
>>>>> +++ b/tools/lib/api/fd/array.c
>>>>> @@ -93,22 +93,21 @@ int fdarray__filter(struct fdarray *fda, short revents,
>>>>> return 0;
>>>>>
>>>>> for (fd = 0; fd < fda->nr; ++fd) {
>>>>> + if (!fda->entries[fd].revents)
>>>>> + continue;
>>>>> +
>>>>
>>>> So it looks like this condition also filters out non-signaling event fds, not only
>>>> the control and other fds, and this should somehow be avoided so that such event-related
>>>> fds are still counted. Several options have been proposed so far:
>>>>
>>>> 1) Explicit typing of fds via API extension and filtering based on the types:
>>>> a) with a separate fdarray__add_stat() call
>>>> b) with a type arg for the existing fdarray__add() call
>>>> c) various memory management designs are possible
>>>>
>>>> 2) Playing tricks with fd positions inside entries and assumptions on fdarray API call ordering
>>>> - looks more like a hack than a designed solution
>>>>
>>>> 3) Rewrite of the fdarray class to allocate a separate object for every added fd
>>>> - can be replaced with not moving fds in __filter()
>>>>
>>>> 4) Distinguishing between fd types at fdarray__filter() using the .revents == 0 condition
>>>> - seems to have corner cases and thus is not applicable
>>>>
>>>> 5) Extension of fdarray__poll(..., *arg_ptr, arg_size) with an arg of external fds to atomically
>>>> poll on fdarray__add()-ed fds and the external arg fds, then process the external arg fds
>>>>
>>>> 6) Rewrite of the fdarray class on an epoll() call basis
>>>> - introduces new scalability restrictions for the Perf tool
>>>
>>> hum, how many fds for polling do you expect in your workloads?
>>
>> Currently it is several hundred, so the default of 1K is easily hit, and the
>> "Profile a Large Number of PMU Events on Multi-Core Systems" section [1]
>> recommends:
>>
>> soft nofile 65535
>> hard nofile 65535
>
> I'm confused, are you talking about the file descriptor limit now?
> this won't be affected by the epoll change.. what do I miss?

Currently there is already the ulimit -n limit on the number of open file descriptors,
and the Perf tool process is affected by that limit.

Moving to epoll() would impose one more limit, max_user_watches, and that can additionally
confine Perf applicability even though the default value on some machines is currently
high enough.

~Alexey

>
> I thought your concern was fs.epoll.max_user_watches, which has
> default value that seems to be enough:
>
> $ cat /proc/sys/fs/epoll/max_user_watches
> 3169996
>
> jirka
>
>
>>
>> for /etc/security/limits.conf settings.
>>
>> ~Alexey
>>
>> [1] https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/configuration-recipes/profiling-hardware-without-sampling-drivers.html
>>
>>>
>>> jirka
>>>
>>
>

2020-06-29 18:58:02

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call


On 26.06.2020 13:06, Alexey Budankov wrote:
>
> On 26.06.2020 12:37, Jiri Olsa wrote:
>> On Thu, Jun 25, 2020 at 10:32:29PM +0300, Alexey Budankov wrote:
>>>
>>> On 25.06.2020 20:14, Jiri Olsa wrote:
>>>> On Wed, Jun 24, 2020 at 08:19:32PM +0300, Alexey Budankov wrote:
>>>>>
>>>>> On 17.06.2020 11:35, Alexey Budankov wrote:
>>>>>>
>>>>>> Skip fds with zeroed revents field from count and avoid fds moving
>>>>>> at fdarray__filter() call so fds indices returned by fdarray__add()
>>>>>> call stay the same and can be used for direct access and processing
>>>>>> of fd revents status field at entries array of struct fdarray object.
>>>>>>
>>>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>>>> ---
>>>>>> tools/lib/api/fd/array.c | 11 +++++------
>>>>>> tools/perf/tests/fdarray.c | 20 ++------------------
>>>>>> 2 files changed, 7 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
>>>>>> index 58d44d5eee31..97843a837370 100644
>>>>>> --- a/tools/lib/api/fd/array.c
>>>>>> +++ b/tools/lib/api/fd/array.c
>>>>>> @@ -93,22 +93,21 @@ int fdarray__filter(struct fdarray *fda, short revents,
>>>>>> return 0;
>>>>>>
>>>>>> for (fd = 0; fd < fda->nr; ++fd) {
>>>>>> + if (!fda->entries[fd].revents)
>>>>>> + continue;
>>>>>> +
>>>>>
>>>>> So it looks like this condition also filters out non-signaling event fds, not only
>>>>> the control and other fds, and this should somehow be avoided so that such event-related
>>>>> fds are still counted. Several options have been proposed so far:
>>>>>
>>>>> 1) Explicit typing of fds via API extension and filtering based on the types:
>>>>> a) with a separate fdarray__add_stat() call
>>>>> b) with a type arg for the existing fdarray__add() call
>>>>> c) various memory management designs are possible
>>>>>
>>>>> 2) Playing tricks with fd positions inside entries and assumptions on fdarray API call ordering
>>>>> - looks more like a hack than a designed solution
>>>>>
>>>>> 3) Rewrite of the fdarray class to allocate a separate object for every added fd
>>>>> - can be replaced with not moving fds in __filter()
>>>>>
>>>>> 4) Distinguishing between fd types at fdarray__filter() using the .revents == 0 condition
>>>>> - seems to have corner cases and thus is not applicable
>>>>>
>>>>> 5) Extension of fdarray__poll(..., *arg_ptr, arg_size) with an arg of external fds to atomically
>>>>> poll on fdarray__add()-ed fds and the external arg fds, then process the external arg fds
>>>>>
>>>>> 6) Rewrite of the fdarray class on an epoll() call basis
>>>>> - introduces new scalability restrictions for the Perf tool
>>>>
>>>> hum, how many fds for polling do you expect in your workloads?
>>>
>>> Currently it is several hundred, so the default of 1K is easily hit, and the
>>> "Profile a Large Number of PMU Events on Multi-Core Systems" section [1]
>>> recommends:
>>>
>>> soft nofile 65535
>>> hard nofile 65535
>>
>> I'm confused, are you talking about the file descriptor limit now?
>> this won't be affected by the epoll change.. what do I miss?
>
> Currently there is already the ulimit -n limit on the number of open file descriptors,
> and the Perf tool process is affected by that limit.
>
> Moving to epoll() would impose one more limit, max_user_watches, and that can additionally
> confine Perf applicability even though the default value on some machines is currently
> high enough.

Prior to making v9 I would prefer to agree on the design to be implemented, in order to
avoid guessing and redundant reiteration.

Options that I see as well balanced are 1) or 5), plus not moving fds in __filter() so the
pos returned by fdarray__add() does not go stale.

Are there any thoughts so far?

~Aleksei
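
To make option 1b from the list quoted above more tangible, here is a standalone sketch of what an explicit fd type/flag on the add call could look like, so the filter can skip non-event fds (e.g. the control fd) without relying on revents tricks. None of this is the agreed v9 design; the flag name, struct layout and helper names are made up for illustration.

/*
 * Standalone sketch of option 1b above: a hypothetical type/flag argument on
 * the add call so the filter can skip non-event fds (e.g. the control fd)
 * without revents tricks. Names and layout are made up for illustration.
 */
#include <poll.h>
#include <stdio.h>

enum fda_flags {
        FDA_FD_DEFAULT       = 0,
        FDA_FD_NONFILTERABLE = 1,       /* hypothetical: never dropped or counted by filter */
};

struct fda_entry {
        struct pollfd pfd;
        enum fda_flags flags;
};

#define FDA_MAX 8

struct fda {
        int nr;
        struct fda_entry entries[FDA_MAX];
};

static int fda_add(struct fda *fda, int fd, short events, enum fda_flags flags)
{
        if (fda->nr >= FDA_MAX)
                return -1;
        fda->entries[fda->nr].pfd.fd = fd;
        fda->entries[fda->nr].pfd.events = events;
        fda->entries[fda->nr].pfd.revents = 0;
        fda->entries[fda->nr].flags = flags;
        return fda->nr++;               /* position stays valid: no moving on filter */
}

/* count event fds that are still alive; typed non-event fds are ignored */
static int fda_filter(struct fda *fda, short revents_mask)
{
        int i, nr = 0;

        for (i = 0; i < fda->nr; i++) {
                if (fda->entries[i].flags & FDA_FD_NONFILTERABLE)
                        continue;
                if (fda->entries[i].pfd.revents & revents_mask) {
                        fda->entries[i].pfd.fd = -1;    /* dead event fd */
                        continue;
                }
                nr++;
        }
        return nr;
}

int main(void)
{
        struct fda fda = { .nr = 0 };

        fda_add(&fda, 3, POLLIN, FDA_FD_DEFAULT);                      /* event fd */
        int ctl_pos = fda_add(&fda, 4, POLLIN, FDA_FD_NONFILTERABLE);  /* control fd */

        fda.entries[0].pfd.revents = POLLHUP;                          /* event fd hung up */

        printf("live event fds: %d (ctl fd kept at pos %d)\n",
               fda_filter(&fda, POLLERR | POLLHUP), ctl_pos);
        return 0;       /* prints "live event fds: 0 (ctl fd kept at pos 1)" */
}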

2020-06-29 19:19:33

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] tools/libperf: avoid moving of fds at fdarray__filter() call

On Mon, Jun 29, 2020 at 06:11:52PM +0300, Alexey Budankov wrote:

SNIP

> >>
> >> I'm confused, are you talking about the file descriptor limit now?
> >> this won't be affected by the epoll change.. what do I miss?
> >
> > Currently there is already the ulimit -n limit on the number of open file descriptors,
> > and the Perf tool process is affected by that limit.
> >
> > Moving to epoll() would impose one more limit, max_user_watches, and that can additionally
> > confine Perf applicability even though the default value on some machines is currently
> > high enough.
>
> Prior to making v9 I would prefer to agree on the design to be implemented, in order to
> avoid guessing and redundant reiteration.
>
> Options that I see as well balanced are 1) or 5), plus not moving fds in __filter() so the
> pos returned by fdarray__add() does not go stale.
>
> Are there any thoughts so far?

let's try it and discuss over the code

jirka