LinuxLists.cc - [PATCH v1] perf tests: Run tests in parallel by default

2024-03-01 17:49:18

Subject: [PATCH v1] perf tests: Run tests in parallel by default

Switch from running tests sequentially to running in parallel by
default. Change the opt-in '-p' or '--parallel' flag to '-S' or
'--sequential'.

On an 8 core tigerlake an address sanitizer run time changes from:
326.54user 622.73system 6:59.91elapsed 226%CPU
to:
973.02user 583.98system 3:01.17elapsed 859%CPU

So over twice as fast, saving 4 minutes.

Signed-off-by: Ian Rogers <[email protected]>
---
This change is on top of the test fixes in:
https://lore.kernel.org/lkml/[email protected]/
---
tools/perf/tests/builtin-test.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index ddb2f4e38ea5..73f53b02f733 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -39,8 +39,8 @@
* making them easier to debug.
*/
static bool dont_fork;
-/* Fork the tests in parallel and then wait for their completion. */
-static bool parallel;
+/* Don't fork the tests in parallel and wait for their completion. */
+static bool sequential;
const char *dso_to_test;
const char *test_objdump_path = "objdump";

@@ -374,7 +374,7 @@ static int start_test(struct test_suite *test, int i, int subi, struct child_tes
}
(*child)->process.no_exec_cmd = run_test_child;
err = start_command(&(*child)->process);
- if (err || parallel)
+ if (err || !sequential)
return err;
return finish_test(*child, width);
}
@@ -440,7 +440,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
int err = start_test(t, curr, -1, &child_tests[child_test_num++], width);

if (err) {
- /* TODO: if parallel waitpid the already forked children. */
+ /* TODO: if !sequential waitpid the already forked children. */
free(child_tests);
return err;
}
@@ -460,7 +460,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
}
}
for (i = 0; i < child_test_num; i++) {
- if (parallel) {
+ if (!sequential) {
int ret = finish_test(child_tests[i], width);

if (ret)
@@ -536,8 +536,8 @@ int cmd_test(int argc, const char **argv)
"be more verbose (show symbol address, etc)"),
OPT_BOOLEAN('F', "dont-fork", &dont_fork,
"Do not fork for testcase"),
- OPT_BOOLEAN('p', "parallel", &parallel,
- "Run the tests altogether in parallel"),
+ OPT_BOOLEAN('S', "sequential", &sequential,
+ "Run the tests one after another rather than in parallel"),
OPT_STRING('w', "workload", &workload, "work", "workload to run for testing"),
OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"),
OPT_STRING(0, "objdump", &test_objdump_path, "path",
@@ -564,6 +564,9 @@ int cmd_test(int argc, const char **argv)
if (workload)
return run_workload(workload, argc, argv);

+ if (dont_fork)
+ sequential = true;
+
symbol_conf.priv_size = sizeof(int);
symbol_conf.try_vmlinux_path = true;

--
2.44.0.278.ge034bb2e1d-goog

2024-03-20 15:59:07

by Ian Rogers

[permalink] [raw]

Subject: Re: [PATCH v1] perf tests: Run tests in parallel by default

On Fri, Mar 1, 2024 at 9:47 AM Ian Rogers <[email protected]> wrote:
>
> Switch from running tests sequentially to running in parallel by
> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> '--sequential'.
>
> On an 8 core tigerlake an address sanitizer run time changes from:
> 326.54user 622.73system 6:59.91elapsed 226%CPU
> to:
> 973.02user 583.98system 3:01.17elapsed 859%CPU
>
> So over twice as fast, saving 4 minutes.
>
> Signed-off-by: Ian Rogers <[email protected]>

Ping.

Thanks,
Ian

> ---
> This change is on top of the test fixes in:
> https://lore.kernel.org/lkml/[email protected]/
> ---
> tools/perf/tests/builtin-test.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
> index ddb2f4e38ea5..73f53b02f733 100644
> --- a/tools/perf/tests/builtin-test.c
> +++ b/tools/perf/tests/builtin-test.c
> @@ -39,8 +39,8 @@
> * making them easier to debug.
> */
> static bool dont_fork;
> -/* Fork the tests in parallel and then wait for their completion. */
> -static bool parallel;
> +/* Don't fork the tests in parallel and wait for their completion. */
> +static bool sequential;
> const char *dso_to_test;
> const char *test_objdump_path = "objdump";
>
> @@ -374,7 +374,7 @@ static int start_test(struct test_suite *test, int i, int subi, struct child_tes
> }
> (*child)->process.no_exec_cmd = run_test_child;
> err = start_command(&(*child)->process);
> - if (err || parallel)
> + if (err || !sequential)
> return err;
> return finish_test(*child, width);
> }
> @@ -440,7 +440,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
> int err = start_test(t, curr, -1, &child_tests[child_test_num++], width);
>
> if (err) {
> - /* TODO: if parallel waitpid the already forked children. */
> + /* TODO: if !sequential waitpid the already forked children. */
> free(child_tests);
> return err;
> }
> @@ -460,7 +460,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
> }
> }
> for (i = 0; i < child_test_num; i++) {
> - if (parallel) {
> + if (!sequential) {
> int ret = finish_test(child_tests[i], width);
>
> if (ret)
> @@ -536,8 +536,8 @@ int cmd_test(int argc, const char **argv)
> "be more verbose (show symbol address, etc)"),
> OPT_BOOLEAN('F', "dont-fork", &dont_fork,
> "Do not fork for testcase"),
> - OPT_BOOLEAN('p', "parallel", &parallel,
> - "Run the tests altogether in parallel"),
> + OPT_BOOLEAN('S', "sequential", &sequential,
> + "Run the tests one after another rather than in parallel"),
> OPT_STRING('w', "workload", &workload, "work", "workload to run for testing"),
> OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"),
> OPT_STRING(0, "objdump", &test_objdump_path, "path",
> @@ -564,6 +564,9 @@ int cmd_test(int argc, const char **argv)
> if (workload)
> return run_workload(workload, argc, argv);
>
> + if (dont_fork)
> + sequential = true;
> +
> symbol_conf.priv_size = sizeof(int);
> symbol_conf.try_vmlinux_path = true;
>
> --
> 2.44.0.278.ge034bb2e1d-goog
>

2024-03-20 18:20:41

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH v1] perf tests: Run tests in parallel by default

On Wed, Mar 20, 2024 at 08:58:37AM -0700, Ian Rogers wrote:
> On Fri, Mar 1, 2024 at 9:47 AM Ian Rogers <[email protected]> wrote:
> >
> > Switch from running tests sequentially to running in parallel by
> > default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> > '--sequential'.
> >
> > On an 8 core tigerlake an address sanitizer run time changes from:
> > 326.54user 622.73system 6:59.91elapsed 226%CPU
> > to:
> > 973.02user 583.98system 3:01.17elapsed 859%CPU
> >
> > So over twice as fast, saving 4 minutes.
> >
> > Signed-off-by: Ian Rogers <[email protected]>
>
> Ping.

Thanks, applied.

- Arnaldo

2024-04-26 15:07:57

by James Clark

[permalink] [raw]

Subject: Re: [PATCH v1] perf tests: Run tests in parallel by default

On 01/03/2024 17:47, Ian Rogers wrote:
> Switch from running tests sequentially to running in parallel by
> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> '--sequential'.
>
> On an 8 core tigerlake an address sanitizer run time changes from:
> 326.54user 622.73system 6:59.91elapsed 226%CPU
> to:
> 973.02user 583.98system 3:01.17elapsed 859%CPU
>
> So over twice as fast, saving 4 minutes.
>

Apologies for not replying earlier before this was applied. But IMO this
isn't a good default. Tests that use things like exclusive PMUs
(Coresight for example) can never pass when run in parallel.

For CI it's arguable whether you'd want to trade stability for speed.
And for interactive sessions there was already the --parallel option
which was easy to add and have it in your bash history.

Now we've changed the default, any CI will need to be updated to add
--sequential if it wants all the tests to pass. Maybe we could do some
hack and gate it on interactive vs non interactive sessions, but that
might be getting too clever. (Or a "don't run in parallel" flag on
certain tests)

Thanks
James

2024-04-26 15:44:05

by Adrian Hunter

[permalink] [raw]

Subject: Re: [PATCH v1] perf tests: Run tests in parallel by default

On 26/04/24 18:06, James Clark wrote:
>
>
> On 01/03/2024 17:47, Ian Rogers wrote:
>> Switch from running tests sequentially to running in parallel by
>> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
>> '--sequential'.
>>
>> On an 8 core tigerlake an address sanitizer run time changes from:
>> 326.54user 622.73system 6:59.91elapsed 226%CPU
>> to:
>> 973.02user 583.98system 3:01.17elapsed 859%CPU
>>
>> So over twice as fast, saving 4 minutes.
>>
>
> Apologies for not replying earlier before this was applied. But IMO this
> isn't a good default. Tests that use things like exclusive PMUs
> (Coresight for example) can never pass when run in parallel.

Yes, that is an issue for Intel PT also.

>
> For CI it's arguable whether you'd want to trade stability for speed.
> And for interactive sessions there was already the --parallel option
> which was easy to add and have it in your bash history.
>
> Now we've changed the default, any CI will need to be updated to add
> --sequential if it wants all the tests to pass.

Same here.

> Maybe we could do some
> hack and gate it on interactive vs non interactive sessions, but that
> might be getting too clever. (Or a "don't run in parallel" flag on
> certain tests)

Perhaps more attention is needed for the tests that take so long.
For example, maybe they could do things in parallel.

Also -F option doesn't seem to work.

$ tools/perf/perf test -F
Couldn't find a vmlinux that matches the kernel running on this machine, skipping test
1: vmlinux symtab matches kallsyms : Skip
2: Detect openat syscall event : Ok
3: Detect openat syscall event on all cpus : Ok
4.1: Read samples using the mmap interface : Ok
4.2: User space counter reading of instructions : Ok
4.3: User space counter reading of cycles : Ok
5: Test data source output : Ok
WARNING: event 'numpmu' not valid (bits 16-17,20,22 of config '6530160' not supported by kernel)!
6.1: Test event parsing : Ok
6.2: Parsing of all PMU events from sysfs : Ok
WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
6.3: Parsing of given PMU events from sysfs : Ok
6.4: Parsing of aliased events from sysfs : Skip (no aliases in sysfs)
6.5: Parsing of aliased events : Ok
6.6: Parsing of terms (event modifiers) : Ok
7: Simple expression parser : Ok
8: PERF_RECORD_* events & perf_sample fields : Ok
9: Parse perf pmu format : Ok
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
failed: recursion detected for M1
failed: recursion detected for M2
failed: recursion detected for M3
Not grouping metric tma_memory_fence's events.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
..
^C

2024-04-26 16:46:15

by Ian Rogers

[permalink] [raw]

Subject: Re: [PATCH v1] perf tests: Run tests in parallel by default

On Fri, Apr 26, 2024 at 8:42 AM Adrian Hunter <[email protected]> wrote:
>
> On 26/04/24 18:06, James Clark wrote:
> >
> >
> > On 01/03/2024 17:47, Ian Rogers wrote:
> >> Switch from running tests sequentially to running in parallel by
> >> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> >> '--sequential'.
> >>
> >> On an 8 core tigerlake an address sanitizer run time changes from:
> >> 326.54user 622.73system 6:59.91elapsed 226%CPU
> >> to:
> >> 973.02user 583.98system 3:01.17elapsed 859%CPU
> >>
> >> So over twice as fast, saving 4 minutes.
> >>
> >
> > Apologies for not replying earlier before this was applied. But IMO this
> > isn't a good default. Tests that use things like exclusive PMUs
> > (Coresight for example) can never pass when run in parallel.
>
> Yes, that is an issue for Intel PT also.

Right, Arnaldo and I discussed this and the safe thing to do for now
is to keep the default at serial - ie revert this change.

My longer term concern is that basically means people won't use
parallel testing and we'll bitrot in places like synthesis where it's
easy to make naive assumptions that say /proc won't be changing.

We have these tests set up for continuous testing in my team, and as
part of that set up each test is run independently to maximize
parallelism. This means the exclusive PMU thing does make the tests
flaky so it would be nice to have the tests address this, either by
skipping or busy waiting (only when the error code from the PMU is
busy, up to some maximum time period).

> >
> > For CI it's arguable whether you'd want to trade stability for speed.
> > And for interactive sessions there was already the --parallel option
> > which was easy to add and have it in your bash history.
> >
> > Now we've changed the default, any CI will need to be updated to add
> > --sequential if it wants all the tests to pass.
>
> Same here.
>
> > Maybe we could do some
> > hack and gate it on interactive vs non interactive sessions, but that
> > might be getting too clever. (Or a "don't run in parallel" flag on
> > certain tests)
>
> Perhaps more attention is needed for the tests that take so long.
> For example, maybe they could do things in parallel.

This is true for the tool in general. The problem is a lack of good
abstractions. Probably the easiest candidate for this are the shell
tests.

> Also -F option doesn't seem to work.
>
> $ tools/perf/perf test -F
> Couldn't find a vmlinux that matches the kernel running on this machine, skipping test
> 1: vmlinux symtab matches kallsyms : Skip
> 2: Detect openat syscall event : Ok
> 3: Detect openat syscall event on all cpus : Ok
> 4.1: Read samples using the mmap interface : Ok
> 4.2: User space counter reading of instructions : Ok
> 4.3: User space counter reading of cycles : Ok
> 5: Test data source output : Ok
> WARNING: event 'numpmu' not valid (bits 16-17,20,22 of config '6530160' not supported by kernel)!
> 6.1: Test event parsing : Ok
> 6.2: Parsing of all PMU events from sysfs : Ok
> WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
> 6.3: Parsing of given PMU events from sysfs : Ok
> 6.4: Parsing of aliased events from sysfs : Skip (no aliases in sysfs)
> 6.5: Parsing of aliased events : Ok
> 6.6: Parsing of terms (event modifiers) : Ok
> 7: Simple expression parser : Ok
> 8: PERF_RECORD_* events & perf_sample fields : Ok
> 9: Parse perf pmu format : Ok
> 10.1: PMU event table sanity : Ok
> 10.2: PMU event map aliases : Ok
> failed: recursion detected for M1
> failed: recursion detected for M2
> failed: recursion detected for M3
> Not grouping metric tma_memory_fence's events.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> ...
> ^C

Not working meaning the errors are going to stderr? The issue is that
pr_err will write to stderr regardless of the verbosity setting and
some tests deliberately induce errors. We could use debug_set_file
to set the output to /dev/null, but that could hide trouble. -F
usually means your debugging so we've not cared in the past.

Thanks,
Ian