2014-10-01 05:28:39

by Ingo Molnar

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark


* Tuan Bui <[email protected]> wrote:

> Subject: [RFC PATCH] Perf Bench: Locking Microbenchmark
>
> In response to this thread https://lkml.org/lkml/2014/2/11/93,
> this is a microbenchmark that stresses lock contention in the
> kernel through the creat(2) system call by spawning multiple
> processes that call it repeatedly. This workload generates
> results and contention similar to the AIM7 fserver workload but
> produces output within seconds.
>
> With the creat(2) system call, the contention varies depending
> on which locks the particular file system uses. I have run this
> benchmark only on the ext4 and xfs file systems.
>
> Running the creat workload on ext4 shows contention on the mutex
> that ext4_orphan_add() and ext4_orphan_del() use to add an inode
> to, or delete it from, the list of inodes. Running the same
> creat workload on xfs shows contention on the spinlock that
> xfs_log_commit_cil() uses to commit a transaction to the
> Committed Item List.
>
> Here is a comparison of this benchmark with AIM7 running the
> fserver workload at 500-1000 users, along with a perf trace,
> on an ext4 file system.
>
> The test machine is an 8-socket, 80-core Westmere system with
> HT off, running v3.17-rc6.
>
>          AIM7        AIM7             perf-bench   perf-bench
> Users    Jobs/min    Jobs/min/child   Ops/sec      Ops/sec/child
>   500    119668.25   239.34           104249       208
>   600    126074.90   210.12           106136       176
>   700    128662.42   183.80           106175       151
>   800    119822.05   149.78           106290       132
>   900    106150.25   117.94           105230       116
>  1000    104681.29   104.68           106489       106
>
> Perf trace for AIM7 fserver:
>   14.51%  reaim          [kernel.kallsyms]  [k] osq_lock
>    4.98%  reaim          reaim              [.] add_long
>    4.98%  reaim          reaim              [.] add_int
>    4.31%  reaim          [kernel.kallsyms]  [k] mutex_spin_on_owner
>   ...
>
> Perf trace of perf bench creat:
>   22.37%  locking-creat  [kernel.kallsyms]  [k] osq_lock
>    5.77%  locking-creat  [kernel.kallsyms]  [k] mutex_spin_on_owner
>    5.31%  locking-creat  [kernel.kallsyms]  [k] _raw_spin_lock
>    5.15%  locking-creat  [jbd2]             [k] jbd2_journal_put_journal_head
>   ...

Very nice!

If you compare an strace of AIM7 steady state and 'perf bench
lock' steady state, is it comparable, i.e. do the syscalls and
other behavioral patterns match up?

> +'locking'::
> + Locking stressing benchmarks.
> +
> 'all'::
> All benchmark subsystems.
>
> @@ -213,6 +216,11 @@ Suite for evaluating wake calls.
> *requeue*::
> Suite for evaluating requeue calls.
>
> +SUITES FOR 'locking'
> +~~~~~~~~~~~~~~~~~~
> +*creat*::
> +Suite for evaluating locking contention through creat(2).

So I'd display it in the help text prominently that it's a
workload similar to the AIM7 workload.

> +static const struct option options[] = {
> +	OPT_UINTEGER('s', "start", &start_nr_threads, "Number of processes to start"),
> +	OPT_UINTEGER('e', "end", &end_nr_threads, "Number of processes to end"),
> +	OPT_UINTEGER('i', "increment", &increment_threads_by, "Number of threads to increment"),
> +	OPT_UINTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
> +	OPT_END()
> +};

Is this the kind of parameters that AIM7 takes as well?

In any case, this is a very nice benchmarking utility.

Thanks,

Ingo


2014-10-01 17:12:32

by Arnaldo Carvalho de Melo

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark

Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> If you compare an strace of AIM7 steady state and 'perf bench
> lock' steady state, is it comparable, i.e. do the syscalls and

Isn't "lock" too generic? Isn't this stressing some specific lock and if
so shouldn't that be made abundantly clear in the 'perf bench' test name
and in the docs?

Or is this the case that it started by using 'creat' calls to stress
some locking and will go on adding more syscalls to stress more kernel
locks?

- Arnaldo

2014-10-03 04:53:24

by Davidlohr Bueso

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark

On Wed, 2014-10-01 at 07:28 +0200, Ingo Molnar wrote:
> If you compare an strace of AIM7 steady state and 'perf bench
> lock' steady state, is it comparable, i.e. do the syscalls and
> other behavioral patterns match up?

With more than 1000 users I'm seeing:

-  33.74%  locking-creat  [kernel.kallsyms]  [k] mspin_lock
     + mspin_lock
     + __mutex_lock_slowpath
     + mutex_lock
-   7.97%  locking-creat  [kernel.kallsyms]  [k] mutex_spin_on_owner
     + mutex_spin_on_owner
     + __mutex_lock_slowpath
     + mutex_lock

Lower user counts just show the syscall entries.

Of course, the aim7 setup was running on a ramdisk, thus avoiding any IO
overhead in the traces.

Thanks,
Davidlohr

2014-10-03 04:58:02

by Davidlohr Bueso

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark

On Wed, 2014-10-01 at 14:12 -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> > If you compare an strace of AIM7 steady state and 'perf bench
> > lock' steady state, is it comparable, i.e. do the syscalls and
>
> Isn't "lock" too generic? Isn't this stressing some specific lock and if
> so shouldn't that be made abundantly clear in the 'perf bench' test name
> and in the docs?

yeah, and 'perf bench locking creat' just doesn't sound right.

2014-10-08 22:11:23

by Tuan Bui

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark

On Wed, 2014-10-01 at 14:12 -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> > If you compare an strace of AIM7 steady state and 'perf bench
> > lock' steady state, is it comparable, i.e. do the syscalls and
>
> Isn't "lock" too generic? Isn't this stressing some specific lock and if
> so shouldn't that be made abundantly clear in the 'perf bench' test name
> and in the docs?
>

In this microbenchmark, I am trying to reproduce the lock contention
seen in the AIM7 fserver workload. Because the creat(2) system call is
file system dependent, running it on different file systems shows
different locks being contended; that is why I did not name a specific
lock in the doc. Do you have a suggestion for how I should name this
benchmark?

> Or is this the case that it started by using 'creat' calls to stress
> some locking and will go on adding more syscalls to stress more kernel
> locks?
>

When I ran all the AIM7 workloads looking for lock contention to
reproduce, creat was the only one I found interesting and useful for
stressing lock contention.
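
For readers without the patch at hand, a workload of this shape can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: the function names, the per-child file naming, and the pipe used to collect counts are all hypothetical. Each child calls creat(2) and close(2) on its own file until a deadline passes, then unlinks the file, which roughly matches the creat/close/unlink/clone/wait4 mix reported for the benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in seconds, so wall-clock adjustments do not matter. */
static double now_secs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* One child: creat()/close() its own file until the deadline, then unlink it. */
static unsigned long creat_loop(const char *dir, unsigned int id, unsigned int secs)
{
	char path[4096];
	unsigned long ops = 0;
	double deadline = now_secs() + secs;

	snprintf(path, sizeof(path), "%s/creat-bench-%ld-%u", dir, (long)getpid(), id);
	while (now_secs() < deadline) {
		int fd = creat(path, 0600);

		if (fd < 0)
			break;
		close(fd);
		ops++;
	}
	unlink(path);
	return ops;
}

/*
 * Fork nr_procs children (hypothetical helper). Returns the total number of
 * completed creat() calls, or -1 if the pipe or any fork() failed.
 */
long run_creat_bench(const char *dir, unsigned int nr_procs, unsigned int secs)
{
	int pfd[2];
	long total = 0;
	unsigned int i, started = 0;

	assert(dir != NULL && nr_procs > 0);
	if (pipe(pfd) < 0)
		return -1;

	for (i = 0; i < nr_procs; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			unsigned long ops;

			close(pfd[0]);
			ops = creat_loop(dir, i, secs);
			/* Small writes to a pipe are atomic, so counts never interleave. */
			if (write(pfd[1], &ops, sizeof(ops)) != (ssize_t)sizeof(ops))
				_exit(1);
			_exit(0);
		}
		started++;
	}
	close(pfd[1]);

	/* Collect one count per child that actually started. */
	for (i = 0; i < started; i++) {
		unsigned long ops;

		if (read(pfd[0], &ops, sizeof(ops)) == (ssize_t)sizeof(ops))
			total += (long)ops;
	}
	close(pfd[0]);
	while (wait(NULL) > 0)
		;
	return started == nr_procs ? total : -1;
}
```

Calling run_creat_bench("/mnt/test", 500, 10) and dividing the result by 10 would give an ops/sec figure in the spirit of the table earlier in the thread; which lock ends up contended depends on the file system behind the directory.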



2014-10-08 22:13:44

by Tuan Bui

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark

On Wed, 2014-10-01 at 07:28 +0200, Ingo Molnar wrote:

> >
> > Perf trace of perf bench creat
> >   22.37%  locking-creat  [kernel.kallsyms]  [k] osq_lock
> >    5.77%  locking-creat  [kernel.kallsyms]  [k] mutex_spin_on_owner
> >    5.31%  locking-creat  [kernel.kallsyms]  [k] _raw_spin_lock
> >    5.15%  locking-creat  [jbd2]             [k] jbd2_journal_put_journal_head
> >   ...
>
> Very nice!
>
> If you compare an strace of AIM7 steady state and 'perf bench
> lock' steady state, is it comparable, i.e. do the syscalls and
> other behavioral patterns match up?
>

Here is an strace -cf of my perf bench and AIM7 fserver workload at 1000
users on an ext4 file system. My perf bench results look comparable to
the AIM7 fserver workload to me. What do you think?

strace -cf for perf bench locking creat at 1000 users

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ---------
 79.29    4.421000         221     20018           creat
 13.07    0.729000         729      1000           unlink
  6.47    0.361000          18     20032           close
  0.60    0.033213          33      1000           wait4
  0.37    0.020365          20      1000           clone
  0.20    0.011000          11      1003         2 futex
  0.00    0.000037           6         6           munmap
  0.00    0.000010           0        24           mprotect
  0.00    0.000009           0        44           mmap
  0.00    0.000000           0        12           read
  0.00    0.000000           0         4           write
  0.00    0.000000           0      1027        14 open

strace -cf for AIM7 fserver workload at 1000 users

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -----------
 24.42  163.436284          50   3243016           creat
 18.15  121.475390          17   7148543           brk
 14.49   96.990556       85229      1138        35 wait4
  7.86   52.605030          15   3394990           close
  5.73   38.310323          31   1222317           write
  4.99   33.389587          17   2000001           kill
  4.85   32.432000          16   2001035      1000 rt_sigreturn
  4.64   31.050979          64    483800           getdents
  4.38   29.316247          14   2029311           rt_sigaction
  3.10   20.744360          45    464016      5000 unlink
  2.57   17.171514          15   1153825           read
  1.13    7.588489          35    215104           link
  0.89    5.945480           8    786320       433 stat
  0.60    4.045701          11    366004           lseek
  0.36    2.420812           9    263006           times
  0.34    2.272305          18    124982       129 open


> > +'locking'::
> > + Locking stressing benchmarks.
> > +
> > 'all'::
> > All benchmark subsystems.
> >
> > @@ -213,6 +216,11 @@ Suite for evaluating wake calls.
> > *requeue*::
> > Suite for evaluating requeue calls.
> >
> > +SUITES FOR 'locking'
> > +~~~~~~~~~~~~~~~~~~
> > +*creat*::
> > +Suite for evaluating locking contention through creat(2).
>
> So I'd display it in the help text prominently that it's a
> workload similar to the AIM7 workload.
>

Thank you Ingo, I will add more comments to make it clearer that it
is similar to the AIM7 fserver workload.

> > +static const struct option options[] = {
> > +	OPT_UINTEGER('s', "start", &start_nr_threads, "Number of processes to start"),
> > +	OPT_UINTEGER('e', "end", &end_nr_threads, "Number of processes to end"),
> > +	OPT_UINTEGER('i', "increment", &increment_threads_by, "Number of threads to increment"),
> > +	OPT_UINTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
> > +	OPT_END()
> > +};
>
> Is this the kind of parameters that AIM7 takes as well?
>
> In any case, this is a very nice benchmarking utility.

Yes, these parameters are similar to what AIM7 takes, except for the
runtime parameter. AIM7 has no option to specify how long the
benchmark will run. AIM7 also lets you specify the number of jobs per
run, which I did not include since I added a runtime parameter for the
benchmark.
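
To make the start/end/increment options concrete, here is a minimal sketch of the sweep they describe. It assumes the benchmark runs once per process count from start up to and including end; the helper name and that interpretation are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill 'out' with the process counts a sweep would visit: start,
 * start + inc, ... up to and including end. Returns how many counts were
 * written (at most 'max'), or 0 for an empty or invalid range.
 */
size_t sweep_counts(unsigned int start, unsigned int end, unsigned int inc,
		    unsigned int *out, size_t max)
{
	size_t n = 0;
	unsigned int cur = start;

	assert(out != NULL);
	if (inc == 0 || start > end)
		return 0;

	while (n < max) {
		out[n++] = cur;
		/* Stop if the next step would pass 'end'; this also avoids overflow. */
		if (end - cur < inc)
			break;
		cur += inc;
	}
	return n;
}
```

With start=500, end=1000 and increment=100 this yields the six user counts used in the comparison table earlier in the thread.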



2014-10-08 22:15:01

by Tuan Bui

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark

On Thu, 2014-10-02 at 21:57 -0700, Davidlohr Bueso wrote:
> On Wed, 2014-10-01 at 14:12 -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> > > If you compare an strace of AIM7 steady state and 'perf bench
> > > lock' steady state, is it comparable, i.e. do the syscalls and
> >
> > Isn't "lock" too generic? Isn't this stressing some specific lock and if
> > so shouldn't that be made abundantly clear in the 'perf bench' test name
> > and in the docs?
>
> yeah, and 'perf bench locking creat' just doesn't sound right.
>

Do you have any suggestions on how I should name it?

2014-10-09 07:21:54

by Ingo Molnar

Subject: Re: [RFC PATCH] Perf Bench: Locking Microbenchmark


* Tuan Bui <[email protected]> wrote:

> > > +static const struct option options[] = {
> > > +	OPT_UINTEGER('s', "start", &start_nr_threads, "Number of processes to start"),
> > > +	OPT_UINTEGER('e', "end", &end_nr_threads, "Number of processes to end"),
> > > +	OPT_UINTEGER('i', "increment", &increment_threads_by, "Number of threads to increment"),
> > > +	OPT_UINTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
> > > +	OPT_END()
> > > +};
> >
> > Is this the kind of parameters that AIM7 takes as well?
> >
> > In any case, this is a very nice benchmarking utility.
>
> Yes, these parameters are similar to what AIM7 takes, except
> for the runtime parameter. AIM7 has no option to specify how
> long the benchmark will run. AIM7 also lets you specify the
> number of jobs per run, which I did not include since I added
> a runtime parameter for the benchmark.

It might make sense to add that parameter - which would only be
allowed if no runtime is specified, or so.

I.e. to make it easy for people who come with AIM7 benchmarking
knowledge to use this new tool.
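
One way to read the "only allowed if no runtime is specified" rule is as a mutual-exclusion check done after option parsing. The sketch below assumes a hypothetical parameter struct in which 0 means "not given on the command line"; neither the struct nor the function exists in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical parameter block; 0 means "not given on the command line". */
struct creat_params {
	unsigned int runtime_secs;	/* time-bounded run */
	unsigned int jobs_per_run;	/* AIM7-style fixed amount of work */
};

/*
 * Reject a runtime combined with a job count; if neither is given,
 * fall back to a default runtime. Returns 0 on success, -EINVAL if
 * both were specified.
 */
int creat_params_resolve(struct creat_params *p, unsigned int default_runtime_secs)
{
	assert(p != NULL);
	if (p->runtime_secs && p->jobs_per_run)
		return -EINVAL;
	if (!p->runtime_secs && !p->jobs_per_run)
		p->runtime_secs = default_runtime_secs;
	return 0;
}
```

Keeping the two modes exclusive means a run is bounded either by time or by work, never both, so the reported ops/sec stays unambiguous.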

Thanks,

Ingo