2016-10-24 20:57:19

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH -tip 0/2] perf bench futex updates

Hi Arnaldo, here's the requested patch that gets rid of the struct alignment
tackling the main source of cacheline bouncing. In addition a small fix to
bogus inputs.

Thanks!

Davidlohr Bueso (2):
perf/bench-futex: Avoid worker cacheline bouncing
perf/bench-futex: Sanitize numeric parameters

tools/perf/bench/futex-hash.c | 15 +++++++++------
tools/perf/bench/futex-lock-pi.c | 7 ++++++-
tools/perf/bench/futex-requeue.c | 2 ++
tools/perf/bench/futex-wake-parallel.c | 4 ++++
tools/perf/bench/futex-wake.c | 3 +++
tools/perf/bench/futex.h | 4 ++++
6 files changed, 28 insertions(+), 7 deletions(-)

--
2.6.6


2016-10-24 20:57:23

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 1/2] perf/bench-futex: Avoid worker cacheline bouncing

Sebastian noted that overhead for worker thread ops (throughput)
accounting was producing 'perf' to appear in the profiles, consuming
a non-trivial (ie 13%) amount of CPU. This is due to cacheline
bouncing due to the increment of w->ops. We can easily fix this by
just working on a local copy and updating the actual worker once
done running, and ready to show the program summary. There is no
danger of the worker being concurrent, so we can trust that no stale
value is being seen by another thread.

This also gets rid of the unnecesary cache alignment hack; its not
worth it.

Reported-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
---
tools/perf/bench/futex-hash.c | 11 +++++------
tools/perf/bench/futex-lock-pi.c | 4 +++-
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index d9e5e80bb4d0..da04b8c5568a 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -39,15 +39,12 @@ static unsigned int threads_starting;
static struct stats throughput_stats;
static pthread_cond_t thread_parent, thread_worker;

-#define SMP_CACHE_BYTES 256
-#define __cacheline_aligned __attribute__ ((aligned (SMP_CACHE_BYTES)))
-
struct worker {
int tid;
u_int32_t *futex;
pthread_t thread;
unsigned long ops;
-} __cacheline_aligned;
+};

static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
@@ -66,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
static void *workerfn(void *arg)
{
int ret;
- unsigned int i;
struct worker *w = (struct worker *) arg;
+ unsigned int i;
+ unsigned long ops = w->ops; /* avoid cacheline bouncing */

pthread_mutex_lock(&thread_lock);
threads_starting--;
@@ -77,7 +75,7 @@ static void *workerfn(void *arg)
pthread_mutex_unlock(&thread_lock);

do {
- for (i = 0; i < nfutexes; i++, w->ops++) {
+ for (i = 0; i < nfutexes; i++, ops++) {
/*
* We want the futex calls to fail in order to stress
* the hashing of uaddr and not measure other steps,
@@ -91,6 +89,7 @@ static void *workerfn(void *arg)
}
} while (!done);

+ w->ops = ops;
return NULL;
}

diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
index 936d89d30483..7032e4643c65 100644
--- a/tools/perf/bench/futex-lock-pi.c
+++ b/tools/perf/bench/futex-lock-pi.c
@@ -75,6 +75,7 @@ static void toggle_done(int sig __maybe_unused,
static void *workerfn(void *arg)
{
struct worker *w = (struct worker *) arg;
+ unsigned long ops = w->ops;

pthread_mutex_lock(&thread_lock);
threads_starting--;
@@ -103,9 +104,10 @@ static void *workerfn(void *arg)
if (ret && !silent)
warn("thread %d: Could not unlock pi-lock for %p (%d)",
w->tid, w->futex, ret);
- w->ops++; /* account for thread's share of work */
+ ops++; /* account for thread's share of work */
} while (!done);

+ w->ops = ops;
return NULL;
}

--
2.6.6

2016-10-24 20:57:21

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 2/2] perf/bench-futex: Sanitize numeric parameters

This gets rid of oddities such as:

perf bench futex hash -t -4
perf: calloc: Cannot allocate memory

Runtime (and many more) are equally busted, ie run for bogus amounts
of time. Just use the abs, instead of, for example erroring out.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
tools/perf/bench/futex-hash.c | 4 ++++
tools/perf/bench/futex-lock-pi.c | 3 +++
tools/perf/bench/futex-requeue.c | 2 ++
tools/perf/bench/futex-wake-parallel.c | 4 ++++
tools/perf/bench/futex-wake.c | 3 +++
tools/perf/bench/futex.h | 4 ++++
6 files changed, 20 insertions(+)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index da04b8c5568a..bfbb6b5f609c 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -130,6 +130,8 @@ int bench_futex_hash(int argc, const char **argv,
}

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nsecs = futexbench_sanitize_numeric(nsecs);
+ nfutexes = futexbench_sanitize_numeric(nfutexes);

sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
@@ -137,6 +139,8 @@ int bench_futex_hash(int argc, const char **argv,

if (!nthreads) /* default to the number of CPUs */
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
index 7032e4643c65..465012b320ee 100644
--- a/tools/perf/bench/futex-lock-pi.c
+++ b/tools/perf/bench/futex-lock-pi.c
@@ -152,6 +152,7 @@ int bench_futex_lock_pi(int argc, const char **argv,
goto err;

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nsecs = futexbench_sanitize_numeric(nsecs);

sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
@@ -159,6 +160,8 @@ int bench_futex_lock_pi(int argc, const char **argv,

if (!nthreads)
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
index 2b9705a8734c..fd4ee95b689a 100644
--- a/tools/perf/bench/futex-requeue.c
+++ b/tools/perf/bench/futex-requeue.c
@@ -128,6 +128,8 @@ int bench_futex_requeue(int argc, const char **argv,

if (!nthreads)
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c
index 2c8fa67ad537..beaa6c142477 100644
--- a/tools/perf/bench/futex-wake-parallel.c
+++ b/tools/perf/bench/futex-wake-parallel.c
@@ -217,8 +217,12 @@ int bench_futex_wake_parallel(int argc, const char **argv,
sigaction(SIGINT, &act, NULL);

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nwaking_threads = futexbench_sanitize_numeric(nwaking_threads);
+
if (!nblocked_threads)
nblocked_threads = ncpus;
+ else
+ nblocked_threads = futexbench_sanitize_numeric(nblocked_threads);

/* some sanity checks */
if (nwaking_threads > nblocked_threads || !nwaking_threads)
diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c
index e246b1b8388a..46efcb98b5a4 100644
--- a/tools/perf/bench/futex-wake.c
+++ b/tools/perf/bench/futex-wake.c
@@ -129,6 +129,7 @@ int bench_futex_wake(int argc, const char **argv,
}

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nwakes = futexbench_sanitize_numeric(nwakes);

sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
@@ -136,6 +137,8 @@ int bench_futex_wake(int argc, const char **argv,

if (!nthreads)
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex.h b/tools/perf/bench/futex.h
index b2e06d1190d0..ba7c735c0c62 100644
--- a/tools/perf/bench/futex.h
+++ b/tools/perf/bench/futex.h
@@ -7,6 +7,7 @@
#ifndef _FUTEX_H
#define _FUTEX_H

+#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
@@ -99,4 +100,7 @@ static inline int pthread_attr_setaffinity_np(pthread_attr_t *attr,
}
#endif

+/* User input sanitation */
+#define futexbench_sanitize_numeric(__n) abs((__n))
+
#endif /* _FUTEX_H */
--
2.6.6

Subject: [tip:perf/core] perf bench futex: Avoid worker cacheline bouncing

Commit-ID: e2e1680fda1573ebfdd6bba5d58f978044746993
Gitweb: http://git.kernel.org/tip/e2e1680fda1573ebfdd6bba5d58f978044746993
Author: Davidlohr Bueso <[email protected]>
AuthorDate: Mon, 24 Oct 2016 13:56:52 -0700
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 25 Oct 2016 09:50:47 -0300

perf bench futex: Avoid worker cacheline bouncing

Sebastian noted that overhead for worker thread ops (throughput)
accounting was producing 'perf' to appear in the profiles, consuming a
non-trivial (i.e. 13%) amount of CPU.

This is due to cacheline bouncing due to the increment of w->ops.

We can easily fix this by just working on a local copy and updating the
actual worker once done running, and ready to show the program summary.
There is no danger of the worker being concurrent, so we can trust that
no stale value is being seen by another thread.

This also gets rid of the unnecessary cache alignment hack; its not
worth it.

Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/bench/futex-hash.c | 11 +++++------
tools/perf/bench/futex-lock-pi.c | 4 +++-
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index d9e5e80..da04b8c 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -39,15 +39,12 @@ static unsigned int threads_starting;
static struct stats throughput_stats;
static pthread_cond_t thread_parent, thread_worker;

-#define SMP_CACHE_BYTES 256
-#define __cacheline_aligned __attribute__ ((aligned (SMP_CACHE_BYTES)))
-
struct worker {
int tid;
u_int32_t *futex;
pthread_t thread;
unsigned long ops;
-} __cacheline_aligned;
+};

static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
@@ -66,8 +63,9 @@ static const char * const bench_futex_hash_usage[] = {
static void *workerfn(void *arg)
{
int ret;
- unsigned int i;
struct worker *w = (struct worker *) arg;
+ unsigned int i;
+ unsigned long ops = w->ops; /* avoid cacheline bouncing */

pthread_mutex_lock(&thread_lock);
threads_starting--;
@@ -77,7 +75,7 @@ static void *workerfn(void *arg)
pthread_mutex_unlock(&thread_lock);

do {
- for (i = 0; i < nfutexes; i++, w->ops++) {
+ for (i = 0; i < nfutexes; i++, ops++) {
/*
* We want the futex calls to fail in order to stress
* the hashing of uaddr and not measure other steps,
@@ -91,6 +89,7 @@ static void *workerfn(void *arg)
}
} while (!done);

+ w->ops = ops;
return NULL;
}

diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
index 936d89d..7032e46 100644
--- a/tools/perf/bench/futex-lock-pi.c
+++ b/tools/perf/bench/futex-lock-pi.c
@@ -75,6 +75,7 @@ static void toggle_done(int sig __maybe_unused,
static void *workerfn(void *arg)
{
struct worker *w = (struct worker *) arg;
+ unsigned long ops = w->ops;

pthread_mutex_lock(&thread_lock);
threads_starting--;
@@ -103,9 +104,10 @@ static void *workerfn(void *arg)
if (ret && !silent)
warn("thread %d: Could not unlock pi-lock for %p (%d)",
w->tid, w->futex, ret);
- w->ops++; /* account for thread's share of work */
+ ops++; /* account for thread's share of work */
} while (!done);

+ w->ops = ops;
return NULL;
}


Subject: [tip:perf/core] perf bench futex: Sanitize numeric parameters

Commit-ID: 60758d6668b3e2fa8e5fd143d24d0425203d007e
Gitweb: http://git.kernel.org/tip/60758d6668b3e2fa8e5fd143d24d0425203d007e
Author: Davidlohr Bueso <[email protected]>
AuthorDate: Mon, 24 Oct 2016 13:56:53 -0700
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 25 Oct 2016 09:50:53 -0300

perf bench futex: Sanitize numeric parameters

This gets rid of oddities such as:

perf bench futex hash -t -4
perf: calloc: Cannot allocate memory

Runtime (and many more) are equally busted, i.e. run for bogus amounts of
time. Just use the abs, instead of, for example errorring out.

Committer note:

After the patch:

$ perf bench futex hash -t -4
# Running 'futex/hash' benchmark:
Run summary [PID 10178]: 4 threads, each operating on 1024 [private] futexes for 10 secs.

[thread 0] futexes: 0x34f9fa0 ... 0x34faf9c [ 4702208 ops/sec ]
[thread 1] futexes: 0x34fb140 ... 0x34fc13c [ 4707020 ops/sec ]
[thread 2] futexes: 0x34fc2e0 ... 0x34fd2dc [ 4711526 ops/sec ]
[thread 3] futexes: 0x34fd480 ... 0x34fe47c [ 4709683 ops/sec ]

Averaged 4707609 operations/sec (+- 0.04%), total secs = 10
$

Signed-off-by: Davidlohr Bueso <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/bench/futex-hash.c | 4 ++++
tools/perf/bench/futex-lock-pi.c | 3 +++
tools/perf/bench/futex-requeue.c | 2 ++
tools/perf/bench/futex-wake-parallel.c | 4 ++++
tools/perf/bench/futex-wake.c | 3 +++
tools/perf/bench/futex.h | 4 ++++
6 files changed, 20 insertions(+)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index da04b8c..bfbb6b5 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -130,6 +130,8 @@ int bench_futex_hash(int argc, const char **argv,
}

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nsecs = futexbench_sanitize_numeric(nsecs);
+ nfutexes = futexbench_sanitize_numeric(nfutexes);

sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
@@ -137,6 +139,8 @@ int bench_futex_hash(int argc, const char **argv,

if (!nthreads) /* default to the number of CPUs */
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
index 7032e46..465012b 100644
--- a/tools/perf/bench/futex-lock-pi.c
+++ b/tools/perf/bench/futex-lock-pi.c
@@ -152,6 +152,7 @@ int bench_futex_lock_pi(int argc, const char **argv,
goto err;

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nsecs = futexbench_sanitize_numeric(nsecs);

sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
@@ -159,6 +160,8 @@ int bench_futex_lock_pi(int argc, const char **argv,

if (!nthreads)
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
index 2b9705a..fd4ee95 100644
--- a/tools/perf/bench/futex-requeue.c
+++ b/tools/perf/bench/futex-requeue.c
@@ -128,6 +128,8 @@ int bench_futex_requeue(int argc, const char **argv,

if (!nthreads)
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c
index 2c8fa67..beaa6c1 100644
--- a/tools/perf/bench/futex-wake-parallel.c
+++ b/tools/perf/bench/futex-wake-parallel.c
@@ -217,8 +217,12 @@ int bench_futex_wake_parallel(int argc, const char **argv,
sigaction(SIGINT, &act, NULL);

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nwaking_threads = futexbench_sanitize_numeric(nwaking_threads);
+
if (!nblocked_threads)
nblocked_threads = ncpus;
+ else
+ nblocked_threads = futexbench_sanitize_numeric(nblocked_threads);

/* some sanity checks */
if (nwaking_threads > nblocked_threads || !nwaking_threads)
diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c
index e246b1b..46efcb9 100644
--- a/tools/perf/bench/futex-wake.c
+++ b/tools/perf/bench/futex-wake.c
@@ -129,6 +129,7 @@ int bench_futex_wake(int argc, const char **argv,
}

ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+ nwakes = futexbench_sanitize_numeric(nwakes);

sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
@@ -136,6 +137,8 @@ int bench_futex_wake(int argc, const char **argv,

if (!nthreads)
nthreads = ncpus;
+ else
+ nthreads = futexbench_sanitize_numeric(nthreads);

worker = calloc(nthreads, sizeof(*worker));
if (!worker)
diff --git a/tools/perf/bench/futex.h b/tools/perf/bench/futex.h
index b2e06d1..ba7c735 100644
--- a/tools/perf/bench/futex.h
+++ b/tools/perf/bench/futex.h
@@ -7,6 +7,7 @@
#ifndef _FUTEX_H
#define _FUTEX_H

+#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
@@ -99,4 +100,7 @@ static inline int pthread_attr_setaffinity_np(pthread_attr_t *attr,
}
#endif

+/* User input sanitation */
+#define futexbench_sanitize_numeric(__n) abs((__n))
+
#endif /* _FUTEX_H */