2010-11-05 17:06:31

by Hitoshi Mitake

Subject: [PATCH] perf bench: add --prefault option for causing page faults before benchmark

This patch adds a --prefault option to perf bench mem memcpy.
If the user specifies this option, the overhead of page faults
is excluded from the memcpy() score.

Example of usage:
| % ./perf bench mem memcpy -l 500MB
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes from 0x7fc036749010 to 0x7fc055b4a010 ...
|
| 628.526821 MB/Sec
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes from 0x7ff1b45e2010 to 0x7ff1d39e3010 ...
|
| 4.849256 GB/Sec

Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Ma Ling <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
---
tools/perf/bench/mem-memcpy.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c
index 38dae74..be31ddb 100644
--- a/tools/perf/bench/mem-memcpy.c
+++ b/tools/perf/bench/mem-memcpy.c
@@ -23,8 +23,9 @@

static const char *length_str = "1MB";
static const char *routine = "default";
-static bool use_clock = false;
+static bool use_clock;
static int clock_fd;
+static bool prefault;

static const struct option options[] = {
OPT_STRING('l', "length", &length_str, "1MB",
@@ -34,6 +35,8 @@ static const struct option options[] = {
"Specify routine to copy"),
OPT_BOOLEAN('c', "clock", &use_clock,
"Use CPU clock for measuring"),
+ OPT_BOOLEAN('p', "prefault", &prefault,
+ "Cause page faults before memcpy()"),
OPT_END()
};

@@ -139,6 +142,10 @@ int bench_mem_memcpy(int argc, const char **argv,
length_str, src, dst);
}

+
+ if (prefault)
+ routines[i].fn(dst, src, length);
+
if (use_clock) {
init_clock();
clock_start = get_clock();
--
1.7.1.1
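
The effect described in the patch can be approximated outside perf. The following is only a minimal sketch under stated assumptions, not the perf implementation: now_seconds() and timed_memcpy() are hypothetical names, timing uses gettimeofday(), and the buffers come from calloc(). As in the patch, "prefault" simply means running the copy once, untimed, before the measured copy. Actual numbers depend on the allocator, compiler and machine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper (not from the patch): wall-clock time in seconds. */
double now_seconds(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical helper (not from the patch): time a single memcpy() of
 * len bytes. With prefault set, one untimed copy runs first, so the
 * page faults for the buffers are taken before the timed copy starts.
 * Returns 0 and stores the elapsed seconds, or -1 on bad input or
 * allocation failure.
 */
int timed_memcpy(size_t len, bool prefault, double *seconds)
{
	unsigned char *src, *dst;
	double start;

	if (len == 0 || seconds == NULL)
		return -1;

	src = calloc(1, len);
	dst = calloc(1, len);
	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return -1;
	}

	if (prefault)
		memcpy(dst, src, len);	/* untimed: takes the page faults */

	start = now_seconds();
	memcpy(dst, src, len);		/* the copy being measured */
	*seconds = now_seconds() - start;

	/* Volatile read of the result so the copies are not optimized away. */
	(void)((volatile unsigned char *)dst)[len - 1];

	free(src);
	free(dst);
	return 0;
}
```

Calling timed_memcpy() twice with the same len, once with prefault false and once with true, gives the two times to compare; len divided by the elapsed seconds is the throughput.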


2010-11-10 09:30:04

by Ingo Molnar

Subject: Re: [PATCH] perf bench: add --prefault option for causing page faults before benchmark


* Hitoshi Mitake <[email protected]> wrote:

> This patch adds a --prefault option to perf bench mem memcpy.
> If the user specifies this option, the overhead of page faults
> is excluded from the memcpy() score.
>
> Example of usage:
> | % ./perf bench mem memcpy -l 500MB
> | # Running mem/memcpy benchmark...
> | # Copying 500MB Bytes from 0x7fc036749010 to 0x7fc055b4a010 ...
> |
> | 628.526821 MB/Sec
> | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --prefault
> | # Running mem/memcpy benchmark...
> | # Copying 500MB Bytes from 0x7ff1b45e2010 to 0x7ff1d39e3010 ...
> |
> | 4.849256 GB/Sec

Ok, looks rather useful.

We are rather close to being able to apply these bits. We need a resolution for the
arch/x86/lib/memcpy_64.S details. The ugliest are these kinds of #ifdefs:

+#ifndef PERF_BENCH
.Lmemcpy_e:
.previous
+#endif

What happens if we keep that label in place?

This:

+#ifndef PERF_BENCH
ENTRY(__memcpy)
ENTRY(memcpy)
CFI_STARTPROC
+#else
+ .globl memcpy_x86_64_unrolled
+memcpy_x86_64_unrolled:
+#endif

Could be removed if you defined an ENTRY() macro in perf, right?

This:

+#ifndef PERF_BENCH
+
CFI_ENDPROC
ENDPROC(memcpy)
ENDPROC(__memcpy)

Could be solved by defining ENDPROC()/etc. macros in perf, right?

We could remove this #ifdef:

+#ifndef PERF_BENCH
+
#include <linux/linkage.h>

#include <asm/cpufeature.h>
#include <asm/dwarf2.h>

+#endif /* PERF_BENCH */

if you added empty linkage.h, cpufeature.h and dwarf2.h files as
tools/perf/util/include/linux/linkage.h, tools/perf/util/include/asm/cpufeature.h.

That linkage.h file could even contain a short perf version of the ENTRY() macro,
etc.

That way we can avoid having to touch arch/x86/lib/memcpy_64.S altogether.

Thanks,

Ingo
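
The header approach suggested above can be sketched as follows. This is an illustration under assumptions, not the headers that were merged: it assumes the perf-local linkage.h only needs ENTRY() to emit a global label and ENDPROC() to expand to nothing, and that the CFI_* macros used by memcpy_64.S need empty definitions rather than a completely empty dwarf2.h. Other symbols the assembly file relies on are not covered. The EXPANSION_STR() helpers and the four functions are hypothetical and exist only to show what the preprocessor would hand to the assembler.

```c
#include <string.h>

/*
 * Sketch of a perf-local linux/linkage.h: ENTRY() makes the symbol
 * global and emits its label; ENDPROC() expands to nothing.
 */
#define ENTRY(name) .globl name; name:
#define ENDPROC(name)

/*
 * Sketch of a perf-local asm/dwarf2.h: the CFI annotations are
 * assumed to be unnecessary for the benchmark and are defined away.
 */
#define CFI_STARTPROC
#define CFI_ENDPROC

/* Hypothetical helpers: turn a macro expansion into a C string. */
#define EXPANSION_STR_(...) #__VA_ARGS__
#define EXPANSION_STR(...) EXPANSION_STR_(__VA_ARGS__)

/* What "ENTRY(memcpy) CFI_STARTPROC" becomes after preprocessing. */
const char *entry_expansion(void)
{
	return EXPANSION_STR(ENTRY(memcpy) CFI_STARTPROC);
}

/* What "CFI_ENDPROC ENDPROC(memcpy)" becomes after preprocessing. */
const char *endproc_expansion(void)
{
	return EXPANSION_STR(CFI_ENDPROC ENDPROC(memcpy));
}

/* True if the entry expansion still declares and names the symbol. */
int entry_defines_label(void)
{
	const char *s = entry_expansion();

	return strstr(s, ".globl") != NULL && strstr(s, "memcpy") != NULL;
}

/* True if the closing macros leave nothing for the assembler. */
int endproc_is_empty(void)
{
	return strlen(endproc_expansion()) == 0;
}
```

With definitions along these lines, the ENTRY()/ENDPROC() and CFI_* lines in the assembly file would no longer need an #ifndef PERF_BENCH around them.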

2010-11-15 15:58:49

by Hitoshi Mitake

Subject: Re: [PATCH] perf bench: add --prefault option for causing page faults before benchmark

On 2010-11-10 18:29, Ingo Molnar wrote:
>
> * Hitoshi Mitake<[email protected]> wrote:
>
>> This patch adds a --prefault option to perf bench mem memcpy.
>> If the user specifies this option, the overhead of page faults
>> is excluded from the memcpy() score.
>>
>> Example of usage:
>> | % ./perf bench mem memcpy -l 500MB
>> | # Running mem/memcpy benchmark...
>> | # Copying 500MB Bytes from 0x7fc036749010 to 0x7fc055b4a010 ...
>> |
>> | 628.526821 MB/Sec
>> | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --prefault
>> | # Running mem/memcpy benchmark...
>> | # Copying 500MB Bytes from 0x7ff1b45e2010 to 0x7ff1d39e3010 ...
>> |
>> | 4.849256 GB/Sec
>
> Ok, looks rather useful.
>
> We are rather close to being able to apply these bits. We need a resolution for the
> arch/x86/lib/memcpy_64.S details. The ugliest are these kinds of #ifdefs:
>
> +#ifndef PERF_BENCH
> .Lmemcpy_e:
> .previous
> +#endif
>
> What happens if we keep that label in place?

Here is part of the output of objdump -D arch/x86/lib/memcpy_64.o:

Disassembly of section .altinstr_replacement:

0000000000000000 <.altinstr_replacement>:
0: 48 89 f8 mov %rdi,%rax
3: 89 d1 mov %edx,%ecx
5: c1 e9 03 shr $0x3,%ecx
8: 83 e2 07 and $0x7,%edx
b: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
e: 89 d1 mov %edx,%ecx
10: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
12: c3 retq

I didn't know that a symbol name can start with '.', and it seems
that such symbols are eliminated from the object file.

We can still find the start address of .Lmemcpy_c, the rep version of
memcpy(), because that address is stored in another section,
.altinstructions.

This information can be exploited for our purpose; I'll try it.

>
> This:
>
> +#ifndef PERF_BENCH
> ENTRY(__memcpy)
> ENTRY(memcpy)
> CFI_STARTPROC
> +#else
> + .globl memcpy_x86_64_unrolled
> +memcpy_x86_64_unrolled:
> +#endif
>
> Could be removed if you defined an ENTRY() macro in perf, right?
>
> This:
>
> +#ifndef PERF_BENCH
> +
> CFI_ENDPROC
> ENDPROC(memcpy)
> ENDPROC(__memcpy)
>
> Could be solved by defining ENDPROC()/etc. macros in perf, right?
>
> We could remove this #ifdef:
>
> +#ifndef PERF_BENCH
> +
> #include <linux/linkage.h>
>
> #include <asm/cpufeature.h>
> #include <asm/dwarf2.h>
>
> +#endif /* PERF_BENCH */
>
> if you added empty linkage.h, cpufeature.h and dwarf2.h files as
> tools/perf/util/include/linux/linkage.h, tools/perf/util/include/asm/cpufeature.h.
>
> That linkage.h file could even contain a short perf version of the ENTRY() macro,
> etc.
>
> That way we can avoid having to touch arch/x86/lib/memcpy_64.S altogether.

Thanks for your advice. Adding empty headers and macros
will be a smart way to include memcpy_64.S without modification.

Thanks,