On 2010-10-31 04:23, Ingo Molnar wrote:
>
> * Hitoshi Mitake<[email protected]> wrote:
>
>> This patch adds a new file, mem-memcpy-x86-64-asm.S,
>> for x86-64 specific memcpy() benchmarking.
>> The newly added benchmarks are:
>> x86-64-rep: memcpy() implemented with the rep instruction
>> x86-64-unrolled: unrolled memcpy()
>>
>> The original idea of including kernel source files
>> for benchmarking was suggested by Ingo Molnar.
>> This is more effective than write-once programs for quantitative
>> evaluation of small in-kernel leaf functions that are called very frequently,
>> because perf bench lives in the kernel source tree and running it
>> on various hardware, especially new CPU models, is easy.
>>
>> This approach can also be used for other kernel functions, e.g. checksum functions.
>>
>> Example of usage on Core i3 M330:
>>
>> | % ./perf bench mem memcpy -l 500MB
>> | # Running mem/memcpy benchmark...
>> | # Copying 500MB Bytes from 0x7f911f94c010 to 0x7f913ed4d010 ...
>> |
>> | 578.732506 MB/Sec
>> | % ./perf bench mem memcpy -l 500MB -r x86-64-rep
>> | # Running mem/memcpy benchmark...
>> | # Copying 500MB Bytes from 0x7fb4b6fe4010 to 0x7fb4d63e5010 ...
>> |
>> | 738.184980 MB/Sec
>> | % ./perf bench mem memcpy -l 500MB -r x86-64-unrolled
>> | # Running mem/memcpy benchmark...
>> | # Copying 500MB Bytes from 0x7f6f2e668010 to 0x7f6f4da69010 ...
>> |
>> | 767.483269 MB/Sec
>>
>> This clearly shows that the unrolled memcpy() is more efficient
>> than the rep version and glibc's one :)
>
> Hey, really cool output :-)
>
> Might also make sense to measure Ma Ling's patched version?
By "Ma Ling's patched version", do you mean memcpy() with the patch
at this URL applied?
http://marc.info/?l=linux-kernel&m=128652296500989&w=2
(It seems that this patch was written by Miao Xie.)
I'll include the results for the patched version in the next post.
>
>> # checkpatch.pl warns about two externs in bench/mem-memcpy.c
>> # added by this patch. But I think this is not a problem.
>
> You should put these:
>
> +#ifdef ARCH_X86_64
> +extern void *memcpy_x86_64_unrolled(void *to, const void *from, size_t len);
> +extern void *memcpy_x86_64_rep(void *to, const void *from, size_t len);
> +#endif
>
> into a .h file - a new one if needed.
>
> That will make both checkpatch and me happier ;-)
>
OK, I'll move these declarations into a header file.
BTW, I found a really interesting evaluation result.
The current results of "perf bench mem memcpy" include
the overhead of page faults, because the measured memcpy()
is the first access to the allocated memory area.
I tested another version of perf bench mem memcpy,
which does an extra memcpy() before the measured memcpy() to remove
the overhead that comes from page faults.
And this is the result:
% ./perf bench mem memcpy -l 500MB -r x86-64-unrolled
# Running mem/memcpy benchmark...
# Copying 500MB Bytes from 0x7f19d488f010 to 0x7f19f3c90010 ...
4.608340 GB/Sec
% ./perf bench mem memcpy -l 500MB
# Running mem/memcpy benchmark...
# Copying 500MB Bytes from 0x7f696c3cc010 to 0x7f698b7cd010 ...
4.856442 GB/Sec
% ./perf bench mem memcpy -l 500MB -r x86-64-rep
# Running mem/memcpy benchmark...
# Copying 500MB Bytes from 0x7f45d6cff010 to 0x7f45f6100010 ...
6.024445 GB/Sec
The ordering of the scores is reversed!
I cannot explain the cause of this result, and
it is a really interesting phenomenon.
So I'd like to add a new command line option,
like "--pre-page-faults", to perf bench mem memcpy,
for doing a memcpy() before the measured memcpy().
What do you think about this idea?
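To make the idea concrete, here is a minimal sketch of the measurement (not the perf bench code itself). The names copy_throughput() and elapsed_sec() are hypothetical, and clock_gettime() stands in for the gettimeofday() that perf bench uses. The only difference between the two modes is one untimed warm-up copy:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Copy len bytes between two freshly allocated buffers and return the
 * throughput in bytes per second (-1.0 if allocation fails).  When
 * prefault is non-zero, an untimed warm-up copy touches every page
 * first, so the timed copy does not pay for page faults.
 */
static double copy_throughput(size_t len, int prefault)
{
	/* volatile pointer: keeps the compiler from eliding the copies */
	void *(*volatile copy)(void *, const void *, size_t) = memcpy;
	struct timespec start, end;
	double sec;
	char *src, *dst;

	assert(len > 0);
	src = calloc(1, len);
	dst = calloc(1, len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	if (prefault)
		copy(dst, src, len);

	clock_gettime(CLOCK_MONOTONIC, &start);
	copy(dst, src, len);
	clock_gettime(CLOCK_MONOTONIC, &end);

	sec = elapsed_sec(&start, &end);
	free(src);
	free(dst);
	return sec > 0.0 ? (double)len / sec : 0.0;
}
```

Calling copy_throughput(len, 0) and copy_throughput(len, 1) with the same len gives the two figures to compare; the exact numbers will of course depend on the machine.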
Thanks,
* Hitoshi Mitake <[email protected]> wrote:
> On 2010-10-31 04:23, Ingo Molnar wrote:
> >
> >* Hitoshi Mitake<[email protected]> wrote:
> >
> >Hey, really cool output :-)
> >
> >Might also make sense to measure Ma Ling's patched version?
>
> Does Ma Ling's patched version mean,
>
> http://marc.info/?l=linux-kernel&m=128652296500989&w=2
>
> the memcpy applied the patch of the URL?
> (It seems that this patch was written by Miao Xie.)
>
> I'll include the result of patched version in the next post.
(Indeed it is Miao Xie - sorry!)
> >># checkpatch.pl warns about two externs in bench/mem-memcpy.c
> >># added by this patch. But I think it is no problem.
> >
> >You should put these:
> >
> > +#ifdef ARCH_X86_64
> > +extern void *memcpy_x86_64_unrolled(void *to, const void *from, size_t len);
> > +extern void *memcpy_x86_64_rep(void *to, const void *from, size_t len);
> > +#endif
> >
> >into a .h file - a new one if needed.
> >
> >That will make both checkpatch and me happier ;-)
> >
>
> OK, I'll separate these files.
>
> BTW, I found really interesting evaluation result.
> Current results of "perf bench mem memcpy" include
> the overhead of page faults because the measured memcpy()
> is the first access to allocated memory area.
>
> I tested the another version of perf bench mem memcpy,
> which does memcpy() before measured memcpy() for removing
> the overhead come from page faults.
>
> And this is the result:
>
> % ./perf bench mem memcpy -l 500MB -r x86-64-unrolled
> # Running mem/memcpy benchmark...
> # Copying 500MB Bytes from 0x7f19d488f010 to 0x7f19f3c90010 ...
>
> 4.608340 GB/Sec
>
> % ./perf bench mem memcpy -l 500MB
> # Running mem/memcpy benchmark...
> # Copying 500MB Bytes from 0x7f696c3cc010 to 0x7f698b7cd010 ...
>
> 4.856442 GB/Sec
>
> % ./perf bench mem memcpy -l 500MB -r x86-64-rep
> # Running mem/memcpy benchmark...
> # Copying 500MB Bytes from 0x7f45d6cff010 to 0x7f45f6100010 ...
>
> 6.024445 GB/Sec
>
> The relation of scores reversed!
> I cannot explain the cause of this result, and
> this is really interesting phenomenon.
Interesting indeed, and it would be nice to analyse that! (It should be possible,
using various PMU metrics in a clever way, to figure out what's happening inside the
CPU, right?)
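As a first, much cruder step than PMU counters, the page-fault side alone can be checked from user space. A minimal sketch, assuming the minor-fault count from getrusage() is a good enough proxy; minor_faults() and count_copy_faults() are hypothetical names, and this says nothing about cache or TLB behaviour:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_minflt;
}

/*
 * Run two back-to-back copies over fresh buffers and report how many
 * minor faults each one took.  Returns 0 on success, -1 on failure.
 */
static int count_copy_faults(size_t len, long *first, long *second)
{
	/* volatile pointer: keeps the compiler from eliding the copies */
	void *(*volatile copy)(void *, const void *, size_t) = memcpy;
	long f0, f1, f2;
	char *src, *dst;

	assert(len > 0);
	src = calloc(1, len);
	dst = calloc(1, len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1;
	}

	f0 = minor_faults();
	copy(dst, src, len);	/* first touch of the buffers */
	f1 = minor_faults();
	copy(dst, src, len);	/* buffers are already mapped */
	f2 = minor_faults();

	free(src);
	free(dst);
	if (f0 < 0 || f1 < 0 || f2 < 0)
		return -1;

	*first = f1 - f0;
	*second = f2 - f1;
	return 0;
}
```

If the first count is large and the second is close to zero, the non-prefaulted numbers are dominated by fault handling rather than by the copy routine itself.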
> So I'd like to add new command line option,
> like "--pre-page-faults" to perf bench mem memcpy,
> for doing memcpy() before measured memcpy().
>
> How do you think about this idea?
Agreed. (Maybe name it --prefault, as 'prefaulting' is the term we generally use for
things like this.)
An even better solution would be to output _both_ results by default, so that people
can see both characteristics at a glance?
Thanks,
Ingo
On 2010-11-01 18:02, Ingo Molnar wrote:
>
>> So I'd like to add new command line option,
>> like "--pre-page-faults" to perf bench mem memcpy,
>> for doing memcpy() before measured memcpy().
>>
>> How do you think about this idea?
>
> Agreed. (Maybe name it --prefault, as 'prefaulting' is the term we generally use for
> things like this.)
>
> An even better solution would be to output _both_ results by default, so that people
> can see both characteristics at a glance?
Outputting both the prefaulted and non-prefaulted results will be useful,
but it might not be good for use from scripts.
So I'll implement the --prefault option first. If there is a request
for outputting both, I'll consider modifying the default output.
# Please wait for the results of Miao Xie's patch;
# benchmarking memcpy() on unaligned memory areas is
# a little difficult.
Thanks,
Hitoshi
* Hitoshi Mitake <[email protected]> wrote:
> > An even better solution would be to output _both_ results by default, so that
> > people can see both characteristics at a glance?
>
> Outputting both result of prefaulted and non prefaulted will be useful, but this
> might be not good for using from scripts. So I'll implement --prefault option
> first. If there is request for outputting both, I'll consider to modify default
> output.
Ok - it should definitely be easily scriptable. The default can have both flags
enabled and both results written to the output.
People will try 'perf bench x86' to see performance at a glance - so printing all
the tests we have is a good idea.
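For script consumers, a single line with both numbers is probably the easiest thing to parse. A sketch of such a formatter, assuming the order is non-prefaulted first and prefaulted second; format_simple() is a hypothetical helper, not an existing perf function:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "<no-prefault> <prefault>\n" into buf.  Returns the number of
 * characters written, or -1 if buf is too small.
 */
static int format_simple(char *buf, size_t size, double cold, double warm)
{
	int n;

	assert(buf != NULL && size > 0);
	n = snprintf(buf, size, "%lf %lf\n", cold, warm);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

A script can then split the line on whitespace without having to strip units or labels.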
Thanks,
Ingo
On 2010-11-10 18:12, Ingo Molnar wrote:
>
> * Hitoshi Mitake<[email protected]> wrote:
>
>>> An even better solution would be to output _both_ results by default, so that
>>> people can see both characteristics at a glance?
>>
>> Outputting both result of prefaulted and non prefaulted will be useful, but this
>> might be not good for using from scripts. So I'll implement --prefault option
>> first. If there is request for outputting both, I'll consider to modify default
>> output.
>
> Ok - it should definitely be easily scriptable. The default can have both flags
> enabled and both results written to the output.
>
> People will try 'perf bench x86' to see performance at a glance - so printing all
> the tests we have is a good idea.
OK, I added --no-prefault and --only-prefault to perf bench mem memcpy.
As you said, printing both of them is convenient.
I'll send the updated patch later.
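The flag handling boils down to the following decision. This is a standalone sketch with hypothetical names (select_runs(), run_suffix(), RUN_COLD, RUN_PREFAULT), not the code in the patch; giving both flags is treated the same as giving neither:

```c
#include <assert.h>
#include <stdbool.h>

enum {
	RUN_COLD     = 1 << 0,	/* measure without prefaulting */
	RUN_PREFAULT = 1 << 1,	/* measure after a warm-up copy */
};

/* Decide which measurements to run from the two command line flags. */
static unsigned int select_runs(bool only_prefault, bool no_prefault)
{
	/* Neither flag, or both flags: show both results. */
	if (only_prefault == no_prefault)
		return RUN_COLD | RUN_PREFAULT;

	return only_prefault ? RUN_PREFAULT : RUN_COLD;
}

/* Label appended to a result line for a single run. */
static const char *run_suffix(unsigned int run)
{
	assert(run == RUN_COLD || run == RUN_PREFAULT);
	return run == RUN_PREFAULT ? " (with prefault)" : "";
}
```

Keeping the decision in one place avoids repeating the "!only_prefault && !no_prefault" test at every output site.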
Thanks,
After applying this patch, perf bench mem memcpy prints
both the prefaulted and non-prefaulted scores of memcpy().
New options --no-prefault and --only-prefault are added
for printing a single result, mainly for scripting use.
Example of usage:
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 634.969014 MB/Sec
| 4.828062 GB/Sec (with prefault)
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --only-prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 4.705192 GB/Sec (with prefault)
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --no-prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 642.725568 MB/Sec
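The unit in the output above is picked from the magnitude of the bytes-per-second value. A function-style sketch of that selection, assuming K is 1024 as the divisions in the replaced code suggest; format_bps() is a hypothetical name and writes to a buffer instead of printing:

```c
#include <assert.h>
#include <stdio.h>

#define K 1024.0

/*
 * Format a bytes-per-second value with the largest fitting binary
 * unit.  Returns what snprintf() returns.
 */
static int format_bps(char *buf, size_t size, double bps)
{
	assert(buf != NULL && size > 0);

	if (bps < K)
		return snprintf(buf, size, "%14lf B/Sec", bps);
	if (bps < K * K)
		return snprintf(buf, size, "%14lf KB/Sec", bps / K);
	if (bps < K * K * K)
		return snprintf(buf, size, "%14lf MB/Sec", bps / K / K);
	return snprintf(buf, size, "%14lf GB/Sec", bps / K / K / K);
}
```

A function evaluates its argument once, which sidesteps the usual pitfalls of a multi-statement macro that expands x several times.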
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Ma Ling <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
---
tools/perf/bench/mem-memcpy.c | 215 +++++++++++++++++++++++++++++------------
1 files changed, 152 insertions(+), 63 deletions(-)
diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c
index be31ddb..61b6ead 100644
--- a/tools/perf/bench/mem-memcpy.c
+++ b/tools/perf/bench/mem-memcpy.c
@@ -25,7 +25,8 @@ static const char *length_str = "1MB";
static const char *routine = "default";
static bool use_clock;
static int clock_fd;
-static bool prefault;
+static bool only_prefault;
+static bool no_prefault;
static const struct option options[] = {
OPT_STRING('l', "length", &length_str, "1MB",
@@ -35,15 +36,19 @@ static const struct option options[] = {
"Specify routine to copy"),
OPT_BOOLEAN('c', "clock", &use_clock,
"Use CPU clock for measuring"),
- OPT_BOOLEAN('p', "prefault", &prefault,
- "Cause page faults before memcpy()"),
+ OPT_BOOLEAN('o', "only-prefault", &only_prefault,
+ "Show only the result with page faults before memcpy()"),
+ OPT_BOOLEAN('n', "no-prefault", &no_prefault,
+ "Show only the result without page faults before memcpy()"),
OPT_END()
};
+typedef void *(*memcpy_t)(void *, const void *, size_t);
+
struct routine {
const char *name;
const char *desc;
- void * (*fn)(void *dst, const void *src, size_t len);
+ memcpy_t fn;
};
struct routine routines[] = {
@@ -92,29 +97,98 @@ static double timeval2double(struct timeval *ts)
(double)ts->tv_usec / (double)1000000;
}
+static void alloc_mem(void **dst, void **src, size_t length)
+{
+ *dst = zalloc(length);
+ if (!*dst)
+ die("memory allocation failed - maybe length is too large?\n");
+
+ *src = zalloc(length);
+ if (!*src)
+ die("memory allocation failed - maybe length is too large?\n");
+}
+
+static u64 do_memcpy_clock(memcpy_t fn, size_t len, bool prefault)
+{
+ u64 clock_start = 0ULL, clock_end = 0ULL;
+ void *src = NULL, *dst = NULL;
+
+ alloc_mem(&dst, &src, len);
+
+ if (prefault)
+ fn(dst, src, len);
+
+ clock_start = get_clock();
+ fn(dst, src, len);
+ clock_end = get_clock();
+
+ free(src);
+ free(dst);
+ return clock_end - clock_start;
+}
+
+static double do_memcpy_gettimeofday(memcpy_t fn, size_t len, bool prefault)
+{
+ struct timeval tv_start, tv_end, tv_diff;
+ void *src = NULL, *dst = NULL;
+
+ alloc_mem(&dst, &src, len);
+
+ if (prefault)
+ fn(dst, src, len);
+
+ BUG_ON(gettimeofday(&tv_start, NULL));
+ fn(dst, src, len);
+ BUG_ON(gettimeofday(&tv_end, NULL));
+
+ timersub(&tv_end, &tv_start, &tv_diff);
+
+ free(src);
+ free(dst);
+ return (double)((double)len / timeval2double(&tv_diff));
+}
+
+#define pf (no_prefault ? 0 : 1)
+
+#define print_bps(x) do { \
+ if (x < K) \
+ printf(" %14lf B/Sec", x); \
+ else if (x < K * K) \
+ printf(" %14lf KB/Sec", x / K); \
+ else if (x < K * K * K) \
+ printf(" %14lf MB/Sec", x / K / K); \
+ else \
+ printf(" %14lf GB/Sec", x / K / K / K); \
+ } while (0)
+
int bench_mem_memcpy(int argc, const char **argv,
const char *prefix __used)
{
int i;
- void *dst, *src;
- size_t length;
- double bps = 0.0;
- struct timeval tv_start, tv_end, tv_diff;
- u64 clock_start, clock_end, clock_diff;
+ size_t len;
+ double result_bps[2];
+ u64 result_clock[2];
- clock_start = clock_end = clock_diff = 0ULL;
argc = parse_options(argc, argv, options,
bench_mem_memcpy_usage, 0);
- tv_diff.tv_sec = 0;
- tv_diff.tv_usec = 0;
- length = (size_t)perf_atoll((char *)length_str);
+ if (use_clock)
+ init_clock();
+
+ len = (size_t)perf_atoll((char *)length_str);
- if ((s64)length <= 0) {
+ result_clock[0] = result_clock[1] = 0ULL;
+ result_bps[0] = result_bps[1] = 0.0;
+
+ if ((s64)len <= 0) {
fprintf(stderr, "Invalid length:%s\n", length_str);
return 1;
}
+ /* same as specifying neither --only-prefault nor --no-prefault */
+ if (only_prefault && no_prefault)
+ only_prefault = no_prefault = false;
+
for (i = 0; routines[i].name; i++) {
if (!strcmp(routines[i].name, routine))
break;
@@ -129,65 +203,80 @@ int bench_mem_memcpy(int argc, const char **argv,
return 1;
}
- dst = zalloc(length);
- if (!dst)
- die("memory allocation failed - maybe length is too large?\n");
-
- src = zalloc(length);
- if (!src)
- die("memory allocation failed - maybe length is too large?\n");
-
- if (bench_format == BENCH_FORMAT_DEFAULT) {
- printf("# Copying %s Bytes from %p to %p ...\n\n",
- length_str, src, dst);
- }
-
-
- if (prefault)
- routines[i].fn(dst, src, length);
-
- if (use_clock) {
- init_clock();
- clock_start = get_clock();
- } else {
- BUG_ON(gettimeofday(&tv_start, NULL));
- }
+ if (bench_format == BENCH_FORMAT_DEFAULT)
+ printf("# Copying %s Bytes ...\n\n", length_str);
- routines[i].fn(dst, src, length);
-
- if (use_clock) {
- clock_end = get_clock();
- clock_diff = clock_end - clock_start;
+ if (!only_prefault && !no_prefault) {
+ /* show both results */
+ if (use_clock) {
+ result_clock[0] =
+ do_memcpy_clock(routines[i].fn, len, false);
+ result_clock[1] =
+ do_memcpy_clock(routines[i].fn, len, true);
+ } else {
+ result_bps[0] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, false);
+ result_bps[1] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, true);
+ }
} else {
- BUG_ON(gettimeofday(&tv_end, NULL));
- timersub(&tv_end, &tv_start, &tv_diff);
- bps = (double)((double)length / timeval2double(&tv_diff));
+ if (use_clock) {
+ result_clock[pf] =
+ do_memcpy_clock(routines[i].fn,
+ len, only_prefault);
+ } else {
+ result_bps[pf] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, only_prefault);
+ }
}
switch (bench_format) {
case BENCH_FORMAT_DEFAULT:
- if (use_clock) {
- printf(" %14lf Clock/Byte\n",
- (double)clock_diff / (double)length);
- } else {
- if (bps < K)
- printf(" %14lf B/Sec\n", bps);
- else if (bps < K * K)
- printf(" %14lfd KB/Sec\n", bps / 1024);
- else if (bps < K * K * K)
- printf(" %14lf MB/Sec\n", bps / 1024 / 1024);
- else {
- printf(" %14lf GB/Sec\n",
- bps / 1024 / 1024 / 1024);
+ if (!only_prefault && !no_prefault) {
+ if (use_clock) {
+ printf(" %14lf Clock/Byte\n",
+ (double)result_clock[0]
+ / (double)len);
+ printf(" %14lf Clock/Byte (with prefault)\n",
+ (double)result_clock[1]
+ / (double)len);
+ } else {
+ print_bps(result_bps[0]);
+ printf("\n");
+ print_bps(result_bps[1]);
+ printf(" (with prefault)\n");
}
+ } else {
+ if (use_clock) {
+ printf(" %14lf Clock/Byte",
+ (double)result_clock[pf]
+ / (double)len);
+ } else
+ print_bps(result_bps[pf]);
+
+ printf("%s\n", only_prefault ? " (with prefault)" : "");
}
break;
case BENCH_FORMAT_SIMPLE:
- if (use_clock) {
- printf("%14lf\n",
- (double)clock_diff / (double)length);
- } else
- printf("%lf\n", bps);
+ if (!only_prefault && !no_prefault) {
+ if (use_clock) {
+ printf("%lf %lf\n",
+ (double)result_clock[0] / (double)len,
+ (double)result_clock[1] / (double)len);
+ } else {
+ printf("%lf %lf\n",
+ result_bps[0], result_bps[1]);
+ }
+ } else {
+ if (use_clock) {
+ printf("%lf\n", (double)result_clock[pf]
+ / (double)len);
+ } else
+ printf("%lf\n", result_bps[pf]);
+ }
break;
default:
/* reaching this means there's some disaster: */
--
1.7.1.1
* Hitoshi Mitake <[email protected]> wrote:
> After applying this patch, perf bench mem memcpy prints
> both the prefaulted and non-prefaulted scores of memcpy().
>
> New options --no-prefault and --only-prefault are added
> for printing single result, mainly for scripting usage.
Ok. Mind resending the whole series once all review feedback has been incorporated?
Thanks,
Ingo
Really sorry for my late reply.
On 11/18/10 16:58, Ingo Molnar wrote:
>
> * Hitoshi Mitake<[email protected]> wrote:
>
>> After applying this patch, perf bench mem memcpy prints
>> both the prefaulted and non-prefaulted scores of memcpy().
>>
>> New options --no-prefault and --only-prefault are added
>> for printing single result, mainly for scripting usage.
>
> Ok. Mind resending the whole series once all review feedback has been
> incorporated?
>
OK, I'll send the patch series for prefaulting and
porting memcpy_64.S to perf bench later.
This series does some dirty things, especially in the Makefile
of perf and in defining ENTRY(). So I'd like to hear your comments.
Could you review these?
I also have another problem. I cannot refer to the
rep-prefix-based memcpy by name because its symbol is ".Lmemcpy_c".
It seems that a symbol whose name starts with "." cannot be seen
from other object files. So I have to find a way to
look up the name of the rep memcpy...
Thanks,
Hitoshi
After applying this patch, perf bench mem memcpy prints
both the prefaulted and non-prefaulted scores of memcpy().
New options --no-prefault and --only-prefault are added
to print a single result, mainly for scripting use.
Example of usage:
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 634.969014 MB/Sec
| 4.828062 GB/Sec (with prefault)
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --only-prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 4.705192 GB/Sec (with prefault)
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --no-prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 642.725568 MB/Sec
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Ma Ling <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Andi Kleen <[email protected]>
---
tools/perf/bench/mem-memcpy.c | 219 ++++++++++++++++++++++++++++++-----------
1 files changed, 162 insertions(+), 57 deletions(-)
diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c
index 38dae74..db82021 100644
--- a/tools/perf/bench/mem-memcpy.c
+++ b/tools/perf/bench/mem-memcpy.c
@@ -12,6 +12,7 @@
#include "../util/parse-options.h"
#include "../util/header.h"
#include "bench.h"
+#include "mem-memcpy-arch.h"
#include <stdio.h>
#include <stdlib.h>
@@ -23,8 +24,10 @@
static const char *length_str = "1MB";
static const char *routine = "default";
-static bool use_clock = false;
+static bool use_clock;
static int clock_fd;
+static bool only_prefault;
+static bool no_prefault;
static const struct option options[] = {
OPT_STRING('l', "length", &length_str, "1MB",
@@ -34,19 +37,33 @@ static const struct option options[] = {
"Specify routine to copy"),
OPT_BOOLEAN('c', "clock", &use_clock,
"Use CPU clock for measuring"),
+ OPT_BOOLEAN('o', "only-prefault", &only_prefault,
+ "Show only the result with page faults before memcpy()"),
+ OPT_BOOLEAN('n', "no-prefault", &no_prefault,
+ "Show only the result without page faults before memcpy()"),
OPT_END()
};
+typedef void *(*memcpy_t)(void *, const void *, size_t);
+
struct routine {
const char *name;
const char *desc;
- void * (*fn)(void *dst, const void *src, size_t len);
+ memcpy_t fn;
};
struct routine routines[] = {
{ "default",
"Default memcpy() provided by glibc",
memcpy },
+#ifdef ARCH_X86_64
+
+#define MEMCPY_FN(fn, name, desc) { name, desc, fn },
+#include "mem-memcpy-x86-64-asm-def.h"
+#undef MEMCPY_FN
+
+#endif
+
{ NULL,
NULL,
NULL }
@@ -89,29 +106,98 @@ static double timeval2double(struct timeval *ts)
(double)ts->tv_usec / (double)1000000;
}
+static void alloc_mem(void **dst, void **src, size_t length)
+{
+ *dst = zalloc(length);
+ if (!*dst)
+ die("memory allocation failed - maybe length is too large?\n");
+
+ *src = zalloc(length);
+ if (!*src)
+ die("memory allocation failed - maybe length is too large?\n");
+}
+
+static u64 do_memcpy_clock(memcpy_t fn, size_t len, bool prefault)
+{
+ u64 clock_start = 0ULL, clock_end = 0ULL;
+ void *src = NULL, *dst = NULL;
+
+ alloc_mem(&dst, &src, len);
+
+ if (prefault)
+ fn(dst, src, len);
+
+ clock_start = get_clock();
+ fn(dst, src, len);
+ clock_end = get_clock();
+
+ free(src);
+ free(dst);
+ return clock_end - clock_start;
+}
+
+static double do_memcpy_gettimeofday(memcpy_t fn, size_t len, bool prefault)
+{
+ struct timeval tv_start, tv_end, tv_diff;
+ void *src = NULL, *dst = NULL;
+
+ alloc_mem(&dst, &src, len);
+
+ if (prefault)
+ fn(dst, src, len);
+
+ BUG_ON(gettimeofday(&tv_start, NULL));
+ fn(dst, src, len);
+ BUG_ON(gettimeofday(&tv_end, NULL));
+
+ timersub(&tv_end, &tv_start, &tv_diff);
+
+ free(src);
+ free(dst);
+ return (double)((double)len / timeval2double(&tv_diff));
+}
+
+#define pf (no_prefault ? 0 : 1)
+
+#define print_bps(x) do { \
+ if (x < K) \
+ printf(" %14lf B/Sec", x); \
+ else if (x < K * K) \
+ printf(" %14lf KB/Sec", x / K); \
+ else if (x < K * K * K) \
+ printf(" %14lf MB/Sec", x / K / K); \
+ else \
+ printf(" %14lf GB/Sec", x / K / K / K); \
+ } while (0)
+
int bench_mem_memcpy(int argc, const char **argv,
const char *prefix __used)
{
int i;
- void *dst, *src;
- size_t length;
- double bps = 0.0;
- struct timeval tv_start, tv_end, tv_diff;
- u64 clock_start, clock_end, clock_diff;
+ size_t len;
+ double result_bps[2];
+ u64 result_clock[2];
- clock_start = clock_end = clock_diff = 0ULL;
argc = parse_options(argc, argv, options,
bench_mem_memcpy_usage, 0);
- tv_diff.tv_sec = 0;
- tv_diff.tv_usec = 0;
- length = (size_t)perf_atoll((char *)length_str);
+ if (use_clock)
+ init_clock();
+
+ len = (size_t)perf_atoll((char *)length_str);
- if ((s64)length <= 0) {
+ result_clock[0] = result_clock[1] = 0ULL;
+ result_bps[0] = result_bps[1] = 0.0;
+
+ if ((s64)len <= 0) {
fprintf(stderr, "Invalid length:%s\n", length_str);
return 1;
}
+ /* same as specifying neither --only-prefault nor --no-prefault */
+ if (only_prefault && no_prefault)
+ only_prefault = no_prefault = false;
+
for (i = 0; routines[i].name; i++) {
if (!strcmp(routines[i].name, routine))
break;
@@ -126,61 +212,80 @@ int bench_mem_memcpy(int argc, const char **argv,
return 1;
}
- dst = zalloc(length);
- if (!dst)
- die("memory allocation failed - maybe length is too large?\n");
-
- src = zalloc(length);
- if (!src)
- die("memory allocation failed - maybe length is too large?\n");
-
- if (bench_format == BENCH_FORMAT_DEFAULT) {
- printf("# Copying %s Bytes from %p to %p ...\n\n",
- length_str, src, dst);
- }
-
- if (use_clock) {
- init_clock();
- clock_start = get_clock();
- } else {
- BUG_ON(gettimeofday(&tv_start, NULL));
- }
-
- routines[i].fn(dst, src, length);
+ if (bench_format == BENCH_FORMAT_DEFAULT)
+ printf("# Copying %s Bytes ...\n\n", length_str);
- if (use_clock) {
- clock_end = get_clock();
- clock_diff = clock_end - clock_start;
+ if (!only_prefault && !no_prefault) {
+ /* show both results */
+ if (use_clock) {
+ result_clock[0] =
+ do_memcpy_clock(routines[i].fn, len, false);
+ result_clock[1] =
+ do_memcpy_clock(routines[i].fn, len, true);
+ } else {
+ result_bps[0] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, false);
+ result_bps[1] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, true);
+ }
} else {
- BUG_ON(gettimeofday(&tv_end, NULL));
- timersub(&tv_end, &tv_start, &tv_diff);
- bps = (double)((double)length / timeval2double(&tv_diff));
+ if (use_clock) {
+ result_clock[pf] =
+ do_memcpy_clock(routines[i].fn,
+ len, only_prefault);
+ } else {
+ result_bps[pf] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, only_prefault);
+ }
}
switch (bench_format) {
case BENCH_FORMAT_DEFAULT:
- if (use_clock) {
- printf(" %14lf Clock/Byte\n",
- (double)clock_diff / (double)length);
- } else {
- if (bps < K)
- printf(" %14lf B/Sec\n", bps);
- else if (bps < K * K)
- printf(" %14lfd KB/Sec\n", bps / 1024);
- else if (bps < K * K * K)
- printf(" %14lf MB/Sec\n", bps / 1024 / 1024);
- else {
- printf(" %14lf GB/Sec\n",
- bps / 1024 / 1024 / 1024);
+ if (!only_prefault && !no_prefault) {
+ if (use_clock) {
+ printf(" %14lf Clock/Byte\n",
+ (double)result_clock[0]
+ / (double)len);
+ printf(" %14lf Clock/Byte (with prefault)\n",
+ (double)result_clock[1]
+ / (double)len);
+ } else {
+ print_bps(result_bps[0]);
+ printf("\n");
+ print_bps(result_bps[1]);
+ printf(" (with prefault)\n");
}
+ } else {
+ if (use_clock) {
+ printf(" %14lf Clock/Byte",
+ (double)result_clock[pf]
+ / (double)len);
+ } else
+ print_bps(result_bps[pf]);
+
+ printf("%s\n", only_prefault ? " (with prefault)" : "");
}
break;
case BENCH_FORMAT_SIMPLE:
- if (use_clock) {
- printf("%14lf\n",
- (double)clock_diff / (double)length);
- } else
- printf("%lf\n", bps);
+ if (!only_prefault && !no_prefault) {
+ if (use_clock) {
+ printf("%lf %lf\n",
+ (double)result_clock[0] / (double)len,
+ (double)result_clock[1] / (double)len);
+ } else {
+ printf("%lf %lf\n",
+ result_bps[0], result_bps[1]);
+ }
+ } else {
+ if (use_clock) {
+ printf("%lf\n", (double)result_clock[pf]
+ / (double)len);
+ } else
+ printf("%lf\n", result_bps[pf]);
+ }
break;
default:
/* reaching this means there's some disaster: */
--
1.6.5.2
This patch ports arch/x86/lib/memcpy_64.S to perf bench mem memcpy
so that the kernel's memcpy() can be benchmarked in userland, albeit in a
tricky and dirty way.
util/include/asm/cpufeature.h, util/include/asm/dwarf2.h, and
util/include/linux/linkage.h are mostly dummy headers that do just enough
work (e.g. defining ENTRY()) to let memcpy_64.S be included unmodified.
This makes checkpatch.pl complain like this:
#177: FILE: tools/perf/util/include/linux/linkage.h:7:
+#define ENTRY(name) \
+ .globl name; \
+ name:
WARNING: labels should not be indented
#179: FILE: tools/perf/util/include/linux/linkage.h:9:
+ name:
because checkpatch.pl treats this file as C source.
I think this can be forgiven, because the original include/linux/linkage.h
does a similar thing.
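The MEMCPY_FN() entries in mem-memcpy-x86-64-asm-def.h work as an X-macro
list: the header is included once with MEMCPY_FN() defined to emit extern
declarations and once with it defined to emit routines[] table entries.
A minimal single-file sketch of that idea follows. It is an illustration
only, not the code in this patch: MEMCPY_ROUTINES(), DECLARE_FN(),
TABLE_ENTRY(), bytewise_memcpy() and find_routine() are hypothetical names,
a list macro stands in for the separate def header, and a plain C copy loop
stands in for the assembly routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_t)(void *, const void *, size_t);

struct routine {
	const char *name;
	const char *desc;
	memcpy_t fn;
};

/*
 * Hypothetical stand-in for mem-memcpy-x86-64-asm-def.h: one X() line
 * per routine. The patch keeps this list in a header of its own.
 */
#define MEMCPY_ROUTINES(X) \
	X(bytewise_memcpy, "bytewise", "byte-at-a-time copy (illustration only)")

/* First expansion: emit one prototype per list entry. */
#define DECLARE_FN(fn, name, desc) \
	static void *fn(void *, const void *, size_t);
MEMCPY_ROUTINES(DECLARE_FN)
#undef DECLARE_FN

/* Second expansion: emit one table row per list entry. */
#define TABLE_ENTRY(fn, name, desc) { name, desc, fn },
static const struct routine routines[] = {
	{ "default", "memcpy() provided by the C library", memcpy },
	MEMCPY_ROUTINES(TABLE_ENTRY)
	{ NULL, NULL, NULL }
};
#undef TABLE_ENTRY

/* Plain C copy loop standing in for the assembly routine. */
static void *bytewise_memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}

/* Look up a routine by name; returns NULL if the name is unknown. */
static const struct routine *find_routine(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; routines[i].name; i++)
		if (!strcmp(routines[i].name, name))
			return &routines[i];
	return NULL;
}
```

With this layout, adding a routine means adding one line to the list; the
prototype and the table row are both generated from it, so the two cannot
drift apart.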
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Ma Ling <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Andi Kleen <[email protected]>
---
tools/perf/Makefile | 11 +++++++++++
tools/perf/bench/mem-memcpy-arch.h | 12 ++++++++++++
tools/perf/bench/mem-memcpy-x86-64-asm-def.h | 4 ++++
tools/perf/bench/mem-memcpy-x86-64-asm.S | 2 ++
tools/perf/util/include/asm/cpufeature.h | 9 +++++++++
tools/perf/util/include/asm/dwarf2.h | 11 +++++++++++
tools/perf/util/include/linux/linkage.h | 13 +++++++++++++
7 files changed, 62 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/bench/mem-memcpy-arch.h
create mode 100644 tools/perf/bench/mem-memcpy-x86-64-asm-def.h
create mode 100644 tools/perf/bench/mem-memcpy-x86-64-asm.S
create mode 100644 tools/perf/util/include/asm/cpufeature.h
create mode 100644 tools/perf/util/include/asm/dwarf2.h
create mode 100644 tools/perf/util/include/linux/linkage.h
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 2d414b3..b3e6bc6 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -185,7 +185,10 @@ ifeq ($(ARCH),i386)
ARCH := x86
endif
ifeq ($(ARCH),x86_64)
+ RAW_ARCH := x86_64
ARCH := x86
+ ARCH_CFLAGS := -DARCH_X86_64
+ ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S
endif
# CFLAGS and LDFLAGS are for the users to override from the command line.
@@ -375,6 +378,7 @@ LIB_H += util/include/linux/prefetch.h
LIB_H += util/include/linux/rbtree.h
LIB_H += util/include/linux/string.h
LIB_H += util/include/linux/types.h
+LIB_H += util/include/linux/linkage.h
LIB_H += util/include/asm/asm-offsets.h
LIB_H += util/include/asm/bug.h
LIB_H += util/include/asm/byteorder.h
@@ -383,6 +387,8 @@ LIB_H += util/include/asm/swab.h
LIB_H += util/include/asm/system.h
LIB_H += util/include/asm/uaccess.h
LIB_H += util/include/dwarf-regs.h
+LIB_H += util/include/asm/dwarf2.h
+LIB_H += util/include/asm/cpufeature.h
LIB_H += perf.h
LIB_H += util/cache.h
LIB_H += util/callchain.h
@@ -417,6 +423,7 @@ LIB_H += util/probe-finder.h
LIB_H += util/probe-event.h
LIB_H += util/pstack.h
LIB_H += util/cpumap.h
+LIB_H += $(ARCH_INCLUDE)
LIB_OBJS += $(OUTPUT)util/abspath.o
LIB_OBJS += $(OUTPUT)util/alias.o
@@ -472,6 +479,9 @@ BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
# Benchmark modules
BUILTIN_OBJS += $(OUTPUT)bench/sched-messaging.o
BUILTIN_OBJS += $(OUTPUT)bench/sched-pipe.o
+ifeq ($(RAW_ARCH),x86_64)
+BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy-x86-64-asm.o
+endif
BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy.o
BUILTIN_OBJS += $(OUTPUT)builtin-diff.o
@@ -909,6 +919,7 @@ BASIC_CFLAGS += -DSHA1_HEADER='$(SHA1_HEADER_SQ)' \
LIB_OBJS += $(COMPAT_OBJS)
ALL_CFLAGS += $(BASIC_CFLAGS)
+ALL_CFLAGS += $(ARCH_CFLAGS)
ALL_LDFLAGS += $(BASIC_LDFLAGS)
export TAR INSTALL DESTDIR SHELL_PATH
diff --git a/tools/perf/bench/mem-memcpy-arch.h b/tools/perf/bench/mem-memcpy-arch.h
new file mode 100644
index 0000000..a72e36c
--- /dev/null
+++ b/tools/perf/bench/mem-memcpy-arch.h
@@ -0,0 +1,12 @@
+
+#ifdef ARCH_X86_64
+
+#define MEMCPY_FN(fn, name, desc) \
+ extern void *fn(void *, const void *, size_t);
+
+#include "mem-memcpy-x86-64-asm-def.h"
+
+#undef MEMCPY_FN
+
+#endif
+
diff --git a/tools/perf/bench/mem-memcpy-x86-64-asm-def.h b/tools/perf/bench/mem-memcpy-x86-64-asm-def.h
new file mode 100644
index 0000000..d588b87
--- /dev/null
+++ b/tools/perf/bench/mem-memcpy-x86-64-asm-def.h
@@ -0,0 +1,4 @@
+
+MEMCPY_FN(__memcpy,
+ "x86-64-unrolled",
+ "unrolled memcpy() in arch/x86/lib/memcpy_64.S")
diff --git a/tools/perf/bench/mem-memcpy-x86-64-asm.S b/tools/perf/bench/mem-memcpy-x86-64-asm.S
new file mode 100644
index 0000000..a57b66e
--- /dev/null
+++ b/tools/perf/bench/mem-memcpy-x86-64-asm.S
@@ -0,0 +1,2 @@
+
+#include "../../../arch/x86/lib/memcpy_64.S"
diff --git a/tools/perf/util/include/asm/cpufeature.h b/tools/perf/util/include/asm/cpufeature.h
new file mode 100644
index 0000000..acffd5e
--- /dev/null
+++ b/tools/perf/util/include/asm/cpufeature.h
@@ -0,0 +1,9 @@
+
+#ifndef PERF_CPUFEATURE_H
+#define PERF_CPUFEATURE_H
+
+/* cpufeature.h ... dummy header file for including arch/x86/lib/memcpy_64.S */
+
+#define X86_FEATURE_REP_GOOD 0
+
+#endif /* PERF_CPUFEATURE_H */
diff --git a/tools/perf/util/include/asm/dwarf2.h b/tools/perf/util/include/asm/dwarf2.h
new file mode 100644
index 0000000..bb4198e
--- /dev/null
+++ b/tools/perf/util/include/asm/dwarf2.h
@@ -0,0 +1,11 @@
+
+#ifndef PERF_DWARF2_H
+#define PERF_DWARF2_H
+
+/* dwarf2.h ... dummy header file for including arch/x86/lib/memcpy_64.S */
+
+#define CFI_STARTPROC
+#define CFI_ENDPROC
+
+#endif /* PERF_DWARF2_H */
+
diff --git a/tools/perf/util/include/linux/linkage.h b/tools/perf/util/include/linux/linkage.h
new file mode 100644
index 0000000..06387cf
--- /dev/null
+++ b/tools/perf/util/include/linux/linkage.h
@@ -0,0 +1,13 @@
+
+#ifndef PERF_LINUX_LINKAGE_H_
+#define PERF_LINUX_LINKAGE_H_
+
+/* linkage.h ... for including arch/x86/lib/memcpy_64.S */
+
+#define ENTRY(name) \
+ .globl name; \
+ name:
+
+#define ENDPROC(name)
+
+#endif /* PERF_LINUX_LINKAGE_H_ */
--
1.6.5.2
Commit-ID: ea7872b9d6a81101f6ba0ec141544a62fea35876
Gitweb: http://git.kernel.org/tip/ea7872b9d6a81101f6ba0ec141544a62fea35876
Author: Hitoshi Mitake <[email protected]>
AuthorDate: Thu, 25 Nov 2010 16:04:53 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 26 Nov 2010 08:15:57 +0100
perf bench: Add feature that measures the performance of the arch/x86/lib/memcpy_64.S memcpy routines via 'perf bench mem'
This patch ports arch/x86/lib/memcpy_64.S to perf bench mem
memcpy so that memcpy() can be benchmarked in userland, albeit
in a tricky and dirty way.
util/include/asm/cpufeature.h, util/include/asm/dwarf2.h, and
util/include/linux/linkage.h are mostly dummy files with small
wrappers, so that we are able to include memcpy_64.S
unmodified.
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: [email protected]
Cc: Miao Xie <[email protected]>
Cc: Ma Ling <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Andi Kleen <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/Makefile | 11 +++++++++++
tools/perf/bench/mem-memcpy-arch.h | 12 ++++++++++++
tools/perf/bench/mem-memcpy-x86-64-asm-def.h | 4 ++++
tools/perf/bench/mem-memcpy-x86-64-asm.S | 2 ++
tools/perf/util/include/asm/cpufeature.h | 9 +++++++++
tools/perf/util/include/asm/dwarf2.h | 11 +++++++++++
tools/perf/util/include/linux/linkage.h | 13 +++++++++++++
7 files changed, 62 insertions(+), 0 deletions(-)
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 74b684d..e0db197 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -185,7 +185,10 @@ ifeq ($(ARCH),i386)
ARCH := x86
endif
ifeq ($(ARCH),x86_64)
+ RAW_ARCH := x86_64
ARCH := x86
+ ARCH_CFLAGS := -DARCH_X86_64
+ ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S
endif
# CFLAGS and LDFLAGS are for the users to override from the command line.
@@ -375,6 +378,7 @@ LIB_H += util/include/linux/prefetch.h
LIB_H += util/include/linux/rbtree.h
LIB_H += util/include/linux/string.h
LIB_H += util/include/linux/types.h
+LIB_H += util/include/linux/linkage.h
LIB_H += util/include/asm/asm-offsets.h
LIB_H += util/include/asm/bug.h
LIB_H += util/include/asm/byteorder.h
@@ -383,6 +387,8 @@ LIB_H += util/include/asm/swab.h
LIB_H += util/include/asm/system.h
LIB_H += util/include/asm/uaccess.h
LIB_H += util/include/dwarf-regs.h
+LIB_H += util/include/asm/dwarf2.h
+LIB_H += util/include/asm/cpufeature.h
LIB_H += perf.h
LIB_H += util/cache.h
LIB_H += util/callchain.h
@@ -417,6 +423,7 @@ LIB_H += util/probe-finder.h
LIB_H += util/probe-event.h
LIB_H += util/pstack.h
LIB_H += util/cpumap.h
+LIB_H += $(ARCH_INCLUDE)
LIB_OBJS += $(OUTPUT)util/abspath.o
LIB_OBJS += $(OUTPUT)util/alias.o
@@ -472,6 +479,9 @@ BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
# Benchmark modules
BUILTIN_OBJS += $(OUTPUT)bench/sched-messaging.o
BUILTIN_OBJS += $(OUTPUT)bench/sched-pipe.o
+ifeq ($(RAW_ARCH),x86_64)
+BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy-x86-64-asm.o
+endif
BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy.o
BUILTIN_OBJS += $(OUTPUT)builtin-diff.o
@@ -898,6 +908,7 @@ BASIC_CFLAGS += -DSHA1_HEADER='$(SHA1_HEADER_SQ)' \
LIB_OBJS += $(COMPAT_OBJS)
ALL_CFLAGS += $(BASIC_CFLAGS)
+ALL_CFLAGS += $(ARCH_CFLAGS)
ALL_LDFLAGS += $(BASIC_LDFLAGS)
export TAR INSTALL DESTDIR SHELL_PATH
diff --git a/tools/perf/bench/mem-memcpy-arch.h b/tools/perf/bench/mem-memcpy-arch.h
new file mode 100644
index 0000000..a72e36c
--- /dev/null
+++ b/tools/perf/bench/mem-memcpy-arch.h
@@ -0,0 +1,12 @@
+
+#ifdef ARCH_X86_64
+
+#define MEMCPY_FN(fn, name, desc) \
+ extern void *fn(void *, const void *, size_t);
+
+#include "mem-memcpy-x86-64-asm-def.h"
+
+#undef MEMCPY_FN
+
+#endif
+
diff --git a/tools/perf/bench/mem-memcpy-x86-64-asm-def.h b/tools/perf/bench/mem-memcpy-x86-64-asm-def.h
new file mode 100644
index 0000000..d588b87
--- /dev/null
+++ b/tools/perf/bench/mem-memcpy-x86-64-asm-def.h
@@ -0,0 +1,4 @@
+
+MEMCPY_FN(__memcpy,
+ "x86-64-unrolled",
+ "unrolled memcpy() in arch/x86/lib/memcpy_64.S")
diff --git a/tools/perf/bench/mem-memcpy-x86-64-asm.S b/tools/perf/bench/mem-memcpy-x86-64-asm.S
new file mode 100644
index 0000000..a57b66e
--- /dev/null
+++ b/tools/perf/bench/mem-memcpy-x86-64-asm.S
@@ -0,0 +1,2 @@
+
+#include "../../../arch/x86/lib/memcpy_64.S"
diff --git a/tools/perf/util/include/asm/cpufeature.h b/tools/perf/util/include/asm/cpufeature.h
new file mode 100644
index 0000000..acffd5e
--- /dev/null
+++ b/tools/perf/util/include/asm/cpufeature.h
@@ -0,0 +1,9 @@
+
+#ifndef PERF_CPUFEATURE_H
+#define PERF_CPUFEATURE_H
+
+/* cpufeature.h ... dummy header file for including arch/x86/lib/memcpy_64.S */
+
+#define X86_FEATURE_REP_GOOD 0
+
+#endif /* PERF_CPUFEATURE_H */
diff --git a/tools/perf/util/include/asm/dwarf2.h b/tools/perf/util/include/asm/dwarf2.h
new file mode 100644
index 0000000..bb4198e
--- /dev/null
+++ b/tools/perf/util/include/asm/dwarf2.h
@@ -0,0 +1,11 @@
+
+#ifndef PERF_DWARF2_H
+#define PERF_DWARF2_H
+
+/* dwarf2.h ... dummy header file for including arch/x86/lib/memcpy_64.S */
+
+#define CFI_STARTPROC
+#define CFI_ENDPROC
+
+#endif /* PERF_DWARF2_H */
+
diff --git a/tools/perf/util/include/linux/linkage.h b/tools/perf/util/include/linux/linkage.h
new file mode 100644
index 0000000..06387cf
--- /dev/null
+++ b/tools/perf/util/include/linux/linkage.h
@@ -0,0 +1,13 @@
+
+#ifndef PERF_LINUX_LINKAGE_H_
+#define PERF_LINUX_LINKAGE_H_
+
+/* linkage.h ... for including arch/x86/lib/memcpy_64.S */
+
+#define ENTRY(name) \
+ .globl name; \
+ name:
+
+#define ENDPROC(name)
+
+#endif /* PERF_LINUX_LINKAGE_H_ */
Commit-ID: 49ce8fc651794878189fd5f273228832cdfb5be9
Gitweb: http://git.kernel.org/tip/49ce8fc651794878189fd5f273228832cdfb5be9
Author: Hitoshi Mitake <[email protected]>
AuthorDate: Thu, 25 Nov 2010 16:04:52 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 26 Nov 2010 08:15:57 +0100
perf bench: Print both of prefaulted and no prefaulted results by default
After applying this patch, perf bench mem memcpy prints
both the prefaulted and the non-prefaulted memcpy() scores.
New options --no-prefault and --only-prefault are added
to print a single result, mainly for use in scripts.
Usage example:
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 634.969014 MB/Sec
| 4.828062 GB/Sec (with prefault)
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --only-prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 4.705192 GB/Sec (with prefault)
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --no-prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes ...
|
| 642.725568 MB/Sec
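The gap between the two scores above comes from whether the buffers have
already been touched when the timed copy starts. A minimal sketch of the
measurement, under stated assumptions: time_memcpy() and scale_bps() are
hypothetical helpers rather than the functions added by this patch, calloc()
stands in for zalloc(), and the reading that freshly zeroed buffers are
backed lazily (so the first copy also pays for page faults) is the usual
Linux behaviour, not something this patch verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

typedef void *(*memcpy_t)(void *, const void *, size_t);

/*
 * Copy len bytes once and return the throughput in bytes per second.
 * Returns -1.0 if allocation fails and 0.0 if the copy finished within
 * the timer's resolution.
 */
static double time_memcpy(memcpy_t fn, size_t len, int prefault)
{
	/* Calling through a volatile pointer keeps the compiler from
	 * inlining or dropping the copy. */
	memcpy_t volatile call = fn;
	struct timeval start, end;
	void *src, *dst;
	double secs;

	assert(fn != NULL);
	src = calloc(1, len);
	dst = calloc(1, len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	if (prefault)
		call(dst, src, len);	/* untimed pass: fault the pages in */

	gettimeofday(&start, NULL);
	call(dst, src, len);
	gettimeofday(&end, NULL);

	free(src);
	free(dst);

	secs = (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_usec - start.tv_usec) / 1e6;
	return secs > 0.0 ? (double)len / secs : 0.0;
}

/* Scale bps to a B/KB/MB/GB unit (powers of 1024); returns the unit name. */
static const char *scale_bps(double bps, double *scaled)
{
	const double k = 1024.0;

	if (bps < k) {
		*scaled = bps;
		return "B/Sec";
	}
	if (bps < k * k) {
		*scaled = bps / k;
		return "KB/Sec";
	}
	if (bps < k * k * k) {
		*scaled = bps / k / k;
		return "MB/Sec";
	}
	*scaled = bps / k / k / k;
	return "GB/Sec";
}
```

Calling time_memcpy(memcpy, len, 0) and time_memcpy(memcpy, len, 1) with the
same len gives the two numbers to compare. Each call allocates fresh buffers,
so the second measurement does not benefit from pages touched by the first.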
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: [email protected]
Cc: Miao Xie <[email protected]>
Cc: Ma Ling <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Andi Kleen <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/bench/mem-memcpy.c | 219 ++++++++++++++++++++++++++++++-----------
1 files changed, 162 insertions(+), 57 deletions(-)
diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c
index 38dae74..db82021 100644
--- a/tools/perf/bench/mem-memcpy.c
+++ b/tools/perf/bench/mem-memcpy.c
@@ -12,6 +12,7 @@
#include "../util/parse-options.h"
#include "../util/header.h"
#include "bench.h"
+#include "mem-memcpy-arch.h"
#include <stdio.h>
#include <stdlib.h>
@@ -23,8 +24,10 @@
static const char *length_str = "1MB";
static const char *routine = "default";
-static bool use_clock = false;
+static bool use_clock;
static int clock_fd;
+static bool only_prefault;
+static bool no_prefault;
static const struct option options[] = {
OPT_STRING('l', "length", &length_str, "1MB",
@@ -34,19 +37,33 @@ static const struct option options[] = {
"Specify routine to copy"),
OPT_BOOLEAN('c', "clock", &use_clock,
"Use CPU clock for measuring"),
+ OPT_BOOLEAN('o', "only-prefault", &only_prefault,
+ "Show only the result with page faults before memcpy()"),
+ OPT_BOOLEAN('n', "no-prefault", &no_prefault,
+ "Show only the result without page faults before memcpy()"),
OPT_END()
};
+typedef void *(*memcpy_t)(void *, const void *, size_t);
+
struct routine {
const char *name;
const char *desc;
- void * (*fn)(void *dst, const void *src, size_t len);
+ memcpy_t fn;
};
struct routine routines[] = {
{ "default",
"Default memcpy() provided by glibc",
memcpy },
+#ifdef ARCH_X86_64
+
+#define MEMCPY_FN(fn, name, desc) { name, desc, fn },
+#include "mem-memcpy-x86-64-asm-def.h"
+#undef MEMCPY_FN
+
+#endif
+
{ NULL,
NULL,
NULL }
@@ -89,29 +106,98 @@ static double timeval2double(struct timeval *ts)
(double)ts->tv_usec / (double)1000000;
}
+static void alloc_mem(void **dst, void **src, size_t length)
+{
+ *dst = zalloc(length);
+ if (!*dst)
+ die("memory allocation failed - maybe length is too large?\n");
+
+ *src = zalloc(length);
+ if (!*src)
+ die("memory allocation failed - maybe length is too large?\n");
+}
+
+static u64 do_memcpy_clock(memcpy_t fn, size_t len, bool prefault)
+{
+ u64 clock_start = 0ULL, clock_end = 0ULL;
+ void *src = NULL, *dst = NULL;
+
+ alloc_mem(&dst, &src, len);
+
+ if (prefault)
+ fn(dst, src, len);
+
+ clock_start = get_clock();
+ fn(dst, src, len);
+ clock_end = get_clock();
+
+ free(src);
+ free(dst);
+ return clock_end - clock_start;
+}
+
+static double do_memcpy_gettimeofday(memcpy_t fn, size_t len, bool prefault)
+{
+ struct timeval tv_start, tv_end, tv_diff;
+ void *src = NULL, *dst = NULL;
+
+ alloc_mem(&dst, &src, len);
+
+ if (prefault)
+ fn(dst, src, len);
+
+ BUG_ON(gettimeofday(&tv_start, NULL));
+ fn(dst, src, len);
+ BUG_ON(gettimeofday(&tv_end, NULL));
+
+ timersub(&tv_end, &tv_start, &tv_diff);
+
+ free(src);
+ free(dst);
+ return (double)((double)len / timeval2double(&tv_diff));
+}
+
+#define pf (no_prefault ? 0 : 1)
+
+#define print_bps(x) do { \
+ if (x < K) \
+ printf(" %14lf B/Sec", x); \
+ else if (x < K * K) \
+ printf(" %14lf KB/Sec", x / K); \
+ else if (x < K * K * K) \
+ printf(" %14lf MB/Sec", x / K / K); \
+ else \
+ printf(" %14lf GB/Sec", x / K / K / K); \
+ } while (0)
+
int bench_mem_memcpy(int argc, const char **argv,
const char *prefix __used)
{
int i;
- void *dst, *src;
- size_t length;
- double bps = 0.0;
- struct timeval tv_start, tv_end, tv_diff;
- u64 clock_start, clock_end, clock_diff;
+ size_t len;
+ double result_bps[2];
+ u64 result_clock[2];
- clock_start = clock_end = clock_diff = 0ULL;
argc = parse_options(argc, argv, options,
bench_mem_memcpy_usage, 0);
- tv_diff.tv_sec = 0;
- tv_diff.tv_usec = 0;
- length = (size_t)perf_atoll((char *)length_str);
+ if (use_clock)
+ init_clock();
+
+ len = (size_t)perf_atoll((char *)length_str);
- if ((s64)length <= 0) {
+ result_clock[0] = result_clock[1] = 0ULL;
+ result_bps[0] = result_bps[1] = 0.0;
+
+ if ((s64)len <= 0) {
fprintf(stderr, "Invalid length:%s\n", length_str);
return 1;
}
+ /* giving both options is the same as giving neither */
+ if (only_prefault && no_prefault)
+ only_prefault = no_prefault = false;
+
for (i = 0; routines[i].name; i++) {
if (!strcmp(routines[i].name, routine))
break;
@@ -126,61 +212,80 @@ int bench_mem_memcpy(int argc, const char **argv,
return 1;
}
- dst = zalloc(length);
- if (!dst)
- die("memory allocation failed - maybe length is too large?\n");
-
- src = zalloc(length);
- if (!src)
- die("memory allocation failed - maybe length is too large?\n");
-
- if (bench_format == BENCH_FORMAT_DEFAULT) {
- printf("# Copying %s Bytes from %p to %p ...\n\n",
- length_str, src, dst);
- }
-
- if (use_clock) {
- init_clock();
- clock_start = get_clock();
- } else {
- BUG_ON(gettimeofday(&tv_start, NULL));
- }
-
- routines[i].fn(dst, src, length);
+ if (bench_format == BENCH_FORMAT_DEFAULT)
+ printf("# Copying %s Bytes ...\n\n", length_str);
- if (use_clock) {
- clock_end = get_clock();
- clock_diff = clock_end - clock_start;
+ if (!only_prefault && !no_prefault) {
+ /* show both results */
+ if (use_clock) {
+ result_clock[0] =
+ do_memcpy_clock(routines[i].fn, len, false);
+ result_clock[1] =
+ do_memcpy_clock(routines[i].fn, len, true);
+ } else {
+ result_bps[0] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, false);
+ result_bps[1] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, true);
+ }
} else {
- BUG_ON(gettimeofday(&tv_end, NULL));
- timersub(&tv_end, &tv_start, &tv_diff);
- bps = (double)((double)length / timeval2double(&tv_diff));
+ if (use_clock) {
+ result_clock[pf] =
+ do_memcpy_clock(routines[i].fn,
+ len, only_prefault);
+ } else {
+ result_bps[pf] =
+ do_memcpy_gettimeofday(routines[i].fn,
+ len, only_prefault);
+ }
}
switch (bench_format) {
case BENCH_FORMAT_DEFAULT:
- if (use_clock) {
- printf(" %14lf Clock/Byte\n",
- (double)clock_diff / (double)length);
- } else {
- if (bps < K)
- printf(" %14lf B/Sec\n", bps);
- else if (bps < K * K)
- printf(" %14lfd KB/Sec\n", bps / 1024);
- else if (bps < K * K * K)
- printf(" %14lf MB/Sec\n", bps / 1024 / 1024);
- else {
- printf(" %14lf GB/Sec\n",
- bps / 1024 / 1024 / 1024);
+ if (!only_prefault && !no_prefault) {
+ if (use_clock) {
+ printf(" %14lf Clock/Byte\n",
+ (double)result_clock[0]
+ / (double)len);
+ printf(" %14lf Clock/Byte (with prefault)\n",
+ (double)result_clock[1]
+ / (double)len);
+ } else {
+ print_bps(result_bps[0]);
+ printf("\n");
+ print_bps(result_bps[1]);
+ printf(" (with prefault)\n");
}
+ } else {
+ if (use_clock) {
+ printf(" %14lf Clock/Byte",
+ (double)result_clock[pf]
+ / (double)len);
+ } else
+ print_bps(result_bps[pf]);
+
+ printf("%s\n", only_prefault ? " (with prefault)" : "");
}
break;
case BENCH_FORMAT_SIMPLE:
- if (use_clock) {
- printf("%14lf\n",
- (double)clock_diff / (double)length);
- } else
- printf("%lf\n", bps);
+ if (!only_prefault && !no_prefault) {
+ if (use_clock) {
+ printf("%lf %lf\n",
+ (double)result_clock[0] / (double)len,
+ (double)result_clock[1] / (double)len);
+ } else {
+ printf("%lf %lf\n",
+ result_bps[0], result_bps[1]);
+ }
+ } else {
+ if (use_clock) {
+ printf("%lf\n", (double)result_clock[pf]
+ / (double)len);
+ } else
+ printf("%lf\n", result_bps[pf]);
+ }
break;
default:
/* reaching this means there's some disaster: */
On 2010-11-26 19:31, tip-bot for Hitoshi Mitake wrote:
> Commit-ID: ea7872b9d6a81101f6ba0ec141544a62fea35876
> Gitweb: http://git.kernel.org/tip/ea7872b9d6a81101f6ba0ec141544a62fea35876
> Author: Hitoshi Mitake<[email protected]>
> AuthorDate: Thu, 25 Nov 2010 16:04:53 +0900
> Committer: Ingo Molnar<[email protected]>
> CommitDate: Fri, 26 Nov 2010 08:15:57 +0100
>
> perf bench: Add feature that measures the performance of the arch/x86/lib/memcpy_64.S memcpy routines via 'perf bench mem'
>
> This patch ports arch/x86/lib/memcpy_64.S to perf bench mem
> memcpy for benchmarking memcpy() in userland with tricky and
> dirty way.
>
> util/include/asm/cpufeature.h, util/include/asm/dwarf2.h, and
> util/include/linux/linkage.h are mostly dummy files with small
> wrappers, so that we are able to include memcpy_64.S
> unmodified.
>
> Signed-off-by: Hitoshi Mitake<[email protected]>
> Cc: [email protected]
> Cc: Miao Xie<[email protected]>
> Cc: Ma Ling<[email protected]>
> Cc: Zhao Yakui<[email protected]>
> Cc: Peter Zijlstra<[email protected]>
> Cc: Arnaldo Carvalho de Melo<[email protected]>
> Cc: Paul Mackerras<[email protected]>
> Cc: Frederic Weisbecker<[email protected]>
> Cc: Steven Rostedt<[email protected]>
> Cc: Andi Kleen<[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar<[email protected]>
> ---
> tools/perf/Makefile | 11 +++++++++++
> tools/perf/bench/mem-memcpy-arch.h | 12 ++++++++++++
> tools/perf/bench/mem-memcpy-x86-64-asm-def.h | 4 ++++
> tools/perf/bench/mem-memcpy-x86-64-asm.S | 2 ++
> tools/perf/util/include/asm/cpufeature.h | 9 +++++++++
> tools/perf/util/include/asm/dwarf2.h | 11 +++++++++++
> tools/perf/util/include/linux/linkage.h | 13 +++++++++++++
> 7 files changed, 62 insertions(+), 0 deletions(-)
>
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index 74b684d..e0db197 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -185,7 +185,10 @@ ifeq ($(ARCH),i386)
> ARCH := x86
> endif
> ifeq ($(ARCH),x86_64)
> + RAW_ARCH := x86_64
> ARCH := x86
> + ARCH_CFLAGS := -DARCH_X86_64
> + ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S
> endif
>
> # CFLAGS and LDFLAGS are for the users to override from the command line.
> @@ -375,6 +378,7 @@ LIB_H += util/include/linux/prefetch.h
> LIB_H += util/include/linux/rbtree.h
> LIB_H += util/include/linux/string.h
> LIB_H += util/include/linux/types.h
> +LIB_H += util/include/linux/linkage.h
> LIB_H += util/include/asm/asm-offsets.h
> LIB_H += util/include/asm/bug.h
> LIB_H += util/include/asm/byteorder.h
> @@ -383,6 +387,8 @@ LIB_H += util/include/asm/swab.h
> LIB_H += util/include/asm/system.h
> LIB_H += util/include/asm/uaccess.h
> LIB_H += util/include/dwarf-regs.h
> +LIB_H += util/include/asm/dwarf2.h
> +LIB_H += util/include/asm/cpufeature.h
> LIB_H += perf.h
> LIB_H += util/cache.h
> LIB_H += util/callchain.h
> @@ -417,6 +423,7 @@ LIB_H += util/probe-finder.h
> LIB_H += util/probe-event.h
> LIB_H += util/pstack.h
> LIB_H += util/cpumap.h
> +LIB_H += $(ARCH_INCLUDE)
>
> LIB_OBJS += $(OUTPUT)util/abspath.o
> LIB_OBJS += $(OUTPUT)util/alias.o
> @@ -472,6 +479,9 @@ BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
> # Benchmark modules
> BUILTIN_OBJS += $(OUTPUT)bench/sched-messaging.o
> BUILTIN_OBJS += $(OUTPUT)bench/sched-pipe.o
> +ifeq ($(RAW_ARCH),x86_64)
> +BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy-x86-64-asm.o
> +endif
> BUILTIN_OBJS += $(OUTPUT)bench/mem-memcpy.o
>
> BUILTIN_OBJS += $(OUTPUT)builtin-diff.o
> @@ -898,6 +908,7 @@ BASIC_CFLAGS += -DSHA1_HEADER='$(SHA1_HEADER_SQ)' \
> LIB_OBJS += $(COMPAT_OBJS)
>
> ALL_CFLAGS += $(BASIC_CFLAGS)
> +ALL_CFLAGS += $(ARCH_CFLAGS)
> ALL_LDFLAGS += $(BASIC_LDFLAGS)
>
> export TAR INSTALL DESTDIR SHELL_PATH
> diff --git a/tools/perf/bench/mem-memcpy-arch.h b/tools/perf/bench/mem-memcpy-arch.h
> new file mode 100644
> index 0000000..a72e36c
> --- /dev/null
> +++ b/tools/perf/bench/mem-memcpy-arch.h
> @@ -0,0 +1,12 @@
> +
> +#ifdef ARCH_X86_64
> +
> +#define MEMCPY_FN(fn, name, desc) \
> + extern void *fn(void *, const void *, size_t);
> +
> +#include "mem-memcpy-x86-64-asm-def.h"
> +
> +#undef MEMCPY_FN
> +
> +#endif
> +
> diff --git a/tools/perf/bench/mem-memcpy-x86-64-asm-def.h b/tools/perf/bench/mem-memcpy-x86-64-asm-def.h
> new file mode 100644
> index 0000000..d588b87
> --- /dev/null
> +++ b/tools/perf/bench/mem-memcpy-x86-64-asm-def.h
> @@ -0,0 +1,4 @@
> +
> +MEMCPY_FN(__memcpy,
> + "x86-64-unrolled",
> + "unrolled memcpy() in arch/x86/lib/memcpy_64.S")
> diff --git a/tools/perf/bench/mem-memcpy-x86-64-asm.S b/tools/perf/bench/mem-memcpy-x86-64-asm.S
> new file mode 100644
> index 0000000..a57b66e
> --- /dev/null
> +++ b/tools/perf/bench/mem-memcpy-x86-64-asm.S
> @@ -0,0 +1,2 @@
> +
> +#include "../../../arch/x86/lib/memcpy_64.S"
> diff --git a/tools/perf/util/include/asm/cpufeature.h b/tools/perf/util/include/asm/cpufeature.h
> new file mode 100644
> index 0000000..acffd5e
> --- /dev/null
> +++ b/tools/perf/util/include/asm/cpufeature.h
> @@ -0,0 +1,9 @@
> +
> +#ifndef PERF_CPUFEATURE_H
> +#define PERF_CPUFEATURE_H
> +
> +/* cpufeature.h ... dummy header file for including arch/x86/lib/memcpy_64.S */
> +
> +#define X86_FEATURE_REP_GOOD 0
> +
> +#endif /* PERF_CPUFEATURE_H */
> diff --git a/tools/perf/util/include/asm/dwarf2.h b/tools/perf/util/include/asm/dwarf2.h
> new file mode 100644
> index 0000000..bb4198e
> --- /dev/null
> +++ b/tools/perf/util/include/asm/dwarf2.h
> @@ -0,0 +1,11 @@
> +
> +#ifndef PERF_DWARF2_H
> +#define PERF_DWARF2_H
> +
> +/* dwarf2.h ... dummy header file for including arch/x86/lib/memcpy_64.S */
> +
> +#define CFI_STARTPROC
> +#define CFI_ENDPROC
> +
> +#endif /* PERF_DWARF2_H */
> +
> diff --git a/tools/perf/util/include/linux/linkage.h b/tools/perf/util/include/linux/linkage.h
> new file mode 100644
> index 0000000..06387cf
> --- /dev/null
> +++ b/tools/perf/util/include/linux/linkage.h
> @@ -0,0 +1,13 @@
> +
> +#ifndef PERF_LINUX_LINKAGE_H_
> +#define PERF_LINUX_LINKAGE_H_
> +
> +/* linkage.h ... for including arch/x86/lib/memcpy_64.S */
> +
> +#define ENTRY(name) \
> + .globl name; \
> + name:
> +
> +#define ENDPROC(name)
> +
> +#endif /* PERF_LINUX_LINKAGE_H_ */
>
Thanks for applying this, Ingo!
BTW, I have a question.
Why does the symbol name of the rep-prefix memcpy() start with '.'?
A symbol whose name starts with '.', like ".Lmemcpy_c", cannot be seen as a
symbol name after compilation.
I couldn't find the reason why .Lmemcpy_c has to start with '.'.
For example, clear_page in arch/x86/lib/clear_page_64.S
doesn't start with '.', but it is also an alternative function.
If there is no special reason, I'd like to rename it.
Thanks,
Hitoshi