2009-12-06 10:59:07

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH] perf/sched: fix for getting task's execute time

In current code, we get task's execute time by reading
"/proc/<pid>/sched" file, it's wrong if the task is created
by pthread_create(), because every thread task has same pid.
So, the correct way is reading "/proc/<ppid>/task/<tid>/sched"
file.

This patch also remove redundant include files since <sys/types.h>
is included in "perf.h"

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 26b782f..b2e910e 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -13,7 +13,6 @@
#include "util/debug.h"
#include "util/data_map.h"

-#include <sys/types.h>
#include <sys/prctl.h>

#include <semaphore.h>
@@ -416,7 +415,7 @@ static u64 get_cpu_usage_nsec_parent(void)

static u64 get_cpu_usage_nsec_self(void)
{
- char filename [] = "/proc/1234567890/sched";
+ char filename [] = "/proc/1234567890/task/1234567890/sched";
unsigned long msecs, nsecs;
char *line = NULL;
u64 total = 0;
@@ -425,7 +424,9 @@ static u64 get_cpu_usage_nsec_self(void)
FILE *file;
int ret;

- sprintf(filename, "/proc/%d/sched", getpid());
+ sprintf(filename, "/proc/%d/task/%d/sched", getpid(),
+ (pid_t)syscall(SYS_gettid));
+
file = fopen(filename, "r");
BUG_ON(!file);

--
1.6.1.2


2009-12-06 11:05:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf/sched: fix for getting task's execute time

On Sun, 2009-12-06 at 18:57 +0800, Xiao Guangrong wrote:
> In current code, we get task's execute time by reading
> "/proc/<pid>/sched" file, it's wrong if the task is created
> by pthread_create(), because every thread task has same pid.
> So, the correct way is reading "/proc/<ppid>/task/<tid>/sched"
> file.
>
> This patch also remove redundant include files since <sys/types.h>
> is included in "perf.h"

We really should not be using these proc files but instead make sure
this information gets transferred through a tracepoint or similar.

Reading these proc files is too prone to races.

2009-12-06 11:06:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf/sched: fix for getting task's execute time

On Sun, 2009-12-06 at 12:05 +0100, Peter Zijlstra wrote:
> On Sun, 2009-12-06 at 18:57 +0800, Xiao Guangrong wrote:
> > In current code, we get task's execute time by reading
> > "/proc/<pid>/sched" file, it's wrong if the task is created
> > by pthread_create(), because every thread task has same pid.
> > So, the correct way is reading "/proc/<ppid>/task/<tid>/sched"
> > file.
> >
> > This patch also remove redundant include files since <sys/types.h>
> > is included in "perf.h"
>
> We really should not be using these proc files but instead make sure
> this information gets transferred through a tracepoint or similar.
>
> Reading these proc files is too prone to races.

We can probably get the runtime by grouping a task-clock swcounter with
an appropriate other event.

2009-12-06 11:52:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] perf/sched: fix for getting task's execute time


* Peter Zijlstra <[email protected]> wrote:

> On Sun, 2009-12-06 at 12:05 +0100, Peter Zijlstra wrote:
> > On Sun, 2009-12-06 at 18:57 +0800, Xiao Guangrong wrote:
> > > In current code, we get task's execute time by reading
> > > "/proc/<pid>/sched" file, it's wrong if the task is created
> > > by pthread_create(), because every thread task has same pid.
> > > So, the correct way is reading "/proc/<ppid>/task/<tid>/sched"
> > > file.
> > >
> > > This patch also remove redundant include files since <sys/types.h>
> > > is included in "perf.h"
> >
> > We really should not be using these proc files but instead make sure
> > this information gets transferred through a tracepoint or similar.
> >
> > Reading these proc files is too prone to races.

yeah. Ideally we'd want all valuable information that is available via
/proc to be available via perf events as well. In the future it should
be possible to run perf even without /proc mounted for example.

Furthermore it's good for consistency and simplicity as well, plus it's
faster too to get the information from the perf syscall and mmap-ed
ringbuffer than to read things via /proc. No need for ASCII conversion,
fixed record formats, fast streaming and buffering, no read() overhead,
etc.

> We can probably get the runtime by grouping a task-clock swcounter
> with an appropriate other event.

Would be lovely.

Ingo

2009-12-06 17:16:34

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH] perf/sched: fix for getting task's execute time

Peter Zijlstra wrote:

>> We really should not be using these proc files but instead make sure
>> this information gets transferred through a tracepoint or similar.
>>
>> Reading these proc files is too prone to races.
>
> We can probably get the runtime by grouping a task-clock swcounter with
> an appropriate other event.
>

Hi Peter,

Thanks your suggestion.

Actually, we can call getrusage(RUSAGE_THREAD, ru) to get current task's
execute time, and I think this is a simpler way.

I'll send v2 patch later.

Thanks,
Xiao

2009-12-07 07:22:12

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2] perf/sched: fix for getting task's execution time

In current code, task's execute time is got by reading
'/proc/<pid>/sched' file, it's wrong if the task is created
by pthread_create(), because every thread task has same pid.

This way also has two demerits:

1: 'perf sched replay' can't work if the kernel not compile with
'CONFIG_SCHED_DEBUG' option
2: perf tool should depend on proc file system

So, this patch call getrusage() to get task's execution time instead
of reading /proc file

Reported-by: Török Edwin <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 46 ++++++++++++++-----------------------------
1 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 19f43fa..bd57994 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -13,7 +13,6 @@
#include "util/debug.h"
#include "util/data_map.h"

-#include <sys/types.h>
#include <sys/prctl.h>

#include <semaphore.h>
@@ -399,49 +398,34 @@ process_sched_event(struct task_desc *this_task __used, struct sched_atom *atom)
}
}

+static u64 get_sum_time(struct rusage *ru)
+{
+ u64 sum;
+
+ sum = ru->ru_utime.tv_sec*1e9 + ru->ru_utime.tv_usec*1e3;
+ sum += ru->ru_stime.tv_sec*1e9 + ru->ru_stime.tv_usec*1e3;
+ return sum;
+}
+
static u64 get_cpu_usage_nsec_parent(void)
{
struct rusage ru;
- u64 sum;
int err;

err = getrusage(RUSAGE_SELF, &ru);
BUG_ON(err);

- sum = ru.ru_utime.tv_sec*1e9 + ru.ru_utime.tv_usec*1e3;
- sum += ru.ru_stime.tv_sec*1e9 + ru.ru_stime.tv_usec*1e3;
-
- return sum;
+ return get_sum_time(&ru);
}

static u64 get_cpu_usage_nsec_self(void)
{
- char filename [] = "/proc/1234567890/sched";
- unsigned long msecs, nsecs;
- char *line = NULL;
- u64 total = 0;
- size_t len = 0;
- ssize_t chars;
- FILE *file;
- int ret;
-
- sprintf(filename, "/proc/%d/sched", getpid());
- file = fopen(filename, "r");
- BUG_ON(!file);
-
- while ((chars = getline(&line, &len, file)) != -1) {
- ret = sscanf(line, "se.sum_exec_runtime : %ld.%06ld\n",
- &msecs, &nsecs);
- if (ret == 2) {
- total = msecs*1e6 + nsecs;
- break;
- }
- }
- if (line)
- free(line);
- fclose(file);
+ struct rusage ru;
+ int err;

- return total;
+ err = getrusage(RUSAGE_THREAD, &ru);
+ BUG_ON(err);
+ return get_sum_time(&ru);
}

static void *thread_func(void *ctx)
--
1.6.1.2

2009-12-07 07:32:09

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2] perf/sched: fix for getting task's execution time


* Xiao Guangrong <[email protected]> wrote:

> In current code, task's execute time is got by reading
> '/proc/<pid>/sched' file, it's wrong if the task is created
> by pthread_create(), because every thread task has same pid.
>
> This way also has two demerits:
>
> 1: 'perf sched replay' can't work if the kernel not compile with
> 'CONFIG_SCHED_DEBUG' option
> 2: perf tool should depend on proc file system
>
> So, this patch call getrusage() to get task's execution time instead
> of reading /proc file

ok, that's better than /proc, but how about using
PERF_COUNT_SW_TASK_CLOCK instead of rusage, to recover the CPU time
used? It would be more precise, and it would use the perf API. (Some
helper functions in tools/perf/lib/ would be nice to make it as easy to
use as rusage (or even easier if possible).)

Ingo

2009-12-09 09:53:11

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3] perf/sched: fix for getting task's execution time

In current code, task's execute time is got by reading
'/proc/<pid>/sched' file, it's wrong if the task is created
by pthread_create(), because every thread task has same pid.

This way also has two demerits:

1: 'perf sched replay' can't work if the kernel not compile
with 'CONFIG_SCHED_DEBUG' option
2: perf tool should depend on proc file system

So, this patch use PERF_COUNT_SW_TASK_CLOCK to get task's
execution time instead of reading /proc file

Changelog v2 -> v3:
use PERF_COUNT_SW_TASK_CLOCK instead of rusage() as Ingo's suggestion

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 55 +++++++++++++++++++++----------------------
1 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 19f43fa..b12b23a 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -13,7 +13,6 @@
#include "util/debug.h"
#include "util/data_map.h"

-#include <sys/types.h>
#include <sys/prctl.h>

#include <semaphore.h>
@@ -414,34 +413,33 @@ static u64 get_cpu_usage_nsec_parent(void)
return sum;
}

-static u64 get_cpu_usage_nsec_self(void)
+static int self_open_counters(void)
{
- char filename [] = "/proc/1234567890/sched";
- unsigned long msecs, nsecs;
- char *line = NULL;
- u64 total = 0;
- size_t len = 0;
- ssize_t chars;
- FILE *file;
- int ret;
+ struct perf_event_attr attr;
+ int fd;

- sprintf(filename, "/proc/%d/sched", getpid());
- file = fopen(filename, "r");
- BUG_ON(!file);
+ memset(&attr, 0, sizeof(attr));

- while ((chars = getline(&line, &len, file)) != -1) {
- ret = sscanf(line, "se.sum_exec_runtime : %ld.%06ld\n",
- &msecs, &nsecs);
- if (ret == 2) {
- total = msecs*1e6 + nsecs;
- break;
- }
- }
- if (line)
- free(line);
- fclose(file);
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_TASK_CLOCK;

- return total;
+ fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+
+ if (fd < 0)
+ die("Error: sys_perf_event_open() syscall returned"
+ "with %d (%s)\n", fd, strerror(errno));
+ return fd;
+}
+
+static u64 get_cpu_usage_nsec_self(int fd)
+{
+ u64 runtime;
+ int ret;
+
+ ret = read(fd, &runtime, sizeof(runtime));
+ BUG_ON(ret != sizeof(runtime));
+
+ return runtime;
}

static void *thread_func(void *ctx)
@@ -450,9 +448,11 @@ static void *thread_func(void *ctx)
u64 cpu_usage_0, cpu_usage_1;
unsigned long i, ret;
char comm2[22];
+ int fd;

sprintf(comm2, ":%s", this_task->comm);
prctl(PR_SET_NAME, comm2);
+ fd = self_open_counters();

again:
ret = sem_post(&this_task->ready_for_work);
@@ -462,16 +462,15 @@ again:
ret = pthread_mutex_unlock(&start_work_mutex);
BUG_ON(ret);

- cpu_usage_0 = get_cpu_usage_nsec_self();
+ cpu_usage_0 = get_cpu_usage_nsec_self(fd);

for (i = 0; i < this_task->nr_events; i++) {
this_task->curr_event = i;
process_sched_event(this_task, this_task->atoms[i]);
}

- cpu_usage_1 = get_cpu_usage_nsec_self();
+ cpu_usage_1 = get_cpu_usage_nsec_self(fd);
this_task->cpu_usage = cpu_usage_1 - cpu_usage_0;
-
ret = sem_post(&this_task->work_done_sem);
BUG_ON(ret);

--
1.6.1.2




2009-12-09 09:56:21

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v3] perf/sched: fix for getting task's execution time

Sorry, miss Reported-by, please ignore this mail, i'll resend it

Xiao Guangrong wrote:
> In current code, task's execute time is got by reading
> '/proc/<pid>/sched' file, it's wrong if the task is created
> by pthread_create(), because every thread task has same pid.
>
> This way also has two demerits:
>
> 1: 'perf sched replay' can't work if the kernel not compile
> with 'CONFIG_SCHED_DEBUG' option
> 2: perf tool should depend on proc file system
>
> So, this patch use PERF_COUNT_SW_TASK_CLOCK to get task's
> execution time instead of reading /proc file
>
> Changelog v2 -> v3:
> use PERF_COUNT_SW_TASK_CLOCK instead of rusage() as Ingo's suggestion
>
> Signed-off-by: Xiao Guangrong <[email protected]>

2009-12-09 09:58:44

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3] perf/sched: fix for getting task's execution time

In current code, task's execute time is got by reading
'/proc/<pid>/sched' file, it's wrong if the task is created
by pthread_create(), because every thread task has same pid.

This way also has two demerits:

1: 'perf sched replay' can't work if the kernel not compile
with 'CONFIG_SCHED_DEBUG' option
2: perf tool should depend on proc file system

So, this patch use PERF_COUNT_SW_TASK_CLOCK to get task's
execution time instead of reading /proc file

Changelog v2 -> v3:
use PERF_COUNT_SW_TASK_CLOCK instead of rusage() as Ingo's suggestion

Reported-by: Török Edwin <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 55 +++++++++++++++++++++----------------------
1 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 19f43fa..b12b23a 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -13,7 +13,6 @@
#include "util/debug.h"
#include "util/data_map.h"

-#include <sys/types.h>
#include <sys/prctl.h>

#include <semaphore.h>
@@ -414,34 +413,33 @@ static u64 get_cpu_usage_nsec_parent(void)
return sum;
}

-static u64 get_cpu_usage_nsec_self(void)
+static int self_open_counters(void)
{
- char filename [] = "/proc/1234567890/sched";
- unsigned long msecs, nsecs;
- char *line = NULL;
- u64 total = 0;
- size_t len = 0;
- ssize_t chars;
- FILE *file;
- int ret;
+ struct perf_event_attr attr;
+ int fd;

- sprintf(filename, "/proc/%d/sched", getpid());
- file = fopen(filename, "r");
- BUG_ON(!file);
+ memset(&attr, 0, sizeof(attr));

- while ((chars = getline(&line, &len, file)) != -1) {
- ret = sscanf(line, "se.sum_exec_runtime : %ld.%06ld\n",
- &msecs, &nsecs);
- if (ret == 2) {
- total = msecs*1e6 + nsecs;
- break;
- }
- }
- if (line)
- free(line);
- fclose(file);
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_TASK_CLOCK;

- return total;
+ fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+
+ if (fd < 0)
+ die("Error: sys_perf_event_open() syscall returned"
+ "with %d (%s)\n", fd, strerror(errno));
+ return fd;
+}
+
+static u64 get_cpu_usage_nsec_self(int fd)
+{
+ u64 runtime;
+ int ret;
+
+ ret = read(fd, &runtime, sizeof(runtime));
+ BUG_ON(ret != sizeof(runtime));
+
+ return runtime;
}

static void *thread_func(void *ctx)
@@ -450,9 +448,11 @@ static void *thread_func(void *ctx)
u64 cpu_usage_0, cpu_usage_1;
unsigned long i, ret;
char comm2[22];
+ int fd;

sprintf(comm2, ":%s", this_task->comm);
prctl(PR_SET_NAME, comm2);
+ fd = self_open_counters();

again:
ret = sem_post(&this_task->ready_for_work);
@@ -462,16 +462,15 @@ again:
ret = pthread_mutex_unlock(&start_work_mutex);
BUG_ON(ret);

- cpu_usage_0 = get_cpu_usage_nsec_self();
+ cpu_usage_0 = get_cpu_usage_nsec_self(fd);

for (i = 0; i < this_task->nr_events; i++) {
this_task->curr_event = i;
process_sched_event(this_task, this_task->atoms[i]);
}

- cpu_usage_1 = get_cpu_usage_nsec_self();
+ cpu_usage_1 = get_cpu_usage_nsec_self(fd);
this_task->cpu_usage = cpu_usage_1 - cpu_usage_0;
-
ret = sem_post(&this_task->work_done_sem);
BUG_ON(ret);

--
1.6.1.2

2009-12-09 09:57:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3] perf/sched: fix for getting task's execution time


* Xiao Guangrong <[email protected]> wrote:

> In current code, task's execute time is got by reading
> '/proc/<pid>/sched' file, it's wrong if the task is created
> by pthread_create(), because every thread task has same pid.
>
> This way also has two demerits:
>
> 1: 'perf sched replay' can't work if the kernel not compile
> with 'CONFIG_SCHED_DEBUG' option
> 2: perf tool should depend on proc file system
>
> So, this patch use PERF_COUNT_SW_TASK_CLOCK to get task's
> execution time instead of reading /proc file
>
> Changelog v2 -> v3:
> use PERF_COUNT_SW_TASK_CLOCK instead of rusage() as Ingo's suggestion
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> tools/perf/builtin-sched.c | 55 +++++++++++++++++++++----------------------
> 1 files changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 19f43fa..b12b23a 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -13,7 +13,6 @@
> #include "util/debug.h"
> #include "util/data_map.h"
>
> -#include <sys/types.h>
> #include <sys/prctl.h>
>
> #include <semaphore.h>
> @@ -414,34 +413,33 @@ static u64 get_cpu_usage_nsec_parent(void)
> return sum;
> }
>
> -static u64 get_cpu_usage_nsec_self(void)
> +static int self_open_counters(void)
> {
> - char filename [] = "/proc/1234567890/sched";
> - unsigned long msecs, nsecs;
> - char *line = NULL;
> - u64 total = 0;
> - size_t len = 0;
> - ssize_t chars;
> - FILE *file;
> - int ret;
> + struct perf_event_attr attr;
> + int fd;
>
> - sprintf(filename, "/proc/%d/sched", getpid());
> - file = fopen(filename, "r");
> - BUG_ON(!file);
> + memset(&attr, 0, sizeof(attr));
>
> - while ((chars = getline(&line, &len, file)) != -1) {
> - ret = sscanf(line, "se.sum_exec_runtime : %ld.%06ld\n",
> - &msecs, &nsecs);
> - if (ret == 2) {
> - total = msecs*1e6 + nsecs;
> - break;
> - }
> - }
> - if (line)
> - free(line);
> - fclose(file);
> + attr.type = PERF_TYPE_SOFTWARE;
> + attr.config = PERF_COUNT_SW_TASK_CLOCK;
>
> - return total;
> + fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> +
> + if (fd < 0)
> + die("Error: sys_perf_event_open() syscall returned"
> + "with %d (%s)\n", fd, strerror(errno));
> + return fd;
> +}
> +
> +static u64 get_cpu_usage_nsec_self(int fd)
> +{
> + u64 runtime;
> + int ret;
> +
> + ret = read(fd, &runtime, sizeof(runtime));
> + BUG_ON(ret != sizeof(runtime));
> +
> + return runtime;
> }
>
> static void *thread_func(void *ctx)
> @@ -450,9 +448,11 @@ static void *thread_func(void *ctx)
> u64 cpu_usage_0, cpu_usage_1;
> unsigned long i, ret;
> char comm2[22];
> + int fd;
>
> sprintf(comm2, ":%s", this_task->comm);
> prctl(PR_SET_NAME, comm2);
> + fd = self_open_counters();
>
> again:
> ret = sem_post(&this_task->ready_for_work);
> @@ -462,16 +462,15 @@ again:
> ret = pthread_mutex_unlock(&start_work_mutex);
> BUG_ON(ret);
>
> - cpu_usage_0 = get_cpu_usage_nsec_self();
> + cpu_usage_0 = get_cpu_usage_nsec_self(fd);
>
> for (i = 0; i < this_task->nr_events; i++) {
> this_task->curr_event = i;
> process_sched_event(this_task, this_task->atoms[i]);
> }
>
> - cpu_usage_1 = get_cpu_usage_nsec_self();
> + cpu_usage_1 = get_cpu_usage_nsec_self(fd);
> this_task->cpu_usage = cpu_usage_1 - cpu_usage_0;
> -
> ret = sem_post(&this_task->work_done_sem);
> BUG_ON(ret);

Very nice - and the code even got shorter a tiny bit.

Applied, thanks!

Ingo

2009-12-09 09:59:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3] perf/sched: fix for getting task's execution time


* Xiao Guangrong <[email protected]> wrote:

> Sorry, miss Reported-by, please ignore this mail, i'll resend it

I've added Edwin's Reported-by, no need to resend.

Ingo

2009-12-09 10:04:08

by Xiao Guangrong

[permalink] [raw]
Subject: [tip:perf/urgent] perf sched: Fix for getting task's execution time

Commit-ID: c0c9e72150c07b4a6766cd24a6f685ec2dc9c343
Gitweb: http://git.kernel.org/tip/c0c9e72150c07b4a6766cd24a6f685ec2dc9c343
Author: Xiao Guangrong <[email protected]>
AuthorDate: Wed, 9 Dec 2009 17:51:30 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 9 Dec 2009 10:59:12 +0100

perf sched: Fix for getting task's execution time

In current code, task's execute time is got by reading
'/proc/<pid>/sched' file, it's wrong if the task is created
by pthread_create(), because every thread task has same pid.

This way also has two demerits:

1: 'perf sched replay' can't work if the kernel is not
compiled with the 'CONFIG_SCHED_DEBUG' option

2: perf tool should depend on proc file system

So, this patch uses PERF_COUNT_SW_TASK_CLOCK to get task's
execution time instead of reading /proc file.

Changelog v2 -> v3:
use PERF_COUNT_SW_TASK_CLOCK instead of rusage() as Ingo's
suggestion

Reported-by: Torok Edwin <[email protected]>
Signed-off-by: Xiao Guangrong <[email protected]>
Cc: Xiao Guangrong <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/builtin-sched.c | 55 +++++++++++++++++++++----------------------
1 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 19f43fa..b12b23a 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -13,7 +13,6 @@
#include "util/debug.h"
#include "util/data_map.h"

-#include <sys/types.h>
#include <sys/prctl.h>

#include <semaphore.h>
@@ -414,34 +413,33 @@ static u64 get_cpu_usage_nsec_parent(void)
return sum;
}

-static u64 get_cpu_usage_nsec_self(void)
+static int self_open_counters(void)
{
- char filename [] = "/proc/1234567890/sched";
- unsigned long msecs, nsecs;
- char *line = NULL;
- u64 total = 0;
- size_t len = 0;
- ssize_t chars;
- FILE *file;
- int ret;
+ struct perf_event_attr attr;
+ int fd;

- sprintf(filename, "/proc/%d/sched", getpid());
- file = fopen(filename, "r");
- BUG_ON(!file);
+ memset(&attr, 0, sizeof(attr));

- while ((chars = getline(&line, &len, file)) != -1) {
- ret = sscanf(line, "se.sum_exec_runtime : %ld.%06ld\n",
- &msecs, &nsecs);
- if (ret == 2) {
- total = msecs*1e6 + nsecs;
- break;
- }
- }
- if (line)
- free(line);
- fclose(file);
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_TASK_CLOCK;

- return total;
+ fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+
+ if (fd < 0)
+ die("Error: sys_perf_event_open() syscall returned"
+ "with %d (%s)\n", fd, strerror(errno));
+ return fd;
+}
+
+static u64 get_cpu_usage_nsec_self(int fd)
+{
+ u64 runtime;
+ int ret;
+
+ ret = read(fd, &runtime, sizeof(runtime));
+ BUG_ON(ret != sizeof(runtime));
+
+ return runtime;
}

static void *thread_func(void *ctx)
@@ -450,9 +448,11 @@ static void *thread_func(void *ctx)
u64 cpu_usage_0, cpu_usage_1;
unsigned long i, ret;
char comm2[22];
+ int fd;

sprintf(comm2, ":%s", this_task->comm);
prctl(PR_SET_NAME, comm2);
+ fd = self_open_counters();

again:
ret = sem_post(&this_task->ready_for_work);
@@ -462,16 +462,15 @@ again:
ret = pthread_mutex_unlock(&start_work_mutex);
BUG_ON(ret);

- cpu_usage_0 = get_cpu_usage_nsec_self();
+ cpu_usage_0 = get_cpu_usage_nsec_self(fd);

for (i = 0; i < this_task->nr_events; i++) {
this_task->curr_event = i;
process_sched_event(this_task, this_task->atoms[i]);
}

- cpu_usage_1 = get_cpu_usage_nsec_self();
+ cpu_usage_1 = get_cpu_usage_nsec_self(fd);
this_task->cpu_usage = cpu_usage_1 - cpu_usage_0;
-
ret = sem_post(&this_task->work_done_sem);
BUG_ON(ret);