perf data files cannot be processed until the header file is update
which is done via an on_exit handler. If perf is killed due to a SIGTERM
it does not run the on_exit hooks leaving the perf.data file in a
random state which perf-report will happily spin on trying to read. As
noted by Mike an easy reproducer is:
perf record -a -g & sleep 1; killall perf
Fix by catching SIGTERM like it does SIGINT. Also need to remove the
kill which was added via commit f7b7c26e.
Signed-off-by: David Ahern <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
---
tools/perf/builtin-record.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cdf58ec..fff985c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -198,7 +198,6 @@ static void perf_record__sig_exit(int exit_status __maybe_unused, void *arg)
return;
signal(signr, SIG_DFL);
- kill(getpid(), signr);
}
static bool perf_evlist__equal(struct perf_evlist *evlist,
@@ -404,6 +403,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
signal(SIGCHLD, sig_handler);
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
+ signal(SIGTERM, sig_handler);
if (!output_name) {
if (!fstat(STDOUT_FILENO, &st) && S_ISFIFO(st.st_mode))
--
1.7.10.1
On 5/6/13 12:24 PM, David Ahern wrote:
> perf data files cannot be processed until the header file is update
yuk, that is supposed to say "until the header is updated"
David
> which is done via an on_exit handler. If perf is killed due to a SIGTERM
> it does not run the on_exit hooks leaving the perf.data file in a
> random state which perf-report will happily spin on trying to read. As
> noted by Mike an easy reproducer is:
> perf record -a -g & sleep 1; killall perf
>
> Fix by catching SIGTERM like it does SIGINT. Also need to remove the
> kill which was added via commit f7b7c26e.
>
> Signed-off-by: David Ahern <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> ---
> tools/perf/builtin-record.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index cdf58ec..fff985c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -198,7 +198,6 @@ static void perf_record__sig_exit(int exit_status __maybe_unused, void *arg)
> return;
>
> signal(signr, SIG_DFL);
> - kill(getpid(), signr);
> }
>
> static bool perf_evlist__equal(struct perf_evlist *evlist,
> @@ -404,6 +403,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
> signal(SIGCHLD, sig_handler);
> signal(SIGINT, sig_handler);
> signal(SIGUSR1, sig_handler);
> + signal(SIGTERM, sig_handler);
>
> if (!output_name) {
> if (!fstat(STDOUT_FILENO, &st) && S_ISFIFO(st.st_mode))
>
This is a good fix. I have run into this infinite loop in perf report
many times.
Thanks David.
On Mon, May 6, 2013 at 8:24 PM, David Ahern <[email protected]> wrote:
> perf data files cannot be processed until the header file is update
> which is done via an on_exit handler. If perf is killed due to a SIGTERM
> it does not run the on_exit hooks leaving the perf.data file in a
> random state which perf-report will happily spin on trying to read. As
> noted by Mike an easy reproducer is:
> perf record -a -g & sleep 1; killall perf
>
> Fix by catching SIGTERM like it does SIGINT. Also need to remove the
> kill which was added via commit f7b7c26e.
>
> Signed-off-by: David Ahern <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> ---
> tools/perf/builtin-record.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index cdf58ec..fff985c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -198,7 +198,6 @@ static void perf_record__sig_exit(int exit_status __maybe_unused, void *arg)
> return;
>
> signal(signr, SIG_DFL);
> - kill(getpid(), signr);
> }
>
> static bool perf_evlist__equal(struct perf_evlist *evlist,
> @@ -404,6 +403,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
> signal(SIGCHLD, sig_handler);
> signal(SIGINT, sig_handler);
> signal(SIGUSR1, sig_handler);
> + signal(SIGTERM, sig_handler);
>
> if (!output_name) {
> if (!fstat(STDOUT_FILENO, &st) && S_ISFIFO(st.st_mode))
> --
> 1.7.10.1
>
On 5/6/13 4:40 PM, Stephane Eranian wrote:
> This is a good fix. I have run into this infinite loop in perf report
> many times.
The perf_file_header could use an 'I am sane' bit which is only set when
the file is closed properly. Perhaps we could overload the magic field
like this:
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 326068a..cd9fad6 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2364,7 +2364,6 @@ out_err_write:
}
f_header = (struct perf_file_header){
- .magic = PERF_MAGIC,
.size = sizeof(f_header),
.attr_size = sizeof(f_attr),
.attrs = {
@@ -2382,6 +2381,8 @@ out_err_write:
};
memcpy(&f_header.adds_features, &header->adds_features,
sizeof(header->adds_features));
+ if (at_exit)
+ f_header.magic = PERF_MAGIC,
lseek(fd, 0, SEEK_SET);
err = do_write(fd, &f_header, sizeof(f_header));
The perf magic is only written when the file is closed properly. In
Mike's case you end up with the message (which can be enhanced)
magic/endian check failed
incompatible file format (rerun with -v to learn more)
which is better than an infinite loop.
David
* Stephane Eranian <[email protected]> wrote:
> This is a good fix. I have run into this infinite loop in perf report
> many times.
Hm, perf record should really not assume much about the perf.data and
should avoid infinite loops ...
So while making perf.data more consistent on SIGTERM is a nice fix, perf
report should be fixed as well to detect loops and such.
Thanks,
Ingo
On 5/7/13 12:29 AM, Ingo Molnar wrote:
>
> * Stephane Eranian <[email protected]> wrote:
>
>> This is a good fix. I have run into this infinite loop in perf report
>> many times.
>
> Hm, perf record should really not assume much about the perf.data and
> should avoid infinite loops ...
>
> So while making perf.data more consistent on SIGTERM is a nice fix, perf
> report should be fixed as well to detect loops and such.
>
> Thanks,
>
> Ingo
>
This seems to do the trick:
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 326068a..e82646f 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2802,6 +2802,17 @@ int perf_session__read_header(struct perf_session
*session, int fd)
if (perf_file_header__read(&f_header, header, fd) < 0)
return -EINVAL;
+ /*
+ * sanity check that perf.data was written cleanly: data size
+ * is initialized to 0 and updated only if the on_exit function
+ * is run. If data size is still 0 then the file cannot be
+ * processed.
+ */
+ if (f_header.data.size == 0) {
+ pr_err("data size is 0. Was record properly terminated?\n");
+ return -1;
+ }
+
nr_attrs = f_header.attrs.size / f_header.attr_size;
lseek(fd, f_header.attrs.offset, SEEK_SET);
Hi David,
On Tue, 07 May 2013 14:56:17 -0600, David Ahern wrote:
> On 5/7/13 12:29 AM, Ingo Molnar wrote:
>>
>> * Stephane Eranian <[email protected]> wrote:
>>
>>> This is a good fix. I have run into this infinite loop in perf report
>>> many times.
>>
>> Hm, perf record should really not assume much about the perf.data and
>> should avoid infinite loops ...
>>
>> So while making perf.data more consistent on SIGTERM is a nice fix, perf
>> report should be fixed as well to detect loops and such.
>>
>> Thanks,
>>
>> Ingo
>>
>
> This seems to do the trick:
Acked-by: Namhyung Kim <[email protected]>
Thanks,
Namhyung
>
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 326068a..e82646f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2802,6 +2802,17 @@ int perf_session__read_header(struct
> perf_session *session, int fd)
> if (perf_file_header__read(&f_header, header, fd) < 0)
> return -EINVAL;
>
> + /*
> + * sanity check that perf.data was written cleanly: data size
> + * is initialized to 0 and updated only if the on_exit function
> + * is run. If data size is still 0 then the file cannot be
> + * processed.
> + */
> + if (f_header.data.size == 0) {
> + pr_err("data size is 0. Was record properly terminated?\n");
> + return -1;
> + }
> +
> nr_attrs = f_header.attrs.size / f_header.attr_size;
> lseek(fd, f_header.attrs.offset, SEEK_SET);
* David Ahern <[email protected]> wrote:
> On 5/7/13 12:29 AM, Ingo Molnar wrote:
> >
> >* Stephane Eranian <[email protected]> wrote:
> >
> >>This is a good fix. I have run into this infinite loop in perf report
> >>many times.
> >
> >Hm, perf record should really not assume much about the perf.data and
> >should avoid infinite loops ...
> >
> >So while making perf.data more consistent on SIGTERM is a nice fix, perf
> >report should be fixed as well to detect loops and such.
> >
> >Thanks,
> >
> > Ingo
> >
>
> This seems to do the trick:
>
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 326068a..e82646f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2802,6 +2802,17 @@ int perf_session__read_header(struct
> perf_session *session, int fd)
> if (perf_file_header__read(&f_header, header, fd) < 0)
> return -EINVAL;
>
> + /*
> + * sanity check that perf.data was written cleanly: data size
> + * is initialized to 0 and updated only if the on_exit function
> + * is run. If data size is still 0 then the file cannot be
> + * processed.
> + */
> + if (f_header.data.size == 0) {
> + pr_err("data size is 0. Was record properly terminated?\n");
> + return -1;
> + }
Hm, this detects the condition - but where does the looping come from?
Can it happen with a perf.data that 'seems' clean but is corrupted
(because not fully written, buggy kernel just crashed, etc.).
In essence it would be _very_ nice if someone reproduced the looping and
checked what to do to fix the looping itself. Or does the above
data.size == 0 check fully fix the looping under every possible state of a
perf.data?
Thanks,
Ingo
On 5/8/13 12:54 AM, Ingo Molnar wrote:
>
> * David Ahern <[email protected]> wrote:
>
>> On 5/7/13 12:29 AM, Ingo Molnar wrote:
>>>
>>> * Stephane Eranian <[email protected]> wrote:
>>>
>>>> This is a good fix. I have run into this infinite loop in perf report
>>>> many times.
>>>
>>> Hm, perf record should really not assume much about the perf.data and
>>> should avoid infinite loops ...
>>>
>>> So while making perf.data more consistent on SIGTERM is a nice fix, perf
>>> report should be fixed as well to detect loops and such.
>>>
>>> Thanks,
>>>
>>> Ingo
>>>
>>
>> This seems to do the trick:
>>
>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>> index 326068a..e82646f 100644
>> --- a/tools/perf/util/header.c
>> +++ b/tools/perf/util/header.c
>> @@ -2802,6 +2802,17 @@ int perf_session__read_header(struct
>> perf_session *session, int fd)
>> if (perf_file_header__read(&f_header, header, fd) < 0)
>> return -EINVAL;
>>
>> + /*
>> + * sanity check that perf.data was written cleanly: data size
>> + * is initialized to 0 and updated only if the on_exit function
>> + * is run. If data size is still 0 then the file cannot be
>> + * processed.
>> + */
>> + if (f_header.data.size == 0) {
>> + pr_err("data size is 0. Was record properly terminated?\n");
>> + return -1;
>> + }
>
> Hm, this detects the condition - but where does the looping come from?
>
> Can it happen with a perf.data that 'seems' clean but is corrupted
> (because not fully written, buggy kernel just crashed, etc.).
>
> In essence it would be _very_ nice if someone reproduced the looping and
> checked what to do to fix the looping itself. Or does the above
> data.size == 0 check fully fix the looping under every possible state of a
> perf.data?
I think so. If you want the dirty details here you go.
The looping is in __perf_session__process_events. When the data file is
not closed properly data_size is 0 and n my case data_offset is 288.
Dropping into this function:
page_offset = page_size * (data_offset / page_size);
file_offset = page_offset;
head = data_offset - page_offset;
which means
page_offset = 0
file_offset = 0
head = 288
The looping is here:
remap:
buf = mmap(NULL, mmap_size, mmap_prot, mmap_flags, session->fd,
file_offset);
if (buf == MAP_FAILED) {
pr_err("failed to mmap file\n");
err = -errno;
goto out_err;
}
mmaps[map_idx] = buf;
map_idx = (map_idx + 1) & (ARRAY_SIZE(mmaps) - 1);
file_pos = file_offset + head;
more:
event = fetch_mmaped_event(session, head, mmap_size, buf);
--> returned event is NULL
if (!event) {
if (mmaps[map_idx]) {
munmap(mmaps[map_idx], mmap_size);
mmaps[map_idx] = NULL;
}
page_offset = page_size * (head / page_size);
file_offset += page_offset;
head -= page_offset;
--> head is 288 which means the new page_offset is 0 and the new
file_offset is 0. head never changes. and then we go back to remap.
goto remap;
}
So, if you want to handle the looping then seeing that page_offset new
in the above is 0 would suffice. A 0 value means file_offset does not
change and the jump to remap means the mmap does not change. ie., in a
loop where no values are changing.
David
On Mon, May 06, 2013 at 12:24:23PM -0600, David Ahern wrote:
> perf data files cannot be processed until the header file is update
> which is done via an on_exit handler. If perf is killed due to a SIGTERM
> it does not run the on_exit hooks leaving the perf.data file in a
> random state which perf-report will happily spin on trying to read. As
> noted by Mike an easy reproducer is:
> perf record -a -g & sleep 1; killall perf
>
> Fix by catching SIGTERM like it does SIGINT. Also need to remove the
> kill which was added via commit f7b7c26e.
>
> Signed-off-by: David Ahern <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> ---
> tools/perf/builtin-record.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index cdf58ec..fff985c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -198,7 +198,6 @@ static void perf_record__sig_exit(int exit_status __maybe_unused, void *arg)
> return;
>
> signal(signr, SIG_DFL);
> - kill(getpid(), signr);
> }
>
> static bool perf_evlist__equal(struct perf_evlist *evlist,
> @@ -404,6 +403,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
> signal(SIGCHLD, sig_handler);
> signal(SIGINT, sig_handler);
> signal(SIGUSR1, sig_handler);
> + signal(SIGTERM, sig_handler);
>
> if (!output_name) {
> if (!fstat(STDOUT_FILENO, &st) && S_ISFIFO(st.st_mode))
> --
> 1.7.10.1
>
hi,
got a perf test hanging on me on latest acme's perf/core and bisected
it to this one. I separated reproducer so far but probably wont get to
it this week:
[jolsa@krava2 perf]$ cat /proc/sys/kernel/perf_event_paranoid
1
[jolsa@krava2 perf]$ ./perf record -C 0 kill
Error:
You may not have permission to collect %sstats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
-1 - Not paranoid at all
0 - Disallow raw tracepoint access for unpriv
1 - Disallow cpu events for unpriv
2 - Disallow kernel profiling for unpriv
jirka
On 5/24/13 3:08 AM, Jiri Olsa wrote:
> On Mon, May 06, 2013 at 12:24:23PM -0600, David Ahern wrote:
>> perf data files cannot be processed until the header file is update
>> which is done via an on_exit handler. If perf is killed due to a SIGTERM
>> it does not run the on_exit hooks leaving the perf.data file in a
>> random state which perf-report will happily spin on trying to read. As
>> noted by Mike an easy reproducer is:
>> perf record -a -g & sleep 1; killall perf
>>
>> Fix by catching SIGTERM like it does SIGINT. Also need to remove the
>> kill which was added via commit f7b7c26e.
>>
>> Signed-off-by: David Ahern <[email protected]>
>> Cc: Mike Galbraith <[email protected]>
>> Cc: Arnaldo Carvalho de Melo <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Frederic Weisbecker <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Jiri Olsa <[email protected]>
>> Cc: Namhyung Kim <[email protected]>
>> Cc: Stephane Eranian <[email protected]>
>> ---
>> tools/perf/builtin-record.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index cdf58ec..fff985c 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -198,7 +198,6 @@ static void perf_record__sig_exit(int exit_status __maybe_unused, void *arg)
>> return;
>>
>> signal(signr, SIG_DFL);
>> - kill(getpid(), signr);
>> }
>>
>> static bool perf_evlist__equal(struct perf_evlist *evlist,
>> @@ -404,6 +403,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
>> signal(SIGCHLD, sig_handler);
>> signal(SIGINT, sig_handler);
>> signal(SIGUSR1, sig_handler);
>> + signal(SIGTERM, sig_handler);
>>
>> if (!output_name) {
>> if (!fstat(STDOUT_FILENO, &st) && S_ISFIFO(st.st_mode))
>> --
>> 1.7.10.1
>>
>
> hi,
> got a perf test hanging on me on latest acme's perf/core and bisected
> it to this one. I separated reproducer so far but probably wont get to
> it this week:
Ok, I should have some time to look into it this weekend.
David
>
> [jolsa@krava2 perf]$ cat /proc/sys/kernel/perf_event_paranoid
> 1
> [jolsa@krava2 perf]$ ./perf record -C 0 kill
> Error:
> You may not have permission to collect %sstats.
> Consider tweaking /proc/sys/kernel/perf_event_paranoid:
> -1 - Not paranoid at all
> 0 - Disallow raw tracepoint access for unpriv
> 1 - Disallow cpu events for unpriv
> 2 - Disallow kernel profiling for unpriv
>
> jirka
>
Commit-ID: 804f7ac78803ed095bb0402d540f859ecb1be9f1
Gitweb: http://git.kernel.org/tip/804f7ac78803ed095bb0402d540f859ecb1be9f1
Author: David Ahern <[email protected]>
AuthorDate: Mon, 6 May 2013 12:24:23 -0600
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 28 May 2013 16:23:58 +0300
perf record: handle death by SIGTERM
Perf data files cannot be processed until the header is updated which is
done via an on_exit handler.
If perf is killed due to a SIGTERM it does not run the on_exit hooks
leaving the perf.data file in a random state which perf-report will
happily spin on trying to read.
As noted by Mike an easy reproducer is:
perf record -a -g & sleep 1; killall perf
Fix by catching SIGTERM like it does SIGINT.
Also need to remove the kill which was added via commit f7b7c26e.
Acked-by: Stephane Eranian <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cdf58ec..fff985c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -198,7 +198,6 @@ static void perf_record__sig_exit(int exit_status __maybe_unused, void *arg)
return;
signal(signr, SIG_DFL);
- kill(getpid(), signr);
}
static bool perf_evlist__equal(struct perf_evlist *evlist,
@@ -404,6 +403,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
signal(SIGCHLD, sig_handler);
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
+ signal(SIGTERM, sig_handler);
if (!output_name) {
if (!fstat(STDOUT_FILENO, &st) && S_ISFIFO(st.st_mode))