LinuxLists.cc - [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter

2020-04-29 09:17:12

Subject: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

Greeting,

FYI, we noticed the following commit (built with gcc-7):

commit: 06e0a548bad0f43a21e036db018e4dadb501ce8b ("tracing: Do not disable tracing when reading the trace file")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: kernel-selftests
with following parameters:

group: kselftests-01
ucode: 0x500002c

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt

on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>

kern :warn : [ 886.763510] WARNING: CPU: 70 PID: 22584 at kernel/trace/ring_buffer.c:4067 ring_buffer_iter_peek+0x13c/0x1d0
kern :warn : [ 886.776216] Modules linked in: test_firmware intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm btrfs x86_pkg_temp_thermal intel_powerclamp blake2b_generic xor zstd_decompress zstd_compress coretemp ast kvm_intel kvm raid6_pq drm_vram_helper drm_ttm_helper libcrc32c irqbypass ttm crct10dif_pclmul crc32_pclmul drm_kms_helper snd_pcm crc32c_intel ghash_clmulni_intel ahci snd_timer syscopyarea libahci aesni_intel ipmi_ssif sysfillrect snd crypto_simd sysimgblt cryptd nvme fb_sys_fops mei_me soundcore ipmi_si glue_helper pcspkr drm nvme_core ioatdma ipmi_devintf t10_pi libata mei i2c_i801 joydev lpc_ich dca ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables
kern :warn : [ 886.838112] CPU: 70 PID: 22584 Comm: cat Not tainted 5.6.0-rc4-00017-g06e0a548bad0f #1
kern :warn : [ 886.846573] RIP: 0010:ring_buffer_iter_peek+0x13c/0x1d0
kern :warn : [ 886.852355] Code: 41 5e 41 5f c3 83 38 1d 0f 85 98 00 00 00 48 89 df e8 78 cb ff ff e9 32 ff ff ff 80 3c 24 00 75 c8 49 8b 44 24 10 f0 ff 40 08 <0f> 0b 4c 89 f6 4c 89 ef e8 37 8c 8f 00 48 83 c4 08 31 c0 5b 5d 41
kern :warn : [ 886.872239] RSP: 0018:ffffc900213e7d78 EFLAGS: 00010002
kern :warn : [ 886.878030] RAX: ffff889883ea7740 RBX: ffff88a3fbd93720 RCX: ffff88a42101b700
kern :warn : [ 886.885731] RDX: ffff88a415c02e00 RSI: ffff88a42101b440 RDI: ffff88a3fbd93720
kern :warn : [ 886.893432] RBP: 0000000000000000 R08: 000000ce959ae6ab R09: ffff88a42101bb80
kern :warn : [ 886.901129] R10: ffff88a3e99e2300 R11: 0000000000000178 R12: ffff88a415c02e00
kern :warn : [ 886.908831] R13: ffff88a415c02e18 R14: 0000000000000296 R15: ffffc900213e7df8
kern :warn : [ 886.916521] FS: 00007f12d51d2640(0000) GS:ffff88a44f780000(0000) knlGS:0000000000000000
kern :warn : [ 886.925179] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 886.931495] CR2: 00007f12d4e36000 CR3: 00000023e901e006 CR4: 00000000007606e0
kern :warn : [ 886.939194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kern :warn : [ 886.946896] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kern :warn : [ 886.954588] PKRU: 55555554
kern :warn : [ 886.957858] Call Trace:
kern :warn : [ 886.960866] peek_next_entry+0x22/0x70
kern :warn : [ 886.965165] __find_next_entry+0xaf/0x180
kern :warn : [ 886.969714] trace_find_next_entry_inc+0x1e/0x80
kern :warn : [ 886.974863] s_next+0x49/0x70
kern :warn : [ 886.978355] seq_read+0x23f/0x400
kern :warn : [ 886.982193] vfs_read+0x9b/0x160
kern :warn : [ 886.985937] ksys_read+0xa1/0xe0
kern :warn : [ 886.989677] do_syscall_64+0x5b/0x1f0
kern :warn : [ 886.993853] entry_SYSCALL_64_after_hwframe+0x44/0xa9
kern :warn : [ 886.999408] RIP: 0033:0x7f12d50f559e
kern :warn : [ 887.003479] Code: c0 e9 c6 fe ff ff 50 48 8d 3d 36 fc 09 00 e8 e9 e7 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
kern :warn : [ 887.023260] RSP: 002b:00007ffdab31acd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
kern :warn : [ 887.031350] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f12d50f559e
kern :warn : [ 887.039006] RDX: 0000000000020000 RSI: 00007f12d4e36000 RDI: 0000000000000005
kern :warn : [ 887.046657] RBP: 00007f12d4e36000 R08: 00007f12d4e35010 R09: 0000000000000000
kern :warn : [ 887.054305] R10: fffffffffffffc4d R11: 0000000000000246 R12: 00007f12d4e36000
kern :warn : [ 887.061958] R13: 0000000000000005 R14: 0000000000020000 R15: 0000000000000f93
kern :warn : [ 887.069608] ---[ end trace 507492ef8332a5b4 ]---

To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

Thanks,
lkp

Attachments:

(No filename) (4.86 kB)
config-5.6.0-rc4-00017-g06e0a548bad0f (206.10 kB)
job-script (6.30 kB)
kmsg.xz (99.27 kB)
kernel-selftests (201.79 kB)
job.yaml (5.40 kB)
reproduce (1.04 kB)
Download all attachments

2020-05-13 09:21:51

by Sven Schnelle

[permalink] [raw]

Subject: Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

Hi Steve,

On Wed, Apr 29, 2020 at 05:05:09PM +0800, kernel test robot wrote:
>
> kern :warn : [ 886.763510] WARNING: CPU: 70 PID: 22584 at kernel/trace/ring_buffer.c:4067 ring_buffer_iter_peek+0x13c/0x1d0
> kern :warn : [ 886.776216] Modules linked in: test_firmware intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm btrfs x86_pkg_temp_thermal intel_powerclamp blake2b_generic xor zstd_decompress zstd_compress coretemp ast kvm_intel kvm raid6_pq drm_vram_helper drm_ttm_helper libcrc32c irqbypass ttm crct10dif_pclmul crc32_pclmul drm_kms_helper snd_pcm crc32c_intel ghash_clmulni_intel ahci snd_timer syscopyarea libahci aesni_intel ipmi_ssif sysfillrect snd crypto_simd sysimgblt cryptd nvme fb_sys_fops mei_me soundcore ipmi_si glue_helper pcspkr drm nvme_core ioatdma ipmi_devintf t10_pi libata mei i2c_i801 joydev lpc_ich dca ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables
> kern :warn : [ 886.838112] CPU: 70 PID: 22584 Comm: cat Not tainted 5.6.0-rc4-00017-g06e0a548bad0f #1
> kern :warn : [ 886.846573] RIP: 0010:ring_buffer_iter_peek+0x13c/0x1d0
> kern :warn : [ 886.852355] Code: 41 5e 41 5f c3 83 38 1d 0f 85 98 00 00 00 48 89 df e8 78 cb ff ff e9 32 ff ff ff 80 3c 24 00 75 c8 49 8b 44 24 10 f0 ff 40 08 <0f> 0b 4c 89 f6 4c 89 ef e8 37 8c 8f 00 48 83 c4 08 31 c0 5b 5d 41
> kern :warn : [ 886.872239] RSP: 0018:ffffc900213e7d78 EFLAGS: 00010002
> kern :warn : [ 886.878030] RAX: ffff889883ea7740 RBX: ffff88a3fbd93720 RCX: ffff88a42101b700
> kern :warn : [ 886.885731] RDX: ffff88a415c02e00 RSI: ffff88a42101b440 RDI: ffff88a3fbd93720
> kern :warn : [ 886.893432] RBP: 0000000000000000 R08: 000000ce959ae6ab R09: ffff88a42101bb80
> kern :warn : [ 886.901129] R10: ffff88a3e99e2300 R11: 0000000000000178 R12: ffff88a415c02e00
> kern :warn : [ 886.908831] R13: ffff88a415c02e18 R14: 0000000000000296 R15: ffffc900213e7df8
> kern :warn : [ 886.916521] FS: 00007f12d51d2640(0000) GS:ffff88a44f780000(0000) knlGS:0000000000000000
> kern :warn : [ 886.925179] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kern :warn : [ 886.931495] CR2: 00007f12d4e36000 CR3: 00000023e901e006 CR4: 00000000007606e0
> kern :warn : [ 886.939194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kern :warn : [ 886.946896] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> kern :warn : [ 886.954588] PKRU: 55555554
> kern :warn : [ 886.957858] Call Trace:
> kern :warn : [ 886.960866] peek_next_entry+0x22/0x70
> kern :warn : [ 886.965165] __find_next_entry+0xaf/0x180
> kern :warn : [ 886.969714] trace_find_next_entry_inc+0x1e/0x80
> kern :warn : [ 886.974863] s_next+0x49/0x70
> kern :warn : [ 886.978355] seq_read+0x23f/0x400
> kern :warn : [ 886.982193] vfs_read+0x9b/0x160
> kern :warn : [ 886.985937] ksys_read+0xa1/0xe0
> kern :warn : [ 886.989677] do_syscall_64+0x5b/0x1f0
> kern :warn : [ 886.993853] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> kern :warn : [ 886.999408] RIP: 0033:0x7f12d50f559e
> kern :warn : [ 887.003479] Code: c0 e9 c6 fe ff ff 50 48 8d 3d 36 fc 09 00 e8 e9 e7 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
> kern :warn : [ 887.023260] RSP: 002b:00007ffdab31acd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> kern :warn : [ 887.031350] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f12d50f559e
> kern :warn : [ 887.039006] RDX: 0000000000020000 RSI: 00007f12d4e36000 RDI: 0000000000000005
> kern :warn : [ 887.046657] RBP: 00007f12d4e36000 R08: 00007f12d4e35010 R09: 0000000000000000
> kern :warn : [ 887.054305] R10: fffffffffffffc4d R11: 0000000000000246 R12: 00007f12d4e36000
> kern :warn : [ 887.061958] R13: 0000000000000005 R14: 0000000000020000 R15: 0000000000000f93
> kern :warn : [ 887.069608] ---[ end trace 507492ef8332a5b4 ]---
>

Did you had a chance to look into this? I can easily reproduce this both on x86
and s390 by doing:

cd /sys/kernel/tracing
cat /dev/zero >/dev/null & # generate some load
echo function >current_tracer
# wait a few seconds to fill the buffer
cat trace

Usually it will print the warn after a few seconds.

I haven't digged through all the ring buffer code yet, so i thought i might ask
whether you have an idea what's going on.

Thanks
Sven

2020-05-13 13:34:45

by Steven Rostedt

[permalink] [raw]

Subject: Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

On Wed, 13 May 2020 11:19:06 +0200
Sven Schnelle <[email protected]> wrote:

> Did you had a chance to look into this? I can easily reproduce this both on x86
> and s390 by doing:
>
> cd /sys/kernel/tracing
> cat /dev/zero >/dev/null & # generate some load
> echo function >current_tracer
> # wait a few seconds to fill the buffer
> cat trace
>
> Usually it will print the warn after a few seconds.
>
> I haven't digged through all the ring buffer code yet, so i thought i might ask
> whether you have an idea what's going on.

Can you send me the config for where you can reproduce it on x86?

The iterator now doesn't stop the ring buffer when it iterates, and is
doing so over a live buffer (but should be able to handle it). It's
triggering something I thought wasn't suppose to happen (which must be
happening).

Perhaps with your config I'd be able to reproduce it.

-- Steve

2020-05-13 19:34:34

by Steven Rostedt

[permalink] [raw]

Subject: Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

On Wed, 13 May 2020 18:15:57 +0200
Sven Schnelle <[email protected]> wrote:

> Thanks for looking into this. I've attached my /proc/config.gz to this Mail.
> The x86 system is my Laptop which is a Thinkpad X280 with 4 HT CPUs (so 8 cpus
> in total). I've tried disabling preemption, but this didn't help.
>
> It's always this check that causes the loop:
>
> if (iter->head >= rb_page_size(iter->head_page)) {
> rb_inc_iter(iter);
> goto again;
> }
>
> On the first loop iter->head is some value > 0 and rb_page_size returns
> 0, afterwards it gets twice to this check with both values 0. The third
> time the warning is triggered. Maybe that information helps.

I figured out what was causing this, and that's just that the writer and
the iterator could end up almost "in sync" where the writer writes to each
of the pages the iterator is trying to read, and this can trigger the three
failures of "zero commits" per page.

I had a way to detect this, but then realized that it may be possible for
an active writer to possibly trigger the other failures to get an event,
that I just decided to force it to try three times, and simply return NULL
if an active writer is messing with the iterator. The iterator is a "best
effort" to read the buffer if there's an active writer. The consumer read
(trace_pipe) is made for that.

This patch should solve you issues.

(care to give a Tested-by: if it works for you?)

-- Steve

From 29b36cd00ddcae278430e70e55dd64855e9a54b3 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <[email protected]>
Date: Wed, 13 May 2020 15:18:01 -0400
Subject: [PATCH] ring-buffer: Don't deactivate the ring buffer on failed
iterator reads

If the function tracer is running and the trace file is read (which uses the
ring buffer iterator), the iterator can get in sync with the writes, and
caues it to fail to find a page with content it can read three times. This
causes a warning and deactivation of the ring buffer code.

Looking at the other cases of failure to get an event, it appears that
there's a chance that the writer could cause them too. Since the iterator is
a "best effort" to read the ring buffer if there's an active writer (the
consumer reader is made for this case "see trace_pipe"), if it fails to get
an event after three tries, simply give up and return NULL. Don't warn, nor
disable the ring buffer on this failure.

Link: https://lore.kernel.org/r/20200429090508.GG5770@shao2-debian

Reported-by: kernel test robot <[email protected]>
Fixes: ff84c50cfb4b ("ring-buffer: Do not die if rb_iter_peek() fails more than thrice")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
kernel/trace/ring_buffer.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6f0b42ceeb00..448d5f528764 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4034,7 +4034,6 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
int nr_loops = 0;
- bool failed = false;

if (ts)
*ts = 0;
@@ -4056,19 +4055,14 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
return NULL;

/*
- * We repeat when a time extend is encountered or we hit
- * the end of the page. Since the time extend is always attached
- * to a data event, we should never loop more than three times.
- * Once for going to next page, once on time extend, and
- * finally once to get the event.
- * We should never hit the following condition more than thrice,
- * unless the buffer is very small, and there's a writer
- * that is causing the reader to fail getting an event.
+ * As the writer can mess with what the iterator is trying
+ * to read, just give up if we fail to get an event after
+ * three tries. The iterator is not as reliable when reading
+ * the ring buffer with an active write as the consumer is.
+ * Do not warn if the three failures is reached.
*/
- if (++nr_loops > 3) {
- RB_WARN_ON(cpu_buffer, !failed);
+ if (++nr_loops > 3)
return NULL;
- }

if (rb_per_cpu_empty(cpu_buffer))
return NULL;
@@ -4079,10 +4073,8 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
}

event = rb_iter_head_event(iter);
- if (!event) {
- failed = true;
+ if (!event)
goto again;
- }

switch (event->type_len) {
case RINGBUF_TYPE_PADDING:
--
2.20.1

2020-05-13 20:50:11

by Sven Schnelle

[permalink] [raw]

Subject: Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

Hi Steve,

On Wed, May 13, 2020 at 09:29:22AM -0400, Steven Rostedt wrote:
> On Wed, 13 May 2020 11:19:06 +0200
> Sven Schnelle <[email protected]> wrote:
>
> > Did you had a chance to look into this? I can easily reproduce this both on x86
> > and s390 by doing:
> >
> > cd /sys/kernel/tracing
> > cat /dev/zero >/dev/null & # generate some load
> > echo function >current_tracer
> > # wait a few seconds to fill the buffer
> > cat trace
> >
> > Usually it will print the warn after a few seconds.
> >
> > I haven't digged through all the ring buffer code yet, so i thought i might ask
> > whether you have an idea what's going on.
>
> Can you send me the config for where you can reproduce it on x86?
>
> The iterator now doesn't stop the ring buffer when it iterates, and is
> doing so over a live buffer (but should be able to handle it). It's
> triggering something I thought wasn't suppose to happen (which must be
> happening).
>
> Perhaps with your config I'd be able to reproduce it.

Thanks for looking into this. I've attached my /proc/config.gz to this Mail.
The x86 system is my Laptop which is a Thinkpad X280 with 4 HT CPUs (so 8 cpus
in total). I've tried disabling preemption, but this didn't help.

It's always this check that causes the loop:

if (iter->head >= rb_page_size(iter->head_page)) {
rb_inc_iter(iter);
goto again;
}

On the first loop iter->head is some value > 0 and rb_page_size returns
0, afterwards it gets twice to this check with both values 0. The third
time the warning is triggered. Maybe that information helps.

Thanks,
Sven

Attachments:

(No filename) (1.59 kB)
config.gz (28.84 kB)
Download all attachments

2020-05-13 20:50:32

by Steven Rostedt

[permalink] [raw]

Subject: Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

2020-05-14 06:14:02

by Sven Schnelle

[permalink] [raw]

Subject: Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

Hi Steve,

On Wed, May 13, 2020 at 03:30:33PM -0400, Steven Rostedt wrote:
> On Wed, 13 May 2020 18:15:57 +0200
> Sven Schnelle <[email protected]> wrote:
>
> > Thanks for looking into this. I've attached my /proc/config.gz to this Mail.
> > The x86 system is my Laptop which is a Thinkpad X280 with 4 HT CPUs (so 8 cpus
> > in total). I've tried disabling preemption, but this didn't help.
> >
> > It's always this check that causes the loop:
> >
> > if (iter->head >= rb_page_size(iter->head_page)) {
> > rb_inc_iter(iter);
> > goto again;
> > }
> >
> > On the first loop iter->head is some value > 0 and rb_page_size returns
> > 0, afterwards it gets twice to this check with both values 0. The third
> > time the warning is triggered. Maybe that information helps.
>
> I figured out what was causing this, and that's just that the writer and
> the iterator could end up almost "in sync" where the writer writes to each
> of the pages the iterator is trying to read, and this can trigger the three
> failures of "zero commits" per page.
>
> I had a way to detect this, but then realized that it may be possible for
> an active writer to possibly trigger the other failures to get an event,
> that I just decided to force it to try three times, and simply return NULL
> if an active writer is messing with the iterator. The iterator is a "best
> effort" to read the buffer if there's an active writer. The consumer read
> (trace_pipe) is made for that.
>
> This patch should solve you issues.
>
> (care to give a Tested-by: if it works for you?)

Well, as there's no longer a RB_WARN_ON that indeed fixes the issue :-)

Tested-by: Sven Schnelle <[email protected]>

Thanks!

Sven