2018-09-07 18:23:43

by Vaibhav Nagarnaik

[permalink] [raw]
Subject: [PATCH] ring-buffer: Allow for rescheduling when removing pages

When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.

After the pages are removed from the list, the free() can happen while
the work is rescheduled. Add a check for need_sched() within the loop
to prevent the system hangup.

Reported-by: Jason Behmer <[email protected]>
Signed-off-by: Vaibhav Nagarnaik <[email protected]>
---
kernel/trace/ring_buffer.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1d92d4a982fd..bc1789df7c53 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1546,6 +1546,9 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
tmp_iter_page = first_page;

do {
+ if (need_resched())
+ schedule();
+
to_remove_page = tmp_iter_page;
rb_inc_page(cpu_buffer, &tmp_iter_page);

--
2.19.0.rc2.392.g5ba43deb5a-goog



2018-09-07 18:33:07

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Allow for rescheduling when removing pages

On Fri, 7 Sep 2018 11:21:31 -0700
Vaibhav Nagarnaik <[email protected]> wrote:

> When reducing ring buffer size, pages are removed by scheduling a work
> item on each CPU for the corresponding CPU ring buffer. After the pages
> are removed from ring buffer linked list, the pages are free()d in a
> tight loop. The loop does not give up CPU until all pages are removed.
> In a worst case behavior, when lot of pages are to be freed, it can
> cause system stall.
>
> After the pages are removed from the list, the free() can happen while
> the work is rescheduled. Add a check for need_sched() within the loop
> to prevent the system hangup.
>
> Reported-by: Jason Behmer <[email protected]>
> Signed-off-by: Vaibhav Nagarnaik <[email protected]>
> ---
> kernel/trace/ring_buffer.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 1d92d4a982fd..bc1789df7c53 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1546,6 +1546,9 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
> tmp_iter_page = first_page;
>
> do {
> + if (need_resched())
> + schedule();
> +

Hi, thanks for the patch, but the proper way to do this is to stick in:

cond_resched();

And that should solve it for you. Want to send in another patch?

-- Steve

> to_remove_page = tmp_iter_page;
> rb_inc_page(cpu_buffer, &tmp_iter_page);
>


2018-09-07 19:31:41

by Vaibhav Nagarnaik

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Allow for rescheduling when removing pages

On Fri, Sep 7, 2018 at 11:30 AM Steven Rostedt <[email protected]> wrote:
>
> On Fri, 7 Sep 2018 11:21:31 -0700
> Vaibhav Nagarnaik <[email protected]> wrote:
>
> > When reducing ring buffer size, pages are removed by scheduling a work
> > item on each CPU for the corresponding CPU ring buffer. After the pages
> > are removed from ring buffer linked list, the pages are free()d in a
> > tight loop. The loop does not give up CPU until all pages are removed.
> > In a worst case behavior, when lot of pages are to be freed, it can
> > cause system stall.
> >
> > After the pages are removed from the list, the free() can happen while
> > the work is rescheduled. Add a check for need_sched() within the loop
> > to prevent the system hangup.
> >
> > Reported-by: Jason Behmer <[email protected]>
> > Signed-off-by: Vaibhav Nagarnaik <[email protected]>
> > ---
> > kernel/trace/ring_buffer.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 1d92d4a982fd..bc1789df7c53 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -1546,6 +1546,9 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
> > tmp_iter_page = first_page;
> >
> > do {
> > + if (need_resched())
> > + schedule();
> > +
>
> Hi, thanks for the patch, but the proper way to do this is to stick in:
>
> cond_resched();
>
> And that should solve it for you. Want to send in another patch?

Sounds good. Let me update the patch. Testing it first though.

Vaibhav

> -- Steve
>
> > to_remove_page = tmp_iter_page;
> > rb_inc_page(cpu_buffer, &tmp_iter_page);
> >
>


Attachments:
smime.p7s (4.74 kB)
S/MIME Cryptographic Signature

2018-09-07 22:33:38

by Vaibhav Nagarnaik

[permalink] [raw]
Subject: [PATCH] ring-buffer: Allow for rescheduling when removing pages

When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.

After the pages are removed from the list, the free() can happen while
the work is rescheduled. Call cond_resched() in the loop to prevent the
system hangup.

Reported-by: Jason Behmer <[email protected]>
Signed-off-by: Vaibhav Nagarnaik <[email protected]>
---
kernel/trace/ring_buffer.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1d92d4a982fd..65bd4616220d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1546,6 +1546,8 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
tmp_iter_page = first_page;

do {
+ cond_resched();
+
to_remove_page = tmp_iter_page;
rb_inc_page(cpu_buffer, &tmp_iter_page);

--
2.19.0.rc2.392.g5ba43deb5a-goog


2018-09-17 20:53:58

by Vaibhav Nagarnaik

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Allow for rescheduling when removing pages

Hi Steven,

Does the patch look good? Can this be picked up in the next rc?

Vaibhav

On Fri, Sep 7, 2018 at 3:31 PM Vaibhav Nagarnaik <[email protected]> wrote:
>
> When reducing ring buffer size, pages are removed by scheduling a work
> item on each CPU for the corresponding CPU ring buffer. After the pages
> are removed from ring buffer linked list, the pages are free()d in a
> tight loop. The loop does not give up CPU until all pages are removed.
> In a worst case behavior, when lot of pages are to be freed, it can
> cause system stall.
>
> After the pages are removed from the list, the free() can happen while
> the work is rescheduled. Call cond_resched() in the loop to prevent the
> system hangup.
>
> Reported-by: Jason Behmer <[email protected]>
> Signed-off-by: Vaibhav Nagarnaik <[email protected]>
> ---
> kernel/trace/ring_buffer.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 1d92d4a982fd..65bd4616220d 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1546,6 +1546,8 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
> tmp_iter_page = first_page;
>
> do {
> + cond_resched();
> +
> to_remove_page = tmp_iter_page;
> rb_inc_page(cpu_buffer, &tmp_iter_page);
>
> --
> 2.19.0.rc2.392.g5ba43deb5a-goog
>


Attachments:
smime.p7s (4.74 kB)
S/MIME Cryptographic Signature

2018-09-17 21:01:49

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Allow for rescheduling when removing pages

On Mon, 17 Sep 2018 13:53:05 -0700
Vaibhav Nagarnaik <[email protected]> wrote:

> Hi Steven,
>
> Does the patch look good? Can this be picked up in the next rc?
>

Yes it's fine. I can pick it up. Does it need to be marked for stable?

Thanks!

-- Steve

2018-09-17 21:04:24

by Vaibhav Nagarnaik

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Allow for rescheduling when removing pages

On Mon, Sep 17, 2018 at 2:01 PM Steven Rostedt <[email protected]> wrote:
>
> On Mon, 17 Sep 2018 13:53:05 -0700
> Vaibhav Nagarnaik <[email protected]> wrote:
>
> > Hi Steven,
> >
> > Does the patch look good? Can this be picked up in the next rc?
> >
>
> Yes it's fine. I can pick it up. Does it need to be marked for stable?

Thanks.

It'd be great to be marked for stable, since older kernels are also
affected. But I can't judge if it crosses the threshold as a bug fix.

Vaibhav

> Thanks!
>
> -- Steve


Attachments:
smime.p7s (4.74 kB)
S/MIME Cryptographic Signature

2018-09-17 21:37:09

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Allow for rescheduling when removing pages

On Mon, 17 Sep 2018 14:02:59 -0700
Vaibhav Nagarnaik <[email protected]> wrote:

> On Mon, Sep 17, 2018 at 2:01 PM Steven Rostedt <[email protected]> wrote:
> >
> > On Mon, 17 Sep 2018 13:53:05 -0700
> > Vaibhav Nagarnaik <[email protected]> wrote:
> >
> > > Hi Steven,
> > >
> > > Does the patch look good? Can this be picked up in the next rc?
> > >
> >
> > Yes it's fine. I can pick it up. Does it need to be marked for stable?
>
> Thanks.
>
> It'd be great to be marked for stable, since older kernels are also
> affected. But I can't judge if it crosses the threshold as a bug fix.
>

When backporting to stable I use two (admittedly subjective) metrics.
One is the severity of the bug (this is low), the other is the risk of
adding it (this is even lower). If the severity divided by the risk is
high enough, I tag it for stable. Because the risk is so low, I have no
problems with tagging it.

Note, it will run through lots of tests before it gets out the door.

-- Steve

2018-09-17 21:43:16

by Vaibhav Nagarnaik

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Allow for rescheduling when removing pages

On Mon, Sep 17, 2018 at 2:36 PM Steven Rostedt <[email protected]> wrote:
>
> On Mon, 17 Sep 2018 14:02:59 -0700
> Vaibhav Nagarnaik <[email protected]> wrote:
>
> > On Mon, Sep 17, 2018 at 2:01 PM Steven Rostedt <[email protected]> wrote:
> > >
> > > On Mon, 17 Sep 2018 13:53:05 -0700
> > > Vaibhav Nagarnaik <[email protected]> wrote:
> > >
> > > > Hi Steven,
> > > >
> > > > Does the patch look good? Can this be picked up in the next rc?
> > > >
> > >
> > > Yes it's fine. I can pick it up. Does it need to be marked for stable?
> >
> > Thanks.
> >
> > It'd be great to be marked for stable, since older kernels are also
> > affected. But I can't judge if it crosses the threshold as a bug fix.
> >
>
> When backporting to stable I use two (admittedly subjective) metrics.
> One is the severity of the bug (this is low), the other is the risk of
> adding it (this is even lower). If the severity divided by the risk is
> high enough, I tag it for stable. Because the risk is so low, I have no
> problems with tagging it.
>
> Note, it will run through lots of tests before it gets out the door.

I think it's low risk as well. Let's mark it for stable.

Thanks

Vaibhav

> -- Steve


Attachments:
smime.p7s (4.74 kB)
S/MIME Cryptographic Signature