2012-02-22 01:16:58

by Josh Boyer

Subject: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

Hi Paul,

Over in Fedora land, I applied your patch from this thread:

https://lkml.org/lkml/2012/1/24/441

to our 3.3-rc3/rc4 based rawhide kernels. The intention was to solve an
RCU issue that was very similar to what Eric originally reported, and
the RCU splat did indeed go away[1].

However, we then got a few reports of kernels containing that patch
being extremely slow. When the patch was dropped, the slowness went
away according to one reporter. The details can be found in this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=795050

The slowness doesn't seem to hit everyone, and in my local testing
things seem to be working just fine. The reporters have widely varying
hardware as well, so it doesn't seem machine specific.

Perhaps I misdiagnosed the original issue, or perhaps I missed something
else that needs to be applied prior to this but I thought I would point
this out in case you had any ideas.

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=789641


2012-02-22 01:33:00

by Paul E. McKenney

Subject: Re: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

On Tue, Feb 21, 2012 at 08:16:53PM -0500, Josh Boyer wrote:
> Hi Paul,
>
> Over in Fedora land, I applied your patch from this thread:
>
> https://lkml.org/lkml/2012/1/24/441
>
> to our 3.3-rc3/rc4 based rawhide kernels. The intention was to solve an
> RCU issue that was very similar to what Eric originally reported, and
> the RCU splat did indeed go away[1].
>
> However, we then got a few reports of kernels containing that patch
> being extremely slow. When the patch was dropped, the slowness goes
> away according to one reporter. The details can be found in this bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=795050
>
> The slowness doesn't seem to hit everyone, and in my local testing
> things seem to be working just fine. The reporters have widely varying
> hardware as well, so it doesn't seem machine specific.
>
> Perhaps I misdiagnosed the original issue, or perhaps I missed something
> else that needs to be applied prior to this but I thought I would point
> this out in case you had any ideas.

This patch has been obsoleted by patches #45-47 in this series:

https://lkml.org/lkml/2012/2/3/459

And patch #47 in that series has been obsoleted by another series
from Steven Rostedt:

https://lkml.org/lkml/2012/2/7/231

Hopefully these fix both splats and slowness.

Thanx, Paul

> josh
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=789641
>

2012-02-22 01:42:48

by Josh Boyer

Subject: Re: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

On Tue, Feb 21, 2012 at 05:32:52PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 21, 2012 at 08:16:53PM -0500, Josh Boyer wrote:
> > Hi Paul,
> >
> > Over in Fedora land, I applied your patch from this thread:
> >
> > https://lkml.org/lkml/2012/1/24/441
> >
> > to our 3.3-rc3/rc4 based rawhide kernels. The intention was to solve an
> > RCU issue that was very similar to what Eric originally reported, and
> > the RCU splat did indeed go away[1].
> >
> > However, we then got a few reports of kernels containing that patch
> > being extremely slow. When the patch was dropped, the slowness goes
> > away according to one reporter. The details can be found in this bug:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=795050
> >
> > The slowness doesn't seem to hit everyone, and in my local testing
> > things seem to be working just fine. The reporters have widely varying
> > hardware as well, so it doesn't seem machine specific.
> >
> > Perhaps I misdiagnosed the original issue, or perhaps I missed something
> > else that needs to be applied prior to this but I thought I would point
> > this out in case you had any ideas.

First off, thanks for the quick reply!

> This patch has been obsoleted by patches #45-47 in this series:
>
> https://lkml.org/lkml/2012/2/3/459

Holy lots of patches...

> And patch #47 in that series has been obsoleted by another series
> from Steven Rostedt:
>
> https://lkml.org/lkml/2012/2/7/231

Ok.

> Hopefully these fix both splats and slowness.

So again, I'm slightly confused on how RCU patches flow. Eric
originally reported the bug for which you created the patch I applied
against 3.3. The giant patch series above seems queued for 3.4.

I don't see stable CC'd on 45-47, nor any of Steven's patches. I doubt
I'd want to go applying the 47-patch series on 3.3 at the moment, and
given you have these marked for 3.4 I don't think you do either.
However, is there some kind of fix for the original bug report against
3.3?

josh

2012-02-24 14:40:37

by Josh Boyer

Subject: Re: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

On Tue, Feb 21, 2012 at 08:42:43PM -0500, Josh Boyer wrote:
> > And patch #47 in that series has been obsoleted by another series
> > from Steven Rostedt:
> >
> > https://lkml.org/lkml/2012/2/7/231
>
> Ok.
>
> > Hopefully these fix both splats and slowness.
>
> So again, I'm slightly confused on how RCU patches flow. Eric
> originally reported the bug for which you created the patch I applied
> against 3.3. The giant patch series above seems queued for 3.4.
>
> I don't see stable CC'd on 45-47, nor any of Steven's patches. I doubt
> I'd want to go applying the 47-patch series on 3.3 at the moment, and
> given you have these marked for 3.4 I don't think you do either.
> However, is there some kind of fix for the original bug report against
> 3.3?

I was being sincere when I asked the above questions. Could you
describe how you handle RCU patches across releases and if there is a
fix for the 3.3-rcX issue Eric reported that is going into 3.3?

I know you're quite busy, but I'd like to understand your thinking so I
know what to expect going forward.

josh

2012-02-24 16:18:29

by Paul E. McKenney

Subject: Re: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

On Fri, Feb 24, 2012 at 09:40:32AM -0500, Josh Boyer wrote:
> On Tue, Feb 21, 2012 at 08:42:43PM -0500, Josh Boyer wrote:
> > > And patch #47 in that series has been obsoleted by another series
> > > from Steven Rostedt:
> > >
> > > https://lkml.org/lkml/2012/2/7/231
> >
> > Ok.
> >
> > > Hopefully these fix both splats and slowness.
> >
> > So again, I'm slightly confused on how RCU patches flow. Eric
> > originally reported the bug for which you created the patch I applied
> > against 3.3. The giant patch series above seems queued for 3.4.
> >
> > I don't see stable CC'd on 45-47, nor any of Steven's patches. I doubt
> > I'd want to go applying the 47-patch series on 3.3 at the moment, and
> > given you have these marked for 3.4 I don't think you do either.
> > However, is there some kind of fix for the original bug report against
> > 3.3?
>
> I was being sincere when I asked the above questions. Could you
> describe how you handle RCU patches across releases and if there is a
> fix for the 3.3-rcX issue Eric reported that is going into 3.3?
>
> I know you're quite busy, but I'd like to understand your thinking so I
> know what to expect going forward.

Apologies for being slow, but could you please point me at the original
bug report that the old patch was designed to fix? My email filing
seems to have failed me in this case.

My guess is that the best short-term fix for Fedora is to disable the
warning, but I do need to see the original bug to work out if that really
is a prudent course of action.

Thanx, Paul

2012-02-24 16:27:51

by Josh Boyer

Subject: Re: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

On Fri, Feb 24, 2012 at 08:17:43AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 24, 2012 at 09:40:32AM -0500, Josh Boyer wrote:
> > On Tue, Feb 21, 2012 at 08:42:43PM -0500, Josh Boyer wrote:
> > > > And patch #47 in that series has been obsoleted by another series
> > > > from Steven Rostedt:
> > > >
> > > > https://lkml.org/lkml/2012/2/7/231
> > >
> > > Ok.
> > >
> > > > Hopefully these fix both splats and slowness.
> > >
> > > So again, I'm slightly confused on how RCU patches flow. Eric
> > > originally reported the bug for which you created the patch I applied
> > > against 3.3. The giant patch series above seems queued for 3.4.
> > >
> > > I don't see stable CC'd on 45-47, nor any of Steven's patches. I doubt
> > > I'd want to go applying the 47-patch series on 3.3 at the moment, and
> > > given you have these marked for 3.4 I don't think you do either.
> > > However, is there some kind of fix for the original bug report against
> > > 3.3?
> >
> > I was being sincere when I asked the above questions. Could you
> > describe how you handle RCU patches across releases and if there is a
> > fix for the 3.3-rcX issue Eric reported that is going into 3.3?
> >
> > I know you're quite busy, but I'd like to understand your thinking so I
> > know what to expect going forward.
>
> Apologies for being slow, but could you please point me at the original
> bug report that the old patch was designed to fix? My email filing
> seems to have failed me in this case.

Same thread I linked in my original email:

https://lkml.org/lkml/2012/1/24/203

> My guess is that the best short-term fix for Fedora is to disable the
> warning, but I do need to see the original bug to work out if that really
> is a prudent course of action.

Honestly, I don't care from a Fedora perspective. I can do what I need
to do there without too much trouble. I'm asking because afaik, upstream
still has this problem. The thread gets a bit curvy but from what I can
tell it resulted in the patch I highlighted as having issues. Maybe I
overlooked something else that fixed Eric's problem?

josh

2012-02-24 16:51:57

by Paul E. McKenney

Subject: Re: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

On Fri, Feb 24, 2012 at 11:27:46AM -0500, Josh Boyer wrote:
> On Fri, Feb 24, 2012 at 08:17:43AM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 24, 2012 at 09:40:32AM -0500, Josh Boyer wrote:
> > > On Tue, Feb 21, 2012 at 08:42:43PM -0500, Josh Boyer wrote:
> > > > > And patch #47 in that series has been obsoleted by another series
> > > > > from Steven Rostedt:
> > > > >
> > > > > https://lkml.org/lkml/2012/2/7/231
> > > >
> > > > Ok.
> > > >
> > > > > Hopefully these fix both splats and slowness.
> > > >
> > > > So again, I'm slightly confused on how RCU patches flow. Eric
> > > > originally reported the bug for which you created the patch I applied
> > > > against 3.3. The giant patch series above seems queued for 3.4.
> > > >
> > > > I don't see stable CC'd on 45-47, nor any of Steven's patches. I doubt
> > > > I'd want to go applying the 47-patch series on 3.3 at the moment, and
> > > > given you have these marked for 3.4 I don't think you do either.
> > > > However, is there some kind of fix for the original bug report against
> > > > 3.3?
> > >
> > > I was being sincere when I asked the above questions. Could you
> > > describe how you handle RCU patches across releases and if there is a
> > > fix for the 3.3-rcX issue Eric reported that is going into 3.3?
> > >
> > > I know you're quite busy, but I'd like to understand your thinking so I
> > > know what to expect going forward.
> >
> > Apologies for being slow, but could you please point me at the original
> > bug report that the old patch was designed to fix? My email filing
> > seems to have failed me in this case.
>
> Same thread I linked in my original email:
>
> https://lkml.org/lkml/2012/1/24/203

Thank you!

> > My guess is that the best short-term fix for Fedora is to disable the
> > warning, but I do need to see the original bug to work out if that really
> > is a prudent course of action.
>
> Honestly, I don't care from a Fedora perspective. I can do what I need
> to do there without too much trouble. I'm asking because afaik, upstream
> still has this problem. The thread gets a bit curvy but from what I can
> tell it resulted in the patch I highlighted as having issues. Maybe I
> overlooked something else that fixed Eric's problem?

"A bit curvy" is right -- which is why the fixes ended up at the end of
my large patch series for 3.4.

But after looking this over, Steven Rostedt's three-patch set should
suffice:

https://lkml.org/lkml/2012/2/7/231

The reason that mine are not needed is that the problematic code is
called -only- from idle, not from process context, and also that the
problematic code is tracing. My patch #45 is required for code that is
called from both process context and from idle. My patch #46 is required
for non-tracing uses of RCU from within the idle loop -- along with TBD
patches to wrap those uses of RCU in the RCU_NONIDLE() macro.
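
To illustrate the RCU_NONIDLE() wrapping mentioned above, here is a
minimal userspace sketch. It is not kernel code. The rcu_watching flag,
trace_power_event(), and the two idle_loop_*() functions are hypothetical
stand-ins, and the macro body is only modeled on the shape of the kernel
macro (leave idle, run the statement, re-enter idle), so treat it as an
assumption rather than the exact definition in any given tree.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel's idle bookkeeping. */
int rcu_watching = 1;   /* 1: RCU is paying attention to this CPU */
int events_traced = 0;  /* number of trace events that were recorded */

void rcu_idle_enter(void) { rcu_watching = 0; }
void rcu_idle_exit(void)  { rcu_watching = 1; }

/*
 * Assumed shape of the wrapper: tell RCU the CPU is momentarily
 * non-idle, run the statement, then go back to the idle state.
 */
#define RCU_NONIDLE(a) \
	do { \
		rcu_idle_exit(); \
		do { a; } while (0); \
		rcu_idle_enter(); \
	} while (0)

/* Stand-in for code that uses RCU read-side primitives internally. */
int trace_power_event(void)
{
	if (!rcu_watching)
		return -1;  /* the real kernel would emit an RCU splat here */
	events_traced++;
	return 0;
}

/* Idle loop that uses RCU directly: RCU is not watching, so it fails. */
int idle_loop_unwrapped(void)
{
	rcu_idle_enter();
	int rc = trace_power_event();
	rcu_idle_exit();
	return rc;
}

/* Idle loop that wraps the RCU user in RCU_NONIDLE(): it succeeds. */
int idle_loop_wrapped(void)
{
	int rc = 0;
	rcu_idle_enter();
	RCU_NONIDLE(rc = trace_power_event());
	rcu_idle_exit();
	return rc;
}
```

Calling idle_loop_unwrapped() returns -1 in this model, while
idle_loop_wrapped() returns 0 and records one event. The point of the
wrapper is that only the wrapped statement runs with RCU watching; the
rest of the idle loop stays in the RCU-idle state.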

So again, in your particular case of x86's power-tracing features,
Steven Rostedt's three-patch series called out above should be all
that you need.

I have CCed Steven in case there is some prerequisite to his patch set.

Thanx, Paul

2012-02-24 17:20:48

by Steven Rostedt

Subject: Re: Large slowdown with 'x86: Avoid invoking RCU when CPU is idle'

On Fri, 2012-02-24 at 08:51 -0800, Paul E. McKenney wrote:

> But after looking this over, Steven Rostedt's three-patch set should
> suffice:
>
> https://lkml.org/lkml/2012/2/7/231
>
> The reason that mine are not needed is that the problematic code is
> called -only- from idle, not from process context, and also that the
> problematic code is tracing. My patch #45 is required for code that is
> called from both process context and from idle. My patch #46 is required
> for non-tracing uses of RCU from within the idle loop -- along with TBD
> patches to wrap those uses of RCU in the RCU_NONIDLE() macro.
>
> So again, in your particular case of x86's power-tracing features,
> Steven Rostedt's three-patch series called out above should be all
> that you need.
>
> I have CCed Steven in case there is some prerequisite to his patch set.

The above link is the RFC. It probably still suffices, but the patches
that are going into mainline are here:

https://lkml.org/lkml/2012/2/13/530
https://lkml.org/lkml/2012/2/13/525
https://lkml.org/lkml/2012/2/13/524

Apply them in the order listed above (patches 3, 4, and 5), even though
they arrived on LKML out of order.

-- Steve