2010-11-27 15:16:14

by Mikael Pettersson

Subject: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

The scenario is that I do a remote login to an ARM build server,
use screen to start a sub-shell, in that shell start a largish
compile job, detach from that screen, and from the original login
shell I occasionally monitor the compile job with top or ps or
by attaching to the screen.

With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
very sluggish: top takes forever to start, once started it shows no
activity from the compile job (it's as if it's sleeping on a lock),
and ps also takes forever and shows no activity from the compile job.

Rebooting into 2.6.36 eliminates these issues.

I do pretty much the same thing (remote login -> screen -> compile job)
on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
sub-archs, but haven't had time to test 2.6.37-rc on them yet.)

Has anyone else seen this? Any ideas about the cause?

/Mikael


2010-12-05 12:32:41

by Mikael Pettersson

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

Mikael Pettersson writes:
> The scenario is that I do a remote login to an ARM build server,
> use screen to start a sub-shell, in that shell start a largish
> compile job, detach from that screen, and from the original login
> shell I occasionally monitor the compile job with top or ps or
> by attaching to the screen.
>
> With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
> very sluggish: top takes forever to start, once started it shows no
> activity from the compile job (it's as if it's sleeping on a lock),
> and ps also takes forever and shows no activity from the compile job.
>
> Rebooting into 2.6.36 eliminates these issues.
>
> I do pretty much the same thing (remote login -> screen -> compile job)
> on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
> on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
> sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
>
> Has anyone else seen this? Any ideas about the cause?

(Re-followup since I just realised my previous followups were to Rafael's
regressions mailbot rather than the original thread.)

> The bug is still present in 2.6.37-rc4. I'm currently trying to bisect it.

git bisect identified

[305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task

as the cause of this regression. Reverting it from 2.6.37-rc4 (requires some
hackery due to subsequent changes in the same area) restores sane behaviour.

The original patch submission talks about irq-heavy scenarios. My case is the
exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
bound in userspace but expected to schedule quickly when needed (e.g. running
top or ps or just hitting CR in one shell while another runs a compile job).

I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
(x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.

So it looks like an ARM-only issue, possibly depending on platform specifics.

One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
much higher on Kirkwood, even when the machine is idle.

/Mikael

2010-12-05 13:17:41

by Russell King - ARM Linux

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Sun, Dec 05, 2010 at 01:32:37PM +0100, Mikael Pettersson wrote:
> Mikael Pettersson writes:
> > The scenario is that I do a remote login to an ARM build server,
> > use screen to start a sub-shell, in that shell start a largish
> > compile job, detach from that screen, and from the original login
> > shell I occasionally monitor the compile job with top or ps or
> > by attaching to the screen.
> >
> > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
> > very sluggish: top takes forever to start, once started it shows no
> > activity from the compile job (it's as if it's sleeping on a lock),
> > and ps also takes forever and shows no activity from the compile job.
> >
> > Rebooting into 2.6.36 eliminates these issues.
> >
> > I do pretty much the same thing (remote login -> screen -> compile job)
> > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
> > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
> > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
> >
> > Has anyone else seen this? Any ideas about the cause?
>
> (Re-followup since I just realised my previous followups were to Rafael's
> regressions mailbot rather than the original thread.)
>
> > The bug is still present in 2.6.37-rc4. I'm currently trying to bisect it.
>
> git bisect identified
>
> [305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task
>
> as the cause of this regression. Reverting it from 2.6.37-rc4 (requires some
> hackery due to subsequent changes in the same area) restores sane behaviour.
>
> The original patch submission talks about irq-heavy scenarios. My case is the
> exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
> bound in userspace but expected to schedule quickly when needed (e.g. running
> top or ps or just hitting CR in one shell while another runs a compile job).
>
> I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
> ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
> (x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.
>
> So it looks like an ARM-only issue, possibly depending on platform specifics.
>
> One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
> machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
> much higher on Kirkwood, even when the machine is idle.

The above patch you point out is fundamentally broken.

+		rq->clock = sched_clock_cpu(cpu);
+		irq_time = irq_time_cpu(cpu);
+		if (rq->clock - irq_time > rq->clock_task)
+			rq->clock_task = rq->clock - irq_time;

This means that we will only update rq->clock_task if it is smaller than
rq->clock. So, eventually over time, rq->clock_task becomes the maximum
value that rq->clock can ever be. Or in other words, the maximum value
of sched_clock_cpu().

Once that has been reached, although rq->clock will wrap back to zero,
rq->clock_task will not, and so (I think) task execution time accounting
effectively stops dead.

I guess this hasn't been noticed on x86 as they have a 64-bit sched_clock,
and so need to wait a long time for this to be noticed. However, on ARM
where we tend to have 32-bit counters feeding sched_clock(), this value
will wrap far sooner.
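
To make the failure mode concrete, here is a stand-alone sketch (not kernel
code; the wrap point and the sample values are invented) of what that
conditional does once the clock wraps:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* clock readings in ns; the third is just before a wrap, the
	 * last two are after the clock has wrapped back towards zero */
	uint64_t samples[] = { 1000, 2000, 4000000000ULL, 500, 1500 };
	uint64_t clock_task = 0, irq_time = 0;	/* irq_time stays 0 here */

	for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		uint64_t clock = samples[i];	/* stands in for sched_clock_cpu() */

		if (clock - irq_time > clock_task)
			clock_task = clock - irq_time;

		printf("clock=%10llu clock_task=%10llu\n",
		       (unsigned long long)clock,
		       (unsigned long long)clock_task);
	}
	/* after the wrap, clock restarts near zero but clock_task is stuck
	 * at its pre-wrap maximum, so task time accounting stops advancing */
	return 0;
}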

2010-12-05 14:19:43

by Russell King - ARM Linux

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Sun, Dec 05, 2010 at 01:17:02PM +0000, Russell King - ARM Linux wrote:
> On Sun, Dec 05, 2010 at 01:32:37PM +0100, Mikael Pettersson wrote:
> > Mikael Pettersson writes:
> > > The scenario is that I do a remote login to an ARM build server,
> > > use screen to start a sub-shell, in that shell start a largish
> > > compile job, detach from that screen, and from the original login
> > > shell I occasionally monitor the compile job with top or ps or
> > > by attaching to the screen.
> > >
> > > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
> > > very sluggish: top takes forever to start, once started it shows no
> > > activity from the compile job (it's as if it's sleeping on a lock),
> > > and ps also takes forever and shows no activity from the compile job.
> > >
> > > Rebooting into 2.6.36 eliminates these issues.
> > >
> > > I do pretty much the same thing (remote login -> screen -> compile job)
> > > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
> > > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
> > > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
> > >
> > > Has anyone else seen this? Any ideas about the cause?
> >
> > (Re-followup since I just realised my previous followups were to Rafael's
> > regressions mailbot rather than the original thread.)
> >
> > > The bug is still present in 2.6.37-rc4. I'm currently trying to bisect it.
> >
> > git bisect identified
> >
> > [305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task
> >
> > as the cause of this regression. Reverting it from 2.6.37-rc4 (requires some
> > hackery due to subsequent changes in the same area) restores sane behaviour.
> >
> > The original patch submission talks about irq-heavy scenarios. My case is the
> > exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
> > bound in userspace but expected to schedule quickly when needed (e.g. running
> > top or ps or just hitting CR in one shell while another runs a compile job).
> >
> > I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
> > ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
> > (x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.
> >
> > So it looks like an ARM-only issue, possibly depending on platform specifics.
> >
> > One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
> > machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
> > much higher on Kirkwood, even when the machine is idle.
>
> The above patch you point out is fundamentally broken.
>
> +		rq->clock = sched_clock_cpu(cpu);
> +		irq_time = irq_time_cpu(cpu);
> +		if (rq->clock - irq_time > rq->clock_task)
> +			rq->clock_task = rq->clock - irq_time;
>
> This means that we will only update rq->clock_task if it is smaller than
> rq->clock. So, eventually over time, rq->clock_task becomes the maximum
> value that rq->clock can ever be. Or in other words, the maximum value
> of sched_clock_cpu().
>
> Once that has been reached, although rq->clock will wrap back to zero,
> rq->clock_task will not, and so (I think) task execution time accounting
> effectively stops dead.
>
> I guess this hasn't been noticed on x86 as they have a 64-bit sched_clock,
> and so need to wait a long time for this to be noticed. However, on ARM
> where we tend to have 32-bit counters feeding sched_clock(), this value
> will wrap far sooner.

I'm not so sure about this - certainly that if() statement looks very
suspicious above. As irq_time_cpu() will always be zero, can you try
removing the conditional?

In any case, sched_clock_cpu() should be resilient against sched_clock()
wrapping. However, your comments about it being iop32x and ixp4xx
(both of which are 32-bit-counter-to-ns based implementations) and
kirkwood being a 32-bit-extended-to-63-bit-counter-to-ns implementation
does make me wonder...

2010-12-05 16:07:41

by Mikael Pettersson

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

Russell King - ARM Linux writes:
> On Sun, Dec 05, 2010 at 01:17:02PM +0000, Russell King - ARM Linux wrote:
> > On Sun, Dec 05, 2010 at 01:32:37PM +0100, Mikael Pettersson wrote:
> > > Mikael Pettersson writes:
> > > > The scenario is that I do a remote login to an ARM build server,
> > > > use screen to start a sub-shell, in that shell start a largish
> > > > compile job, detach from that screen, and from the original login
> > > > shell I occasionally monitor the compile job with top or ps or
> > > > by attaching to the screen.
> > > >
> > > > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
> > > > very sluggish: top takes forever to start, once started it shows no
> > > > activity from the compile job (it's as if it's sleeping on a lock),
> > > > and ps also takes forever and shows no activity from the compile job.
> > > >
> > > > Rebooting into 2.6.36 eliminates these issues.
> > > >
> > > > I do pretty much the same thing (remote login -> screen -> compile job)
> > > > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
> > > > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
> > > > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
> > > >
> > > > Has anyone else seen this? Any ideas about the cause?
> > >
> > > (Re-followup since I just realised my previous followups were to Rafael's
> > > regressions mailbot rather than the original thread.)
> > >
> > > > The bug is still present in 2.6.37-rc4. I'm currently trying to bisect it.
> > >
> > > git bisect identified
> > >
> > > [305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task
> > >
> > > as the cause of this regression. Reverting it from 2.6.37-rc4 (requires some
> > > hackery due to subsequent changes in the same area) restores sane behaviour.
> > >
> > > The original patch submission talks about irq-heavy scenarios. My case is the
> > > exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
> > > bound in userspace but expected to schedule quickly when needed (e.g. running
> > > top or ps or just hitting CR in one shell while another runs a compile job).
> > >
> > > I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
> > > ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
> > > (x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.
> > >
> > > So it looks like an ARM-only issue, possibly depending on platform specifics.
> > >
> > > One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
> > > machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
> > > much higher on Kirkwood, even when the machine is idle.
> >
> > The above patch you point out is fundamentally broken.
> >
> > +		rq->clock = sched_clock_cpu(cpu);
> > +		irq_time = irq_time_cpu(cpu);
> > +		if (rq->clock - irq_time > rq->clock_task)
> > +			rq->clock_task = rq->clock - irq_time;
> >
> > This means that we will only update rq->clock_task if it is smaller than
> > rq->clock. So, eventually over time, rq->clock_task becomes the maximum
> > value that rq->clock can ever be. Or in other words, the maximum value
> > of sched_clock_cpu().
> >
> > Once that has been reached, although rq->clock will wrap back to zero,
> > rq->clock_task will not, and so (I think) task execution time accounting
> > effectively stops dead.
> >
> > I guess this hasn't been noticed on x86 as they have a 64-bit sched_clock,
> > and so need to wait a long time for this to be noticed. However, on ARM
> > where we tend to have 32-bit counters feeding sched_clock(), this value
> > will wrap far sooner.
>
> I'm not so sure about this - certainly that if() statement looks very
> suspicious above. As irq_time_cpu() will always be zero, can you try
> removing the conditional?
>
> In any case, sched_clock_cpu() should be resilient against sched_clock()
> wrapping. However, your comments about it being iop32x and ixp4xx
> (both of which are 32-bit-counter-to-ns based implementations) and
> kirkwood being a 32-bit-extended-to-63-bit-counter-to-ns implementation
> does make me wonder...

I ran two tests on my iop32x machine:

1. Made the above-mentioned assignment to rq->clock_task unconditional.
That cured the interactivity regressions.

2. Restored the conditional assignment to rq->clock_task and disabled the
platform-specific sched_clock() so the kernel used the generic one.
That too cured the interactivity regressions.

I then repeated these tests on my ixp4xx machine, with the same results.

I'll try to come up with a fix for the ixp4xx and plat-iop 32-bit sched_clock()s.
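
For what it's worth, the generic sched_clock() that test 2 falls back to is,
if I'm reading kernel/sched_clock.c right, just jiffies based, so it is coarse
but does not wrap until jiffies itself does (weeks to months, not seconds):

unsigned long long __attribute__((weak)) sched_clock(void)
{
	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
					* (NSEC_PER_SEC / HZ);
}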

2010-12-05 16:22:18

by Russell King - ARM Linux

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Sun, Dec 05, 2010 at 05:07:36PM +0100, Mikael Pettersson wrote:
> I ran two tests on my iop32x machine:
>
> 1. Made the above-mentioned assignment to rq->clock_task unconditional.
> That cured the interactivity regressions.
>
> 2. Restored the conditional assignment to rq->clock_task and disabled the
> platform-specific sched_clock() so the kernel used the generic one.
> That too cured the interactivity regressions.
>
> I then repeated these tests on my ixp4xx machine, with the same results.
>
> I'll try to come up with a fix for the ixp4xx and plat-iop 32-bit
> sched_clock()s.

I'm not sure that's the correct fix - it looks like sched_clock_cpu()
should already be preventing scheduler clock time going backwards.

Hmm. IOP32x seems to have a 32-bit timer clocked at 200MHz. That means
it wraps once every 21s. However, we have that converted to ns by an
unknown multiplier and shift. It seems that those are chosen to
guarantee that we will cover only 4s without wrapping in the clocksource
conversion. Maybe that's not sufficient?
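
(Checking the numbers: 2^32 ticks at 200MHz is about 21.5s, whereas 2^32
nanoseconds is only about 4.3s, so once the counter has been scaled up to
nanoseconds a 32-bit quantity only covers roughly those 4s.)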

Could you try looking into sched_clock_cpu(), sched_clock_local() and
sched_clock() to see whether anything odd stands out?

2010-12-06 21:30:01

by Venkatesh Pallipadi

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Sun, Dec 5, 2010 at 6:19 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Sun, Dec 05, 2010 at 01:17:02PM +0000, Russell King - ARM Linux wrote:
>> On Sun, Dec 05, 2010 at 01:32:37PM +0100, Mikael Pettersson wrote:
>> > Mikael Pettersson writes:
>> > > The scenario is that I do a remote login to an ARM build server,
>> > > use screen to start a sub-shell, in that shell start a largish
>> > > compile job, detach from that screen, and from the original login
>> > > shell I occasionally monitor the compile job with top or ps or
>> > > by attaching to the screen.
>> > >
>> > > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
>> > > very sluggish: top takes forever to start, once started it shows no
>> > > activity from the compile job (it's as if it's sleeping on a lock),
>> > > and ps also takes forever and shows no activity from the compile job.
>> > >
>> > > Rebooting into 2.6.36 eliminates these issues.
>> > >
>> > > I do pretty much the same thing (remote login -> screen -> compile job)
>> > > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
>> > > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
>> > > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
>> > >
>> > > Has anyone else seen this? Any ideas about the cause?
>> >
>> > (Re-followup since I just realised my previous followups were to Rafael's
>> > regressions mailbot rather than the original thread.)
>> >
>> > > The bug is still present in 2.6.37-rc4. I'm currently trying to bisect it.
>> >
>> > git bisect identified
>> >
>> > [305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task
>> >
>> > as the cause of this regression. Reverting it from 2.6.37-rc4 (requires some
>> > hackery due to subsequent changes in the same area) restores sane behaviour.
>> >
>> > The original patch submission talks about irq-heavy scenarios. My case is the
>> > exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
>> > bound in userspace but expected to schedule quickly when needed (e.g. running
>> > top or ps or just hitting CR in one shell while another runs a compile job).
>> >
>> > I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
>> > ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
>> > (x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.
>> >
>> > So it looks like an ARM-only issue, possibly depending on platform specifics.
>> >
>> > One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
>> > machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
>> > much higher on Kirkwood, even when the machine is idle.
>>
>> The above patch you point out is fundamentally broken.
>>
>> +		rq->clock = sched_clock_cpu(cpu);
>> +		irq_time = irq_time_cpu(cpu);
>> +		if (rq->clock - irq_time > rq->clock_task)
>> +			rq->clock_task = rq->clock - irq_time;
>>
>> This means that we will only update rq->clock_task if it is smaller than
>> rq->clock. So, eventually over time, rq->clock_task becomes the maximum
>> value that rq->clock can ever be. Or in other words, the maximum value
>> of sched_clock_cpu().
>>
>> Once that has been reached, although rq->clock will wrap back to zero,
>> rq->clock_task will not, and so (I think) task execution time accounting
>> effectively stops dead.
>>
>> I guess this hasn't been noticed on x86 as they have a 64-bit sched_clock,
>> and so need to wait a long time for this to be noticed. However, on ARM
>> where we tend to have 32-bit counters feeding sched_clock(), this value
>> will wrap far sooner.
>
> I'm not so sure about this - certainly that if() statement looks very
> suspicious above. As irq_time_cpu() will always be zero, can you try
> removing the conditional?
>
> In any case, sched_clock_cpu() should be resilient against sched_clock()
> wrapping. However, your comments about it being iop32x and ixp4xx
> (both of which are 32-bit-counter-to-ns based implementations) and
> kirkwood being a 32-bit-extended-to-63-bit-counter-to-ns implementation
> does make me wonder...
>

That conditional is based on the assumption that sched_clock_cpu() is u64.
If that is not true and sched_clock_cpu() is 32-bit and wrapping around, then
there are other places in the scheduler which may have problems as well, where
we do curr_time - prev_time kinds of calculations in u64.

For example, update_curr() has:

	delta_exec = (unsigned long)(now - curr->exec_start);

which is based on rq->clock and can end up as a huge positive number
in the case of a 32-bit wraparound.
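
As a stand-alone illustration (made-up numbers, not scheduler code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t exec_start = 4000000000ULL;	/* sampled just before the clock wraps */
	uint64_t now = 1000;			/* the clock has wrapped back near zero */
	uint64_t delta_exec = now - exec_start;	/* u64 underflow: a value close to 2^64 */

	printf("delta_exec = %llu\n", (unsigned long long)delta_exec);
	return 0;
}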

Having said that, this conditional can be cleaned up to handle the potential
64-bit overflow (even though that would take a very long time) cleanly. But it
would be good to know what exactly is going wrong here.

Thanks,
Venki

2010-12-08 12:40:32

by Peter Zijlstra

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Sun, 2010-12-05 at 16:21 +0000, Russell King - ARM Linux wrote:

> I'm not sure that's the correct fix - it looks like sched_clock_cpu()
> should already be preventing scheduler clock time going backwards.
>
> Hmm. IOP32x seems to have a 32-bit timer clocked at 200MHz. That means
> it wraps once every 21s. However, we have that converted to ns by an
> unknown multiplier and shift. It seems that those are chosen to
> guarantee that we will cover only 4s without wrapping in the clocksource
> conversion. Maybe that's not sufficient?
>
> Could you try looking into sched_clock_cpu(), sched_clock_local() and
> sched_clock() to see whether anything odd stands out?

# git grep HAVE_UNSTABLE_SCHED_CLOCK arch/arm | wc -l
0

That code won't help if you don't enable it ;-)

John Stultz was also looking into making the kernel/sched_clock.c code
deal with short clocks.

2010-12-08 12:56:28

by Russell King - ARM Linux

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Wed, Dec 08, 2010 at 01:40:15PM +0100, Peter Zijlstra wrote:
> On Sun, 2010-12-05 at 16:21 +0000, Russell King - ARM Linux wrote:
>
> > I'm not sure that's the correct fix - it looks like sched_clock_cpu()
> > should already be preventing scheduler clock time going backwards.
> >
> > Hmm. IOP32x seems to have a 32-bit timer clocked at 200MHz. That means
> > it wraps once every 21s. However, we have that converted to ns by an
> > unknown multiplier and shift. It seems that those are chosen to
> > guarantee that we will cover only 4s without wrapping in the clocksource
> > conversion. Maybe that's not sufficient?
> >
> > Could you try looking into sched_clock_cpu(), sched_clock_local() and
> > sched_clock() to see whether anything odd stands out?
>
> # git grep HAVE_UNSTABLE_SCHED_CLOCK arch/arm | wc -l
> 0

Hmm, you're right. In which case it's purely down to sched_clock()
only being able to cover 4s - which seems to be far too small a gap.

I'm not sure that the unstable sched clock stuff makes much sense to
be enabled - we don't have an unstable clock, we just don't have the
required number of bits for the scheduler to work correctly.

Nevertheless, this provides a good way to find this kind of wrap bug.
Even with cnt_32_to_63, we still don't get a 64-bit sched_clock(), so
this bug will still be there. Even with a 64-bit clock, the bug will
still be there. It's basically crap code.

Maybe it's better that on ARM, we just don't implement sched_clock()
at all?

2010-12-08 14:05:06

by Peter Zijlstra

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Wed, 2010-12-08 at 12:55 +0000, Russell King - ARM Linux wrote:
> On Wed, Dec 08, 2010 at 01:40:15PM +0100, Peter Zijlstra wrote:
> > On Sun, 2010-12-05 at 16:21 +0000, Russell King - ARM Linux wrote:
> >
> > > I'm not sure that's the correct fix - it looks like sched_clock_cpu()
> > > should already be preventing scheduler clock time going backwards.
> > >
> > > Hmm. IOP32x seems to have a 32-bit timer clocked at 200MHz. That means
> > > it wraps once every 21s. However, we have that converted to ns by an
> > > unknown multiplier and shift. It seems that those are chosen to
> > > guarantee that we will cover only 4s without wrapping in the clocksource
> > > conversion. Maybe that's not sufficient?
> > >
> > > Could you try looking into sched_clock_cpu(), sched_clock_local() and
> > > sched_clock() to see whether anything odd stands out?
> >
> > # git grep HAVE_UNSTABLE_SCHED_CLOCK arch/arm | wc -l
> > 0
>
> Hmm, you're right. In which case it's purely down to sched_clock()
> only being able to cover 4s - which seems to be far too small a gap.
>
> I'm not sure that the unstable sched clock stuff makes much sense to
> be enabled - we don't have an unstable clock, we just don't have the
> required number of bits for the scheduler to work correctly.

We can perhaps make part of the HAVE_UNSTABLE_SCHED_CLOCK depend on SMP
and only deal with the short wraps (and maybe monotonicity) on UP.

> Nevertheless, this provides a good way to find this kind of wrap bug.
> Even with cnt_32_to_63, we still don't get a 64-bit sched_clock(), so
> this bug will still be there. Even with a 64-bit clock, the bug will
> still be there. It's basically crap code.

You're referring to the clock_task bit from Venki? Yes that needs
fixing.

> Maybe it's better that on ARM, we just don't implement sched_clock()
> at all?

If you have a high res clock source that's cheap to read it would be
better if we can simply fix the infrastructure such that we can make use
of it.

2010-12-08 14:28:45

by Russell King - ARM Linux

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Wed, Dec 08, 2010 at 03:04:36PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-12-08 at 12:55 +0000, Russell King - ARM Linux wrote:
> > Hmm, you're right. In which case it's purely down to sched_clock()
> > only being able to cover 4s - which seems to be far too small a gap.
> >
> > I'm not sure that the unstable sched clock stuff makes much sense to
> > be enabled - we don't have an unstable clock, we just don't have the
> > required number of bits for the scheduler to work correctly.
>
> We can perhaps make part of the HAVE_UNSTABLE_SCHED_CLOCK depend on SMP
> and only deal with the short wraps (and maybe monotonicity) on UP.
>
> > Nevertheless, this provides a good way to find this kind of wrap bug.
> > Even with cnt_32_to_63, we still don't get a 64-bit sched_clock(), so
> > this bug will still be there. Even with a 64-bit clock, the bug will
> > still be there. It's basically crap code.
>
> You're referring to the clock_task bit from Venki? Yes that needs
> fixing.

Indeed.

> > Maybe it's better that on ARM, we just don't implement sched_clock()
> > at all?
>
> If you have a high res clock source that's cheap to read it would be
> better if we can simply fix the infrastructure such that we can make use
> of it.

This is the point. We don't have high res clock sources that run for
millennia before wrapping. What we have depends on the platform, and
as this example has found, some platforms have high-res clock sources
but they wrap in as little as 4 seconds. Other platforms will have
slower clock sources which'll run for longer before wrapping.

Essentially, clock sources on ARM so far have been a 32-bit counter
(if you're lucky) clocked at some rate.

So, what I'm saying is that if wrapping in 4 seconds is a problem,
then maybe we shouldn't be providing sched_clock() at all.

Also, if wrapping below 64-bits is also a problem, again, maybe we
shouldn't be providing it there either. Eg:

#define TCR2NS_SCALE_FACTOR 10

unsigned long long sched_clock(void)
{
	unsigned long long v = cnt32_to_63(timer_read());
	return (v * tcr2ns_scale) >> TCR2NS_SCALE_FACTOR;
}

has a maximum of 54 bits - and this seems to be the most we can sanely
get from a 32-bit counter.
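
(Rough arithmetic, assuming tcr2ns_scale is derived the usual way as
(NSEC_PER_SEC << TCR2NS_SCALE_FACTOR) / rate, i.e. about 5120 for a 200MHz
source: the 64-bit multiply v * 5120 overflows once v needs more than about
51 bits, so after the >> 10 the largest value that comes out intact is around
2^54 ns, which is a bit over 200 days.)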

2010-12-08 14:44:37

by Peter Zijlstra

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Wed, 2010-12-08 at 14:28 +0000, Russell King - ARM Linux wrote:
> So, what I'm saying is that if wrapping in 4 seconds is a problem,
> then maybe we shouldn't be providing sched_clock() at all.

4 seconds should be perfectly fine, we call it at least every scheduler
tick (HZ) and NO_HZ will either have to limit the max sleep period or
provide its own sleep duration (using a more expensive clock) so we can
recover (much like GTOD already requires).

> Also, if wrapping below 64-bits is also a problem, again, maybe we
> shouldn't be providing it there either. Eg:

That's mostly due to hysterical raisins and we should just fix that, the
simplest way is to use something like kernel/sched_clock.c to use
sched_clock() deltas to make a running u64 value.

Like said, John Stultz was already looking at doing something like that
because there's a number of architectures suffering this same problem
and they're all already using part of the clocksource infrastructure to
implement the sched_clock() interface simply because they only have a
single hardware resource.

One of the problems is I think the cycles2ns multiplication of the raw
clock, that makes dealing with wrap-around lots harder, so I guess we
should deal with the wrap on the raw clock values and then apply
cycles2ns on the delta or somesuch. But I expect the clocksource
infrastructure already has something like that, John?
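
To be slightly more concrete, something along these lines is what I have in
mind (a very rough sketch, all names invented, UP only, and it relies on being
called at least once per hardware wrap period):

#include <stdint.h>
#include <stdio.h>

#define COUNTER_BITS	32
#define COUNTER_MASK	((1ULL << COUNTER_BITS) - 1)

static uint32_t hw_counter;			/* stands in for the hardware timer */
static uint32_t read_counter(void) { return hw_counter; }

static uint64_t epoch_ns;			/* running 64-bit ns value */
static uint32_t epoch_cyc;			/* raw counter at the last update */
static const uint64_t ns_per_tick = 5;		/* e.g. a 200MHz source */

static uint64_t sched_clock(void)
{
	uint32_t cyc = read_counter();
	uint64_t delta = (cyc - epoch_cyc) & COUNTER_MASK;	/* wrap-safe raw delta */

	epoch_ns += delta * ns_per_tick;	/* scale only the delta */
	epoch_cyc = cyc;
	return epoch_ns;
}

int main(void)
{
	hw_counter = 0xfffffff0;		/* just before the counter wraps */
	epoch_cyc = hw_counter;
	printf("%llu\n", (unsigned long long)sched_clock());	/* 0 */
	hw_counter = 0x10;			/* counter has wrapped */
	printf("%llu\n", (unsigned long long)sched_clock());	/* keeps counting: 160 */
	return 0;
}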

2010-12-08 15:06:00

by Russell King - ARM Linux

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Wed, Dec 08, 2010 at 03:44:19PM +0100, Peter Zijlstra wrote:
> One of the problems is I think the cycles2ns multiplication of the raw
> clock, that makes dealing with wrap-around lots harder, so I guess we
> should deal with the wrap on the raw clock values and then apply
> cycles2ns on the delta or somesuch. But I expect the clocksource
> infrastructure already has something like that, John?

I've thought about that, but it becomes slightly problematical, as
was shown in one of the examples I provided. If you do scale by
doing a 64-bit multiply and shift, you're always going to end up
with less than a 64-bit result.

I think your idea makes sense though, but I think for it to be able
to cover the full 64-bit range, we need to do the wraparound handling
after scaling. So maybe something like the following:

static unsigned long long last_ns, cur_ns;
static unsigned long long max = (max_read_clock() * mult) >> shift;

unsigned long long sched_clock(void)
{
	unsigned long long cyc = read_clock();
	unsigned long long ns = (cyc * mult) >> shift;
	unsigned long long delta;

	spin_lock(&sched_clock_lock);
	/* advance by the scaled time since the last call, compensating
	   when the scaled value has wrapped back past zero */
	delta = ns - last_ns;
	if (ns < last_ns)
		delta += max;

	last_ns = ns;
	ns = cur_ns += delta;
	spin_unlock(&sched_clock_lock);

	return ns;
}

2010-12-08 15:43:57

by Linus Walleij

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

2010/12/8 Peter Zijlstra <[email protected]>:

> Like said, John Stultz was already looking at doing something like that
> because there's a number of architectures suffering this same problem
> and they're all already using part of the clocksource infrastructure to
> implement the sched_clock() interface simply because they only have a
> single hardware resource.

I was in on that discussion and for the Ux500 this patch was the
outcome:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6488/1

It seems to work, mostly, it's based off Nico's orion code and
just tries to make everything as explicit as possible.

Uwe has also been onto it I think.

Yours,
Linus Walleij

2010-12-08 20:42:28

by john stultz

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Wed, 2010-12-08 at 15:44 +0100, Peter Zijlstra wrote:
> On Wed, 2010-12-08 at 14:28 +0000, Russell King - ARM Linux wrote:
> > So, what I'm saying is that if wrapping in 4 seconds is a problem,
> > then maybe we shouldn't be providing sched_clock() at all.
>
> 4 seconds should be perfectly fine, we call it at least every scheduler
> tick (HZ) and NO_HZ will either have to limit the max sleep period or
> provide its own sleep duration (using a more expensive clock) so we can
> recover (much like GTOD already requires).
>
> > Also, if wrapping below 64-bits is also a problem, again, maybe we
> > shouldn't be providing it there either. Eg:
>
> That's mostly due to hysterical raisins and we should just fix that, the
> simplest way is to use something like kernel/sched_clock.c to use
> sched_clock() deltas to make a running u64 value.
>
> Like said, John Stultz was already looking at doing something like that
> because there's a number of architectures suffering this same problem
> and they're all already using part of the clocksource infrastructure to
> implement the sched_clock() interface simply because they only have a
> single hardware resource.

I'm not actively working on it right now, but reworking the
sched_clock code so it's more like the generic timekeeping code is on my
list (I'm looking to see if I can bump it up to the front in the near
future).

thanks
-john

2010-12-08 23:31:56

by Venkatesh Pallipadi

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

As I mentioned earlier in this thread, there are other places in
the scheduler where we do curr_sched_clock - prev_sched_clock kinds of math
in u64, and this 32-bit wraparound will cause problems there as well, for
example accounting a huge positive number as exec_time for some task. So we
need a generic fix for sched_clock. This particular code just happened to
cause an easy-to-notice failure.

Anyway, here's the patch that should resolve the wraparound problem with
this particular piece of code. I haven't been able to test this with the
latest git, as my test systems fail to boot with a divide error in
select_task_rq_fair+0x5dd/0x70a
with vanilla git. That's an unrelated bug that has been reported earlier.

I have only tested these sched irq changes with an older, known-to-work kernel.

Peter: Also, this change will cause some conflict with the pending IRQ time
reporting changeset. Let me know how you want to deal with this patch and that
pending patchset.

Thanks,
Venki

[PATCH] update_rq_clock() with irq_time should handle 64 bit wraparound

update_rq_clock() with IRQ_TIME_ACCOUNTING assumed a 64-bit sched_clock that
practically will not wrap around. Change the code to handle wraparounds,
while still preventing rq->clock_task from going backwards.

Signed-off-by: Venkatesh Pallipadi <[email protected]>
---
kernel/sched.c | 49 ++++++++++++++++++++++++++++++-------------------
1 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index dc91a4d..29ce549 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -636,21 +636,13 @@ static inline struct task_group *task_group(struct task_struct *p)

#endif /* CONFIG_CGROUP_SCHED */

-static u64 irq_time_cpu(int cpu);
-static void sched_irq_time_avg_update(struct rq *rq, u64 irq_time);
+static void update_rq_clock_irqtime(struct rq *rq, int cpu);

inline void update_rq_clock(struct rq *rq)
{
if (!rq->skip_clock_update) {
int cpu = cpu_of(rq);
- u64 irq_time;
-
- rq->clock = sched_clock_cpu(cpu);
- irq_time = irq_time_cpu(cpu);
- if (rq->clock - irq_time > rq->clock_task)
- rq->clock_task = rq->clock - irq_time;
-
- sched_irq_time_avg_update(rq, irq_time);
+ update_rq_clock_irqtime(rq, cpu);
}
}

@@ -1983,24 +1975,43 @@ void account_system_vtime(struct task_struct *curr)
}
EXPORT_SYMBOL_GPL(account_system_vtime);

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time)
+/*
+ * Update rq->clock_task making sure that it does not go backwards due to
+ * irq_time subtraction, allowing for possible wraparound.
+ */
+static void update_rq_clock_irqtime(struct rq *rq, int cpu)
{
- if (sched_clock_irqtime && sched_feat(NONIRQ_POWER)) {
- u64 delta_irq = curr_irq_time - rq->prev_irq_time;
- rq->prev_irq_time = curr_irq_time;
- sched_rt_avg_update(rq, delta_irq);
+ u64 curr_rq_clock = sched_clock_cpu(cpu);
+ u64 curr_irq_time = irq_time_cpu(cpu);
+
+ if (!sched_clock_irqtime) {
+ rq->clock = sched_clock_cpu(cpu);
+ rq->clock_task = rq->clock;
+ return;
}
+
+ if (unlikely(curr_irq_time - rq->prev_irq_time >
+ curr_rq_clock - rq->clock)) {
+ curr_irq_time = rq->prev_irq_time + curr_rq_clock - rq->clock;
+ }
+
+ rq->clock = curr_rq_clock;
+ rq->clock_task = rq->clock - curr_irq_time;
+
+ if (sched_feat(NONIRQ_POWER))
+ sched_rt_avg_update(rq, curr_irq_time - rq->prev_irq_time);
+
+ rq->prev_irq_time = curr_irq_time;
}

#else

-static u64 irq_time_cpu(int cpu)
+static void update_rq_clock_irqtime(struct rq *rq, int cpu)
{
- return 0;
+ rq->clock = sched_clock_cpu(cpu);
+ rq->clock_task = rq->clock;
}

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time) { }
-
#endif

#include "sched_idletask.c"
--
1.7.3.1

2010-12-09 12:52:19

by Peter Zijlstra

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Wed, 2010-12-08 at 15:31 -0800, Venkatesh Pallipadi wrote:

> [PATCH] update_rq_clock() with irq_time should handle 64 bit wraparound
>
> update_rq_clock() with IRQ_TIME_ACCOUNTING assumed a 64-bit sched_clock that
> practically will not wrap around. Change the code to handle wraparounds,
> while still preventing rq->clock_task from going backwards.

All we should do is deal with the 64wrap, not filter backward motion, I
came up with something like the below.

I think for now the best solution to the early wrap problem for ARM is
for them to select HAVE_UNSTABLE_SCHED_CLOCK, it will mostly deal with
the short wrap by filtering out the occasional negative value.

Then later we can look at cleaning/breaking-up the kernel/sched_clock.c
code to provide smaller bits of functionality and possibly merging it
with some of the clocksource code.

---
kernel/sched.c | 54 +++++++++++++++++++++++++-----------------------------
1 files changed, 25 insertions(+), 29 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 8e885c1..0fb7de8 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -642,23 +642,19 @@ static inline struct task_group *task_group(struct task_struct *p)

#endif /* CONFIG_CGROUP_SCHED */

-static u64 irq_time_cpu(int cpu);
-static void sched_irq_time_avg_update(struct rq *rq, u64 irq_time);
+static void update_rq_clock_task(struct rq *rq, s64 delta);

inline void update_rq_clock(struct rq *rq)
{
- int cpu = cpu_of(rq);
- u64 irq_time;
+ s64 delta;

if (rq->skip_clock_update)
return;

- rq->clock = sched_clock_cpu(cpu);
- irq_time = irq_time_cpu(cpu);
- if (rq->clock - irq_time > rq->clock_task)
- rq->clock_task = rq->clock - irq_time;
+ delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+ rq->clock += delta;
+ update_rq_clock_task(rq, delta);

- sched_irq_time_avg_update(rq, irq_time);
}

/*
@@ -1817,14 +1813,6 @@ void disable_sched_clock_irqtime(void)
sched_clock_irqtime = 0;
}

-static u64 irq_time_cpu(int cpu)
-{
- if (!sched_clock_irqtime)
- return 0;
-
- return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
-}
-
void account_system_vtime(struct task_struct *curr)
{
unsigned long flags;
@@ -1855,25 +1843,33 @@ void account_system_vtime(struct task_struct *curr)
}
EXPORT_SYMBOL_GPL(account_system_vtime);

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time)
+static inline u64 irq_time_cpu(int cpu)
{
- if (sched_clock_irqtime && sched_feat(NONIRQ_POWER)) {
- u64 delta_irq = curr_irq_time - rq->prev_irq_time;
- rq->prev_irq_time = curr_irq_time;
- sched_rt_avg_update(rq, delta_irq);
- }
+ return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
}

-#else
-
-static u64 irq_time_cpu(int cpu)
+static void update_rq_clock_task(struct rq *rq, s64 delta)
{
- return 0;
+ s64 irq_delta;
+
+ irq_delta = irq_time_cpu(cpu_of(rq)) - rq->prev_irq_time;
+ rq->prev_irq_time += irq_delta;
+
+ delta -= irq_delta;
+ rq->clock_task += delta;
+
+ if (irq_delta && sched_feat(NONIRQ_POWER))
+ sched_rt_avg_update(rq, irq_delta);
}

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time) { }
+#else /* CONFIG_IRQ_TIME_ACCOUNTING */

-#endif
+static inline void update_rq_clock_task(struct rq *rq, s64 delta)
+{
+ rq->clock_task += delta;
+}
+
+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */

#include "sched_idletask.c"
#include "sched_fair.c"

2010-12-09 17:43:42

by Venkatesh Pallipadi

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, Dec 9, 2010 at 4:52 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2010-12-08 at 15:31 -0800, Venkatesh Pallipadi wrote:
>
>> [PATCH] update_rq_clock() with irq_time should handle 64 bit wraparound
>>
>> update_rq_clock() with IRQ_TIME_ACCOUNTING assumed a 64-bit sched_clock that
>> practically will not wrap around. Change the code to handle wraparounds,
>> while still preventing rq->clock_task from going backwards.
>
> All we should do is deal with the 64wrap, not filter backward motion, I
> came up with something like the below.

The original intent of this check

-	if (rq->clock - irq_time > rq->clock_task)

was to filter out the potential backward motion, as otherwise downstream
users of clock_task can see huge positive numbers from curr - prev
calculations.

The same problem will be there with below code, with irq_delta >
delta, clock_task can go backwards which is not good.
+ delta -= irq_delta;
+ rq->clock_task += delta;

The reason for this is rq->clock and irqtime updates kind of happen
independently and specifically, if a rq->clock update happens while we
are in a softirq, we may have this case of going backwards on the next
update.

>
> I think for now the best solution to the early wrap problem for ARM is
> for them to select HAVE_UNSTABLE_SCHED_CLOCK, it will mostly deal with
> the short wrap by filtering out the occasional negative value.
>
> Then later we can look at cleaning/breaking-up the kernel/sched_clock.c
> code to provide smaller bits of functionality and possibly merging it
> with some of the clocksource code.
>

Yes. That should resolve the early wrap.


Thanks,
Venki


2010-12-09 18:00:53

by Peter Zijlstra

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, 2010-12-09 at 09:43 -0800, Venkatesh Pallipadi wrote:
>
> The same problem will be there with below code, with irq_delta >
> delta, clock_task can go backwards which is not good.
> + delta -= irq_delta;
> + rq->clock_task += delta;
>
> The reason for this is rq->clock and irqtime updates kind of happen
> independently and specifically, if a rq->clock update happens while we
> are in a softirq, we may have this case of going backwards on the next
> update.

But how can irq_delta > delta?, we measure it using the same clock.

2010-12-09 18:11:38

by Venkatesh Pallipadi

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, Dec 9, 2010 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2010-12-09 at 09:43 -0800, Venkatesh Pallipadi wrote:
>>
>> The same problem will be there with below code, with irq_delta >
>> delta, clock_task can go backwards which is not good.
>> +	delta -= irq_delta;
>> +	rq->clock_task += delta;
>>
>> The reason for this is rq->clock and irqtime updates kind of happen
>> independently and specifically, if a rq->clock update happens while we
>> are in a softirq, we may have this case of going backwards on the next
>> update.
>
> But how can irq_delta > delta?, we measure it using the same clock.
>

This would be mostly a corner case like:
- softirq start time t1
- rq->clock updated at t2 and rq->clock_task updated at t2 without
accounting for current softirq
- softirq end time t3
- cpu spends most time here in softirq or hardirq
- next rq->clock update at t4 and rq->clock_task update, with delta =
t4-t2 and irq_delta ~= t4 - t1
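
With made-up numbers (in microseconds): t1 = 0, t2 = 10, t3 = t4 = 410. Then
delta = t4 - t2 = 400 while irq_delta ~= t4 - t1 = 410, so clock_task would
have to move backwards by 10 to absorb the extra irq time.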

2010-12-09 18:56:05

by Peter Zijlstra

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, 2010-12-09 at 10:11 -0800, Venkatesh Pallipadi wrote:
> On Thu, Dec 9, 2010 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
> > On Thu, 2010-12-09 at 09:43 -0800, Venkatesh Pallipadi wrote:
> >>
> >> The same problem will be there with below code, with irq_delta >
> >> delta, clock_task can go backwards which is not good.
> >> + delta -= irq_delta;
> >> + rq->clock_task += delta;
> >>
> >> The reason for this is rq->clock and irqtime updates kind of happen
> >> independently and specifically, if a rq->clock update happens while we
> >> are in a softirq, we may have this case of going backwards on the next
> >> update.
> >
> > But how can irq_delta > delta?, we measure it using the same clock.
> >
>
> This would be mostly a corner case like:
> - softirq start time t1
> - rq->clock updated at t2 and rq->clock_task updated at t2 without
> accounting for current softirq
> - softirq end time t3
> - cpu spends most time here in softirq or hardirq
> - next rq->clock update at t4 and rq->clock_task update, with delta =
> t4-t2 and irq_delta ~= t4 - t1

Ah, something like that would happen when we do a wakeup from
soft/hard-irq context, not an altogether uncommon occurrence.

Wouldn't that be cured by updating the irq-time when asking for it,
something like the below? (on top of my earlier patch)

---
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1843,8 +1843,9 @@ void account_system_vtime(struct task_st
}
EXPORT_SYMBOL_GPL(account_system_vtime);

-static inline u64 irq_time_cpu(int cpu)
+static u64 irq_time_cpu(int cpu)
{
+ account_system_vtime(current);
return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
}

2010-12-09 22:21:40

by Venkatesh Pallipadi

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, Dec 9, 2010 at 10:55 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2010-12-09 at 10:11 -0800, Venkatesh Pallipadi wrote:
>> On Thu, Dec 9, 2010 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>> > On Thu, 2010-12-09 at 09:43 -0800, Venkatesh Pallipadi wrote:
>> >>
>> >> The same problem will be there with below code, with irq_delta >
>> >> delta, clock_task can go backwards which is not good.
>> >> +	delta -= irq_delta;
>> >> +	rq->clock_task += delta;
>> >>
>> >> The reason for this is rq->clock and irqtime updates kind of happen
>> >> independently and specifically, if a rq->clock update happens while we
>> >> are in a softirq, we may have this case of going backwards on the next
>> >> update.
>> >
>> > But how can irq_delta > delta?, we measure it using the same clock.
>> >
>>
>> This would be mostly a corner case like:
>> - softirq start time t1
>> - rq->clock updated at t2 and rq->clock_task updated at t2 without
>> accounting for current softirq
>> - softirq end time t3
>> - cpu spends most time here in softirq or hardirq
>> - next rq->clock update at t4 and rq->clock_task update, with delta =
>> t4-t2 and irq_delta ~= t4 - t1
>
> Ah, something like that would happen when we do a wakeup from
> soft/hard-irq context, not an altogether uncommon occurrence.
>
> Wouldn't that be cured by updating the irq-time when asking for it,
> something like the below? (on top of my earlier patch)

This should mostly work. There would be a small window between
rq->clock = sched_clock_cpu() and account_system_vtime() doing its own
sched_clock_cpu(), and there is the overhead of doing this on every update
when the going-backwards case may only happen rarely.

Thanks,
Venki


2010-12-09 23:16:48

by Peter Zijlstra

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, 2010-12-09 at 14:21 -0800, Venkatesh Pallipadi wrote:
> This should mostly work. There would be a small window between
> rq->clock = sched_clock_cpu() and account_system_vtime() doing its own
> sched_clock_cpu(), and there is the overhead of doing this on every update
> when the going-backwards case may only happen rarely.

Right, so something like the below removes that tiny race by re-using
rq->clock as now. It also removes some overhead by removing the IRQ
fudging (we already know IRQs are disabled).

It does have a few more branches (2 afaict) than your monotonicity path
had, but it seems to me this approach is slightly more accurate.

---
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1813,19 +1813,10 @@ void disable_sched_clock_irqtime(void)
sched_clock_irqtime = 0;
}

-void account_system_vtime(struct task_struct *curr)
+static void __account_system_vtime(int cpu, u64 now)
{
- unsigned long flags;
- int cpu;
- u64 now, delta;
+ s64 delta;

- if (!sched_clock_irqtime)
- return;
-
- local_irq_save(flags);
-
- cpu = smp_processor_id();
- now = sched_clock_cpu(cpu);
delta = now - per_cpu(irq_start_time, cpu);
per_cpu(irq_start_time, cpu) = now;
/*
@@ -1836,16 +1827,36 @@ void account_system_vtime(struct task_st
*/
if (hardirq_count())
per_cpu(cpu_hardirq_time, cpu) += delta;
- else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
+ else if (in_serving_softirq() && !(current->flags & PF_KSOFTIRQD))
per_cpu(cpu_softirq_time, cpu) += delta;
+}
+
+void account_system_vtime(struct task_struct *curr)
+{
+ unsigned long flags;
+ u64 now;
+ int cpu;
+
+ if (!sched_clock_irqtime)
+ return;
+
+ local_irq_save(flags);
+
+ cpu = smp_processor_id();
+ now = sched_clock_cpu(cpu);
+ __account_system_vtime(cpu, now);

local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(account_system_vtime);

-static u64 irq_time_cpu(int cpu)
+static u64 irq_time_cpu(struct rq *rq)
{
- account_system_vtime(current);
+ int cpu = cpu_of(rq);
+
+ if (sched_clock_irqtime)
+ __account_system_vtime(cpu, rq->clock);
+
return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
}

@@ -1853,7 +1864,7 @@ static void update_rq_clock_task(struct
{
s64 irq_delta;

- irq_delta = irq_time_cpu(cpu_of(rq)) - rq->prev_irq_time;
+ irq_delta = irq_time_cpu(rq) - rq->prev_irq_time;
rq->prev_irq_time += irq_delta;

delta -= irq_delta;

2010-12-09 23:35:16

by Venkatesh Pallipadi

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, Dec 9, 2010 at 3:16 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2010-12-09 at 14:21 -0800, Venkatesh Pallipadi wrote:
>> This should mostly work. There would be a small window there on
>> rq->clock=sched_clock_cpu() and system_vtime doing sched_clock_cpu()
>> and also the overhead of doing this everytime when this going
>> backwards may happen rarely.
>
> Right, so something like the below removes that tiny race by re-using
> rq->clock as now. It also removes some overhead by removing the IRQ
> fudging (we already know IRQs are disabled).
>
> It does have a few more branches (2 afaict) than your monotonicity path
> had, but it seems to me this approach is slightly more accurate.
>

Looks fine.

Just to make sure, update_rq_clock() always gets called on current
CPU. Right? The pending patches I have optimizes
account_system_vtime() to use this_cpu_write and friends. Want to make
sure this change will still keep that optimization relevant.



2010-12-10 10:08:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, 2010-12-09 at 15:35 -0800, Venkatesh Pallipadi wrote:
>
> Just to make sure, update_rq_clock() always gets called on current
> CPU. Right?

No, specifically not. If that were the case we wouldn't need the
cross-cpu synced timestamp. Things like load-balancing and
remote-wakeups need to update a remote CPUs clock.

> The pending patches I have optimizes
> account_system_vtime() to use this_cpu_write and friends. Want to make
> sure this change will still keep that optimization relevant.

Ah, good point, remote CPUs updating that will mess with the consistency
of the per-cpu timestamps due to non atomic updates :/

Bugger.. making them atomics will make it even more expensive. /me goes
ponder.
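
As an illustration of the torn-update problem being worried about here, a minimal sketch with made-up names (not code from this thread): on a 32bit machine a u64 store is performed as two 32bit stores, so a reader on another CPU can observe one half of an in-progress update.

	#include <stdint.h>

	/* stands in for one of the per-cpu u64 irq time counters */
	static uint64_t cpu_hardirq_time_example;

	static void owner_cpu_update(uint64_t delta)
	{
		/* on a 32bit target this is typically two separate 32bit stores */
		cpu_hardirq_time_example += delta;
	}

	static uint64_t remote_cpu_read(void)
	{
		/*
		 * Without a retry scheme such as a seqcount, this load can pair
		 * the old low half with the new high half (or vice versa) and
		 * return a value that was never actually stored.
		 */
		return cpu_hardirq_time_example;
	}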

2010-12-10 13:18:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 11:08 +0100, Peter Zijlstra wrote:
> > Just to make sure, update_rq_clock() always gets called on current
> > CPU. Right?
>
> No, specifically not. If that were the case we wouldn't need the
> cross-cpu synced timestamp. Things like load-balancing and
> remote-wakeups need to update a remote CPUs clock.
>
> > The pending patches I have optimizes
> > account_system_vtime() to use this_cpu_write and friends. Want to make
> > sure this change will still keep that optimization relevant.
>
> Ah, good point, remote CPUs updating that will mess with the consistency
> of the per-cpu timestamps due to non atomic updates :/
>
> Bugger.. making them atomics will make it even more expensive. /me goes
> ponder.

OK, so I ended up doing the same you did.. Still staring at that, 32bit
will go very funny in the head once every so often. One possible
solution would be to ignore the occasional abs(irq_delta) > 2 * delta.

That would however result in an accounting discrepancy such that:
clock_task + irq_time != clock

Thoughts?

---
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1813,11 +1813,36 @@ void disable_sched_clock_irqtime(void)
sched_clock_irqtime = 0;
}

+static void __account_system_vtime(int cpu, u64 now)
+{
+ s64 delta;
+
+ delta = now - per_cpu(irq_start_time, cpu);
+ per_cpu(irq_start_time, cpu) = now;
+
+ if (hardirq_count())
+ per_cpu(cpu_hardirq_time, cpu) += delta;
+ /*
+ * We do not account for softirq time from ksoftirqd here. We want to
+ * continue accounting softirq time to ksoftirqd thread in that case,
+ * so as not to confuse scheduler with a special task that do not
+ * consume any time, but still wants to run.
+ */
+ else if (in_serving_softirq() && !(current->flags & PF_KSOFTIRQD))
+ per_cpu(cpu_softirq_time, cpu) += delta;
+}
+
+/*
+ * Called before incrementing preempt_count on {soft,}irq_enter
+ * and before decrementing preempt_count on {soft,}irq_exit.
+ *
+ * @curr should at all times be current.
+ */
void account_system_vtime(struct task_struct *curr)
{
unsigned long flags;
+ u64 now;
int cpu;
- u64 now, delta;

if (!sched_clock_irqtime)
return;
@@ -1826,26 +1851,21 @@ void account_system_vtime(struct task_st

cpu = smp_processor_id();
now = sched_clock_cpu(cpu);
- delta = now - per_cpu(irq_start_time, cpu);
- per_cpu(irq_start_time, cpu) = now;
- /*
- * We do not account for softirq time from ksoftirqd here.
- * We want to continue accounting softirq time to ksoftirqd thread
- * in that case, so as not to confuse scheduler with a special task
- * that do not consume any time, but still wants to run.
- */
- if (hardirq_count())
- per_cpu(cpu_hardirq_time, cpu) += delta;
- else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
- per_cpu(cpu_softirq_time, cpu) += delta;
+ __account_system_vtime(cpu, now);

local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(account_system_vtime);

-static u64 irq_time_cpu(int cpu)
+static u64 irq_time_cpu(struct rq *rq)
{
- account_system_vtime(current);
+ int cpu = cpu_of(rq);
+ /*
+ * See the comment in update_rq_clock_task(), ideally we'd update
+ * the *irq_time values using rq->clock here.
+ *
+ * As it stands, reading this from a remote cpu is buggy on 32bit.
+ */
return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
}

@@ -1853,9 +1873,27 @@ static void update_rq_clock_task(struct
{
s64 irq_delta;

- irq_delta = irq_time_cpu(cpu_of(rq)) - rq->prev_irq_time;
- rq->prev_irq_time += irq_delta;
+ irq_delta = irq_time_cpu(rq) - rq->prev_irq_time;
+
+ /*
+ * Since irq_time is only updated on {soft,}irq_exit, we might run into
+ * this case when a previous update_rq_clock() happened inside a
+ * {soft,}irq region.
+ *
+ * When this happens, we stop ->clock_task and only update the
+ * prev_irq_time stamp to account for the part that fit, so that a next
+ * update will consume the rest. This ensures ->clock_task is
+ * monotonic.
+ *
+ * It does however cause some slight miss-attribution of {soft,}irq
+ * time, a more accurate solution would be to update the irq_time using
+ * the current rq->clock timestamp, except that would require using
+ * atomic ops.
+ */
+ if (irq_delta > delta)
+ irq_delta = delta;

+ rq->prev_irq_time += irq_delta;
delta -= irq_delta;
rq->clock_task += delta;

2010-12-10 13:27:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 14:17 +0100, Peter Zijlstra wrote:
>
> OK, so I ended up doing the same you did.. Still staring at that, 32bit
> will go very funny in the head once every so often. One possible
> solution would be to ignore the occasional abs(irq_delta) > 2 * delta.
>
> That would however result in an accounting discrepancy such that:
> clock_task + irq_time != clock
>
> Thoughts?

The brute force solution is a seqcount.. something like so:

---
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1786,21 +1786,63 @@ static void deactivate_task(struct rq *r
#ifdef CONFIG_IRQ_TIME_ACCOUNTING

/*
- * There are no locks covering percpu hardirq/softirq time.
- * They are only modified in account_system_vtime, on corresponding CPU
- * with interrupts disabled. So, writes are safe.
+ * There are no locks covering percpu hardirq/softirq time. They are only
+ * modified in account_system_vtime, on corresponding CPU with interrupts
+ * disabled. So, writes are safe.
+ *
* They are read and saved off onto struct rq in update_rq_clock().
- * This may result in other CPU reading this CPU's irq time and can
- * race with irq/account_system_vtime on this CPU. We would either get old
- * or new value (or semi updated value on 32 bit) with a side effect of
- * accounting a slice of irq time to wrong task when irq is in progress
- * while we read rq->clock. That is a worthy compromise in place of having
- * locks on each irq in account_system_time.
+ *
+ * This may result in other CPU reading this CPU's irq time and can race with
+ * irq/account_system_vtime on this CPU. We would either get old or new value
+ * with a side effect of accounting a slice of irq time to wrong task when irq
+ * is in progress while we read rq->clock. That is a worthy compromise in place
+ * of having locks on each irq in account_system_time.
*/
static DEFINE_PER_CPU(u64, cpu_hardirq_time);
static DEFINE_PER_CPU(u64, cpu_softirq_time);
-
static DEFINE_PER_CPU(u64, irq_start_time);
+
+#ifndef CONFIG_64BIT
+static DEFINE_PER_CPU(seqcount_t, irq_time_seq);
+
+static inline void irq_time_write_begin(int cpu)
+{
+ write_seqcount_begin(&per_cpu(irq_time_seq, cpu));
+}
+
+static inline void irq_time_write_end(int cpu)
+{
+ write_seqcount_end(&per_cpu(irq_time_seq, cpu));
+}
+
+static inline u64 irq_time_read(int cpu)
+{
+ u64 irq_time;
+ unsigned seq;
+
+ do {
+ seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu));
+ irq_time = per_cpu(cpu_softirq_time, cpu) +
+ per_cpu(cpu_hardirq_time, cpu);
+ } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq));
+
+ return irq_time;
+}
+#else /* CONFIG_64BIT */
+static inline void irq_time_write_begin(int cpu)
+{
+}
+
+static inline void irq_time_write_end(int cpu)
+{
+}
+
+static inline u64 irq_time_read(int cpu)
+{
+ return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
+}
+#endif /* CONFIG_64BIT */
+
static int sched_clock_irqtime;

void enable_sched_clock_irqtime(void)
@@ -1820,6 +1862,7 @@ static void __account_system_vtime(int c
delta = now - per_cpu(irq_start_time, cpu);
per_cpu(irq_start_time, cpu) = now;

+ irq_time_write_begin(cpu);
if (hardirq_count())
per_cpu(cpu_hardirq_time, cpu) += delta;
/*
@@ -1830,6 +1873,7 @@ static void __account_system_vtime(int c
*/
else if (in_serving_softirq() && !(current->flags & PF_KSOFTIRQD))
per_cpu(cpu_softirq_time, cpu) += delta;
+ irq_time_write_end(cpu);
}

/*
@@ -1859,14 +1903,11 @@ EXPORT_SYMBOL_GPL(account_system_vtime);

static u64 irq_time_cpu(struct rq *rq)
{
- int cpu = cpu_of(rq);
/*
* See the comment in update_rq_clock_task(), ideally we'd update
* the *irq_time values using rq->clock here.
- *
- * As it stands, reading this from a remote cpu is buggy on 32bit.
*/
- return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
+ return irq_time_read(cpu_of(rq));
}

static void update_rq_clock_task(struct rq *rq, s64 delta)

2010-12-10 13:48:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM


Full patch..

---
Subject: sched: Fix the irqtime code to deal with u64 wraps
From: Peter Zijlstra <[email protected]>
Date: Thu Dec 09 14:15:34 CET 2010

ARM systems have a 32bit sched_clock() [ which needs to be fixed ],
but this exposed a bug in the irq_time code as well, it doesn't deal
with wraps at all.

Fix the irq_time code to deal with u64 wraps by re-writing the code to
only use delta increments, which avoids the whole issue.

Furthermore, solve the problem of 32bit arches reading partial updates
of the u64 time values.

Cc: Venkatesh Pallipadi <[email protected]>
Reported-by: Russell King - ARM Linux <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched.c | 166 +++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 116 insertions(+), 50 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -642,23 +642,19 @@ static inline struct task_group *task_gr

#endif /* CONFIG_CGROUP_SCHED */

-static u64 irq_time_cpu(int cpu);
-static void sched_irq_time_avg_update(struct rq *rq, u64 irq_time);
+static void update_rq_clock_task(struct rq *rq, s64 delta);

inline void update_rq_clock(struct rq *rq)
{
- int cpu = cpu_of(rq);
- u64 irq_time;
+ s64 delta;

if (rq->skip_clock_update)
return;

- rq->clock = sched_clock_cpu(cpu);
- irq_time = irq_time_cpu(cpu);
- if (rq->clock - irq_time > rq->clock_task)
- rq->clock_task = rq->clock - irq_time;
+ delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+ rq->clock += delta;
+ update_rq_clock_task(rq, delta);

- sched_irq_time_avg_update(rq, irq_time);
}

/*
@@ -1790,90 +1786,160 @@ static void deactivate_task(struct rq *r
#ifdef CONFIG_IRQ_TIME_ACCOUNTING

/*
- * There are no locks covering percpu hardirq/softirq time.
- * They are only modified in account_system_vtime, on corresponding CPU
- * with interrupts disabled. So, writes are safe.
+ * There are no locks covering percpu hardirq/softirq time. They are only
+ * modified in account_system_vtime, on corresponding CPU with interrupts
+ * disabled. So, writes are safe.
+ *
* They are read and saved off onto struct rq in update_rq_clock().
- * This may result in other CPU reading this CPU's irq time and can
- * race with irq/account_system_vtime on this CPU. We would either get old
- * or new value (or semi updated value on 32 bit) with a side effect of
- * accounting a slice of irq time to wrong task when irq is in progress
- * while we read rq->clock. That is a worthy compromise in place of having
- * locks on each irq in account_system_time.
+ *
+ * This may result in other CPU reading this CPU's irq time and can race with
+ * irq/account_system_vtime on this CPU. We would either get old or new value
+ * with a side effect of accounting a slice of irq time to wrong task when irq
+ * is in progress while we read rq->clock. That is a worthy compromise in place
+ * of having locks on each irq in account_system_time.
*/
static DEFINE_PER_CPU(u64, cpu_hardirq_time);
static DEFINE_PER_CPU(u64, cpu_softirq_time);
-
static DEFINE_PER_CPU(u64, irq_start_time);
-static int sched_clock_irqtime;

-void enable_sched_clock_irqtime(void)
+#ifndef CONFIG_64BIT
+static DEFINE_PER_CPU(seqcount_t, irq_time_seq);
+
+static inline void irq_time_write_begin(int cpu)
{
- sched_clock_irqtime = 1;
+ write_seqcount_begin(&per_cpu(irq_time_seq, cpu));
}

-void disable_sched_clock_irqtime(void)
+static inline void irq_time_write_end(int cpu)
{
- sched_clock_irqtime = 0;
+ write_seqcount_end(&per_cpu(irq_time_seq, cpu));
}

-static u64 irq_time_cpu(int cpu)
+static inline u64 irq_time_read(int cpu)
{
- if (!sched_clock_irqtime)
- return 0;
+ u64 irq_time;
+ unsigned seq;
+
+ do {
+ seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu));
+ irq_time = per_cpu(cpu_softirq_time, cpu) +
+ per_cpu(cpu_hardirq_time, cpu);
+ } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq));
+
+ return irq_time;
+}
+#else /* CONFIG_64BIT */
+static inline void irq_time_write_begin(int cpu)
+{
+}
+
+static inline void irq_time_write_end(int cpu)
+{
+}

+static inline u64 irq_time_read(int cpu)
+{
return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
}
+#endif /* CONFIG_64BIT */

+static int sched_clock_irqtime;
+
+void enable_sched_clock_irqtime(void)
+{
+ sched_clock_irqtime = 1;
+}
+
+void disable_sched_clock_irqtime(void)
+{
+ sched_clock_irqtime = 0;
+}
+
+/*
+ * Called before incrementing preempt_count on {soft,}irq_enter
+ * and before decrementing preempt_count on {soft,}irq_exit.
+ */
void account_system_vtime(struct task_struct *curr)
{
unsigned long flags;
+ s64 delta;
int cpu;
- u64 now, delta;

if (!sched_clock_irqtime)
return;

local_irq_save(flags);
-
cpu = smp_processor_id();
- now = sched_clock_cpu(cpu);
- delta = now - per_cpu(irq_start_time, cpu);
- per_cpu(irq_start_time, cpu) = now;
- /*
- * We do not account for softirq time from ksoftirqd here.
- * We want to continue accounting softirq time to ksoftirqd thread
- * in that case, so as not to confuse scheduler with a special task
- * that do not consume any time, but still wants to run.
- */
+ delta = sched_clock_cpu(cpu) - per_cpu(irq_start_time, cpu);
+ per_cpu(irq_start_time, cpu) += delta;
+
+ irq_time_write_begin(cpu);
+
if (hardirq_count())
per_cpu(cpu_hardirq_time, cpu) += delta;
+ /*
+ * We do not account for softirq time from ksoftirqd here. We want to
+ * continue accounting softirq time to ksoftirqd thread in that case,
+ * so as not to confuse scheduler with a special task that do not
+ * consume any time, but still wants to run.
+ */
else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
per_cpu(cpu_softirq_time, cpu) += delta;

+ irq_time_write_end(cpu);
local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(account_system_vtime);

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time)
+static u64 irq_time_cpu(struct rq *rq)
{
- if (sched_clock_irqtime && sched_feat(NONIRQ_POWER)) {
- u64 delta_irq = curr_irq_time - rq->prev_irq_time;
- rq->prev_irq_time = curr_irq_time;
- sched_rt_avg_update(rq, delta_irq);
- }
+ /*
+ * See the comment in update_rq_clock_task(), ideally we'd update
+ * the *irq_time values using rq->clock here.
+ */
+ return irq_time_read(cpu_of(rq));
}

-#else
-
-static u64 irq_time_cpu(int cpu)
+static void update_rq_clock_task(struct rq *rq, s64 delta)
{
- return 0;
+ s64 irq_delta;
+
+ irq_delta = irq_time_cpu(rq) - rq->prev_irq_time;
+
+ /*
+ * Since irq_time is only updated on {soft,}irq_exit, we might run into
+ * this case when a previous update_rq_clock() happened inside a
+ * {soft,}irq region.
+ *
+ * When this happens, we stop ->clock_task and only update the
+ * prev_irq_time stamp to account for the part that fit, so that a next
+ * update will consume the rest. This ensures ->clock_task is
+ * monotonic.
+ *
+ * It does however cause some slight miss-attribution of {soft,}irq
+ * time, a more accurate solution would be to update the irq_time using
+ * the current rq->clock timestamp, except that would require using
+ * atomic ops.
+ */
+ if (irq_delta > delta)
+ irq_delta = delta;
+
+ rq->prev_irq_time += irq_delta;
+ delta -= irq_delta;
+ rq->clock_task += delta;
+
+ if (irq_delta && sched_feat(NONIRQ_POWER))
+ sched_rt_avg_update(rq, irq_delta);
}

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time) { }
+#else /* CONFIG_IRQ_TIME_ACCOUNTING */

-#endif
+static inline void update_rq_clock_task(struct rq *rq, s64 delta)
+{
+ rq->clock_task += delta;
+}
+
+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */

#include "sched_idletask.c"
#include "sched_fair.c"

2010-12-10 16:51:01

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, Dec 10, 2010 at 02:47:46PM +0100, Peter Zijlstra wrote:
>
> Full patch..
>
> ---
> Subject: sched: Fix the irqtime code to deal with u64 wraps
> From: Peter Zijlstra <[email protected]>
> Date: Thu Dec 09 14:15:34 CET 2010
>
> ARM systems have a 32bit sched_clock() [ which needs to be fixed ],
> but this exposed a bug in the irq_time code as well, it doesn't deal
> with wraps at all.
>
> Fix the irq_time code to deal with u64 wraps by re-writing the code to
> only use delta increments, which avoids the whole issue.
>
> Furthermore, solve the problem of 32bit arches reading partial updates
> of the u64 time values.
>
> Cc: Venkatesh Pallipadi <[email protected]>
> Reported-by: Russell King - ARM Linux <[email protected]>

I think credit should go to Mikael Pettersson, who identified the
interactivity regression and problematical commit. I only pointed
out the dubious nature of the code.

2010-12-10 16:54:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 16:50 +0000, Russell King - ARM Linux wrote:
> > Reported-by: Russell King - ARM Linux <[email protected]>
>
> I think credit should go to Mikael Pettersson, who identified the
> interactivity regression and problematical commit. I only pointed
> out the dubious nature of the code.

Ah, thanks, missed that. Updated the patch.

2010-12-10 17:18:26

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 14:47 +0100, Peter Zijlstra wrote:
> Full patch..

> void account_system_vtime(struct task_struct *curr)
> {
> unsigned long flags;
> + s64 delta;
> int cpu;
> - u64 now, delta;
>
> if (!sched_clock_irqtime)
> return;
>
> local_irq_save(flags);
> -
> cpu = smp_processor_id();
> - now = sched_clock_cpu(cpu);
> - delta = now - per_cpu(irq_start_time, cpu);
> - per_cpu(irq_start_time, cpu) = now;
> - /*
> - * We do not account for softirq time from ksoftirqd here.
> - * We want to continue accounting softirq time to ksoftirqd thread
> - * in that case, so as not to confuse scheduler with a special task
> - * that do not consume any time, but still wants to run.
> - */
> + delta = sched_clock_cpu(cpu) - per_cpu(irq_start_time, cpu);
> + per_cpu(irq_start_time, cpu) += delta;

We should add some checkpatch warning to any new per_cpu() use.


delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
__this_cpu_add(irq_start_time, delta);



Also irq_time_write_begin() and irq_time_write_end() could be faster
(called for current cpu)

static inline void irq_time_write_begin(void)
{
__this_cpu_inc(irq_time_seq.sequence);
smp_wmb();
}

static inline void irq_time_write_end(void)
{
smp_wmb();
__this_cpu_inc(irq_time_seq.sequence);
}


2010-12-10 17:49:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 18:18 +0100, Eric Dumazet wrote:
> On Fri, 2010-12-10 at 14:47 +0100, Peter Zijlstra wrote:

> Also irq_time_write_begin() and irq_time_write_end() could be faster
> (called for current cpu)
>
> static inline void irq_time_write_begin(void)
> {
> __this_cpu_inc(irq_time_seq.sequence);
> smp_wmb();
> }
>
> static inline void irq_time_write_end(void)
> {
> smp_wmb();
> __this_cpu_inc(irq_time_seq.sequence);
> }

Yeah, but that kinda defeats the purpose of having it implemented in
seqlock.h. Ideally we'd teach gcc about these long pointers and have
something like:

write_seqcount_begin(&this_cpu_read(irq_time_seq));

do the right thing.

2010-12-10 17:57:39

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, Dec 10, 2010 at 02:47:46PM +0100, Peter Zijlstra wrote:
> inline void update_rq_clock(struct rq *rq)
> {
> - int cpu = cpu_of(rq);
> - u64 irq_time;
> + s64 delta;
>
> if (rq->skip_clock_update)
> return;
>
> - rq->clock = sched_clock_cpu(cpu);
> - irq_time = irq_time_cpu(cpu);
> - if (rq->clock - irq_time > rq->clock_task)
> - rq->clock_task = rq->clock - irq_time;
> + delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> + rq->clock += delta;

Hmm. Can you tell me how this is different to:

new_clock = sched_clock_cpu(cpu_of(rq));
delta = new_clock - rq->clock;
rq->clock = new_clock;

which I think may be simpler in terms of 64-bit math for 32-bit compilers
to deal with?

In terms of the wrap-around, I don't see this as any different from the
above, as:

rq->clock += sched_clock_cpu(cpu_of(rq)) - rq->clock;
rq->clock = rq->clock + sched_clock_cpu(cpu_of(rq)) - rq->clock;
rq->clock = sched_clock_cpu(cpu_of(rq));

2010-12-10 18:11:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 17:56 +0000, Russell King - ARM Linux wrote:
> On Fri, Dec 10, 2010 at 02:47:46PM +0100, Peter Zijlstra wrote:
> > inline void update_rq_clock(struct rq *rq)
> > {
> > - int cpu = cpu_of(rq);
> > - u64 irq_time;
> > + s64 delta;
> >
> > if (rq->skip_clock_update)
> > return;
> >
> > - rq->clock = sched_clock_cpu(cpu);
> > - irq_time = irq_time_cpu(cpu);
> > - if (rq->clock - irq_time > rq->clock_task)
> > - rq->clock_task = rq->clock - irq_time;
> > + delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> > + rq->clock += delta;
>
> Hmm. Can you tell me how this is different to:
>
> new_clock = sched_clock_cpu(cpu_of(rq));
> delta = new_clock - rq->clock;
> rq->clock = new_clock;
>
> which I think may be simpler in terms of 64-bit math for 32-bit compilers
> to deal with?

Its not, I could write it like that, the only reason I didn't is because
it uses an extra variable. If gcc on 32bit targets really generates
hideous code for it I'll happily change it.

> In terms of the wrap-around, I don't see this as any different from the
> above, as:
>
> rq->clock += sched_clock_cpu(cpu_of(rq)) - rq->clock;
> rq->clock = rq->clock + sched_clock_cpu(cpu_of(rq)) - rq->clock;
> rq->clock = sched_clock_cpu(cpu_of(rq));

Correct, its not different. Nor was it meant to be. The only problem it
solves is the u64 wrap failure in:

if (rq->clock - irq_time > rq->clock_task)

There are lots of places in the scheduler that rely on u64 wrap, for now
the easiest thing for ARM would be to select HAVE_UNSTABLE_SCHED_CLOCK
for those platforms that implement a short sched_clock().

While that isn't ideal it is something that makes it work, we can work
on something more suitable for future kernels.
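
For illustration, a small standalone sketch (made-up values, not thread code) of why a sched_clock() that wraps at 32 bits breaks the u64 arithmetic: the counter wraps at 2^32 while the variables are 64 bits wide, so a subtraction across the wrap yields an enormous bogus delta, and comparisons such as rq->clock - irq_time > rq->clock_task misfire.

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t prev = 0xfffffff0ull;	/* sampled just before a 32bit wrap */
		uint64_t now  = 0x00000010ull;	/* the 32bit clock has since wrapped */

		/* true elapsed time is 0x20 units; u64 math yields nearly 2^64 */
		printf("u64 delta: %llu\n", (unsigned long long)(now - prev));

		/* a genuine 32bit subtraction handles the wrap naturally */
		printf("32bit delta: %u\n", (uint32_t)now - (uint32_t)prev);
		return 0;
	}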

2010-12-10 18:14:25

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 18:49 +0100, Peter Zijlstra wrote:
> On Fri, 2010-12-10 at 18:18 +0100, Eric Dumazet wrote:
> > On Fri, 2010-12-10 at 14:47 +0100, Peter Zijlstra wrote:
>
> > Also irq_time_write_begin() and irq_time_write_end() could be faster
> > (called for current cpu)
> >
> > static inline void irq_time_write_begin(void)
> > {
> > __this_cpu_inc(irq_time_seq.sequence);
> > smp_wmb();
> > }
> >
> > static inline void irq_time_write_end(void)
> > {
> > smp_wmb();
> > __this_cpu_inc(irq_time_seq.sequence);
> > }
>
> Yeah, but that kinda defeats the purpose of having it implemented in
> seqlock.h. Ideally we'd teach gcc about these long pointers and have
> something like:
>
> write_seqcount_begin(&this_cpu_read(irq_time_seq));
>
> do the right thing.

gcc wont be able to do this yet (%fs/%gs selectors)

But we can provide this_cpu_write_seqcount_{begin|end}()

static inline void this_cpu_write_seqcount_begin(seqcount_t *s)
{
__this_cpu_inc(s->sequence);
smp_wmb();
}


by Christoph Lameter

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 10 Dec 2010, Eric Dumazet wrote:

> > Yeah, but that kinda defeats the purpose of having it implemented in
> > seqlock.h. Ideally we'd teach gcc about these long pointers and have
> > something like:
> >
> > write_seqcount_begin(&this_cpu_read(irq_time_seq));
> >
> > do the right thing.
>
> gcc wont be able to do this yet (%fs/%gs selectors)

The kernel can do that using the __percpu annotation.

> But we can provide this_cpu_write_seqcount_{begin|end}()

No, we cannot do that. this_cpu ops are for per cpu data and not for locking
values shared between processors. We have a mechanism for passing per cpu
pointers with a corresponding annotation.

> static inline void this_cpu_write_seqcount_begin(seqcount_t *s)
^^^ Would have to be seqcount_t __percpu *s

> {
> __this_cpu_inc(s->sequence);
> smp_wmb();
> }

2010-12-10 18:43:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 19:10 +0100, Peter Zijlstra wrote:
> There are lots of places in the scheduler that rely on u64 wrap, for now
> the easiest thing for ARM would be to select HAVE_UNSTABLE_SCHED_CLOCK
> for those platforms that implement a short sched_clock().
>
> While that isn't ideal it is something that makes it work, we can work
> on something more suitable for future kernels.

Either that, or the thing you proposed a while back in this thread.

Since ARM doesn't have NMIs something like that should work just fine.

2010-12-10 18:46:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 12:39 -0600, Christoph Lameter wrote:
> On Fri, 10 Dec 2010, Eric Dumazet wrote:
>
> > > Yeah, but that kinda defeats the purpose of having it implemented in
> > > seqlock.h. Ideally we'd teach gcc about these long pointers and have
> > > something like:
> > >
> > > write_seqcount_begin(&this_cpu_read(irq_time_seq));
> > >
> > > do the right thing.
> >
> > gcc wont be able to do this yet (%fs/%gs selectors)
>
> The kernel can do that using the __percpu annotation.

That's not true:

# define __percpu

Its a complete NOP.

> > But we can provide this_cpu_write_seqcount_{begin|end}()
>
> No, we cannot do that. this_cpu ops are for per cpu data and not for locking
> values shared between processors. We have a mechanism for passing per cpu
> pointers with a corresponding annotation.

-enoparse, its not locking anything, is a per-cpu sequence count.
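
For reference, the definition being argued about, approximately as it appears in include/linux/compiler.h of that era: under sparse (__CHECKER__) the annotation carries an address-space attribute that the checker enforces, while for a regular gcc build it expands to nothing, which is why it is a no-op at code-generation time.

	#ifdef __CHECKER__
	# define __percpu	__attribute__((noderef, address_space(3)))
	#else
	# define __percpu
	#endif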

2010-12-10 19:26:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 19:10 +0100, Peter Zijlstra wrote:
> > > + delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> > > + rq->clock += delta;
> >
> > Hmm. Can you tell me how this is different to:
> >
> > new_clock = sched_clock_cpu(cpu_of(rq));
> > delta = new_clock - rq->clock;
> > rq->clock = new_clock;
> >
> > which I think may be simpler in terms of 64-bit math for 32-bit compilers
> > to deal with?
>
> Its not, I could write it like that, the only reason I didn't is because
> it uses an extra variable. If gcc on 32bit targets really generates
> hideous code for it I'll happily change it.

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1863,6 +1863,7 @@ void account_system_vtime(struct task_st
{
unsigned long flags;
s64 delta;
+ u64 now;
int cpu;

if (!sched_clock_irqtime)
@@ -1870,8 +1871,9 @@ void account_system_vtime(struct task_st

local_irq_save(flags);
cpu = smp_processor_id();
- delta = sched_clock_cpu(cpu) - per_cpu(irq_start_time, cpu);
- per_cpu(irq_start_time, cpu) += delta;
+ now = sched_clock_cpu(cpu);
+ delta = now - per_cpu(irq_start_time, cpu);
+ per_cpu(irq_start_time, cpu) = now;

irq_time_write_begin(cpu);

On i386 (gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5):

Before: account_system_vtime: 160 bytes
After: account_system_vtime: 214 bytes

2010-12-10 19:36:55

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, Dec 10, 2010 at 07:10:54PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-12-10 at 17:56 +0000, Russell King - ARM Linux wrote:
> > On Fri, Dec 10, 2010 at 02:47:46PM +0100, Peter Zijlstra wrote:
> > > inline void update_rq_clock(struct rq *rq)
> > > {
> > > - int cpu = cpu_of(rq);
> > > - u64 irq_time;
> > > + s64 delta;
> > >
> > > if (rq->skip_clock_update)
> > > return;
> > >
> > > - rq->clock = sched_clock_cpu(cpu);
> > > - irq_time = irq_time_cpu(cpu);
> > > - if (rq->clock - irq_time > rq->clock_task)
> > > - rq->clock_task = rq->clock - irq_time;
> > > + delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
> > > + rq->clock += delta;
> >
> > Hmm. Can you tell me how this is different to:
> >
> > new_clock = sched_clock_cpu(cpu_of(rq));
> > delta = new_clock - rq->clock;
> > rq->clock = new_clock;
> >
> > which I think may be simpler in terms of 64-bit math for 32-bit compilers
> > to deal with?
>
> Its not, I could write it like that, the only reason I didn't is because
> it uses an extra variable. If gcc on 32bit targets really generates
> hideous code for it I'll happily change it.

Well, I can't tell you what kind of code this produces on ARM, as it
doesn't appear to apply to any kernel I've tried. So, I assume it's
against some scheduler development tree rather than Linus' tree?

2010-12-10 19:37:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 19:17 +0000, Russell King - ARM Linux wrote:
>
>
> Well, I can't tell you what kind of code this produces on ARM, as it
> doesn't appear to apply to any kernel I've tried. So, I assume it's
> against some scheduler development tree rather than Linus' tree?

Ah yes, my bad, there's some change that got in the way.



---
Subject: sched: Fix the irqtime code to deal with u64 wraps
From: Peter Zijlstra <[email protected]>
Date: Thu Dec 09 14:15:34 CET 2010

ARM systems have a 32bit sched_clock() [ which needs to be fixed ],
but this exposed a bug in the irq_time code as well, it doesn't deal
with wraps at all.

Fix the irq_time code to deal with u64 wraps by re-writing the code to
only use delta increments, which avoids the whole issue.

Furthermore, solve the problem of 32bit arches reading partial updates
of the u64 time values.

Cc: Venkatesh Pallipadi <[email protected]>
Reported-by: Mikael Pettersson <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
---
kernel/sched.c | 172 +++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 119 insertions(+), 53 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -636,22 +636,18 @@ static inline struct task_group *task_gr

#endif /* CONFIG_CGROUP_SCHED */

-static u64 irq_time_cpu(int cpu);
-static void sched_irq_time_avg_update(struct rq *rq, u64 irq_time);
+static void update_rq_clock_task(struct rq *rq, s64 delta);

-inline void update_rq_clock(struct rq *rq)
+static void update_rq_clock(struct rq *rq)
{
- if (!rq->skip_clock_update) {
- int cpu = cpu_of(rq);
- u64 irq_time;
+ s64 delta;

- rq->clock = sched_clock_cpu(cpu);
- irq_time = irq_time_cpu(cpu);
- if (rq->clock - irq_time > rq->clock_task)
- rq->clock_task = rq->clock - irq_time;
+ if (rq->skip_clock_update)
+ return;

- sched_irq_time_avg_update(rq, irq_time);
- }
+ delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
+ rq->clock += delta;
+ update_rq_clock_task(rq, delta);
}

/*
@@ -1918,90 +1914,160 @@ static void deactivate_task(struct rq *r
#ifdef CONFIG_IRQ_TIME_ACCOUNTING

/*
- * There are no locks covering percpu hardirq/softirq time.
- * They are only modified in account_system_vtime, on corresponding CPU
- * with interrupts disabled. So, writes are safe.
+ * There are no locks covering percpu hardirq/softirq time. They are only
+ * modified in account_system_vtime, on corresponding CPU with interrupts
+ * disabled. So, writes are safe.
+ *
* They are read and saved off onto struct rq in update_rq_clock().
- * This may result in other CPU reading this CPU's irq time and can
- * race with irq/account_system_vtime on this CPU. We would either get old
- * or new value (or semi updated value on 32 bit) with a side effect of
- * accounting a slice of irq time to wrong task when irq is in progress
- * while we read rq->clock. That is a worthy compromise in place of having
- * locks on each irq in account_system_time.
+ *
+ * This may result in other CPU reading this CPU's irq time and can race with
+ * irq/account_system_vtime on this CPU. We would either get old or new value
+ * with a side effect of accounting a slice of irq time to wrong task when irq
+ * is in progress while we read rq->clock. That is a worthy compromise in place
+ * of having locks on each irq in account_system_time.
*/
static DEFINE_PER_CPU(u64, cpu_hardirq_time);
static DEFINE_PER_CPU(u64, cpu_softirq_time);
-
static DEFINE_PER_CPU(u64, irq_start_time);
-static int sched_clock_irqtime;

-void enable_sched_clock_irqtime(void)
+#ifndef CONFIG_64BIT
+static DEFINE_PER_CPU(seqcount_t, irq_time_seq);
+
+static inline void irq_time_write_begin(int cpu)
{
- sched_clock_irqtime = 1;
+ write_seqcount_begin(&per_cpu(irq_time_seq, cpu));
}

-void disable_sched_clock_irqtime(void)
+static inline void irq_time_write_end(int cpu)
{
- sched_clock_irqtime = 0;
+ write_seqcount_end(&per_cpu(irq_time_seq, cpu));
}

-static u64 irq_time_cpu(int cpu)
+static inline u64 irq_time_read(int cpu)
{
- if (!sched_clock_irqtime)
- return 0;
+ u64 irq_time;
+ unsigned seq;
+
+ do {
+ seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu));
+ irq_time = per_cpu(cpu_softirq_time, cpu) +
+ per_cpu(cpu_hardirq_time, cpu);
+ } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq));
+
+ return irq_time;
+}
+#else /* CONFIG_64BIT */
+static inline void irq_time_write_begin(int cpu)
+{
+}
+
+static inline void irq_time_write_end(int cpu)
+{
+}

+static inline u64 irq_time_read(int cpu)
+{
return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
}
+#endif /* CONFIG_64BIT */

+static int sched_clock_irqtime;
+
+void enable_sched_clock_irqtime(void)
+{
+ sched_clock_irqtime = 1;
+}
+
+void disable_sched_clock_irqtime(void)
+{
+ sched_clock_irqtime = 0;
+}
+
+/*
+ * Called before incrementing preempt_count on {soft,}irq_enter
+ * and before decrementing preempt_count on {soft,}irq_exit.
+ */
void account_system_vtime(struct task_struct *curr)
{
unsigned long flags;
+ s64 delta;
int cpu;
- u64 now, delta;

if (!sched_clock_irqtime)
return;

local_irq_save(flags);
-
cpu = smp_processor_id();
- now = sched_clock_cpu(cpu);
- delta = now - per_cpu(irq_start_time, cpu);
- per_cpu(irq_start_time, cpu) = now;
- /*
- * We do not account for softirq time from ksoftirqd here.
- * We want to continue accounting softirq time to ksoftirqd thread
- * in that case, so as not to confuse scheduler with a special task
- * that do not consume any time, but still wants to run.
- */
+ delta = sched_clock_cpu(cpu) - per_cpu(irq_start_time, cpu);
+ per_cpu(irq_start_time, cpu) += delta;
+
+ irq_time_write_begin(cpu);
+
if (hardirq_count())
per_cpu(cpu_hardirq_time, cpu) += delta;
+ /*
+ * We do not account for softirq time from ksoftirqd here. We want to
+ * continue accounting softirq time to ksoftirqd thread in that case,
+ * so as not to confuse scheduler with a special task that do not
+ * consume any time, but still wants to run.
+ */
else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
per_cpu(cpu_softirq_time, cpu) += delta;

+ irq_time_write_end(cpu);
local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(account_system_vtime);

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time)
+static u64 irq_time_cpu(struct rq *rq)
{
- if (sched_clock_irqtime && sched_feat(NONIRQ_POWER)) {
- u64 delta_irq = curr_irq_time - rq->prev_irq_time;
- rq->prev_irq_time = curr_irq_time;
- sched_rt_avg_update(rq, delta_irq);
- }
+ /*
+ * See the comment in update_rq_clock_task(), ideally we'd update
+ * the *irq_time values using rq->clock here.
+ */
+ return irq_time_read(cpu_of(rq));
}

-#else
-
-static u64 irq_time_cpu(int cpu)
+static void update_rq_clock_task(struct rq *rq, s64 delta)
{
- return 0;
+ s64 irq_delta;
+
+ irq_delta = irq_time_cpu(rq) - rq->prev_irq_time;
+
+ /*
+ * Since irq_time is only updated on {soft,}irq_exit, we might run into
+ * this case when a previous update_rq_clock() happened inside a
+ * {soft,}irq region.
+ *
+ * When this happens, we stop ->clock_task and only update the
+ * prev_irq_time stamp to account for the part that fit, so that a next
+ * update will consume the rest. This ensures ->clock_task is
+ * monotonic.
+ *
+ * It does however cause some slight miss-attribution of {soft,}irq
+ * time, a more accurate solution would be to update the irq_time using
+ * the current rq->clock timestamp, except that would require using
+ * atomic ops.
+ */
+ if (irq_delta > delta)
+ irq_delta = delta;
+
+ rq->prev_irq_time += irq_delta;
+ delta -= irq_delta;
+ rq->clock_task += delta;
+
+ if (irq_delta && sched_feat(NONIRQ_POWER))
+ sched_rt_avg_update(rq, irq_delta);
}

-static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time) { }
+#else /* CONFIG_IRQ_TIME_ACCOUNTING */

-#endif
+static inline void update_rq_clock_task(struct rq *rq, s64 delta)
+{
+ rq->clock_task += delta;
+}
+
+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */

#include "sched_idletask.c"
#include "sched_fair.c"

by Christoph Lameter

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 10 Dec 2010, Peter Zijlstra wrote:

> > > gcc wont be able to do this yet (%fs/%gs selectors)
> >
> > The kernel can do that using the __percpu annotation.
>
> That's not true:
>
> # define __percpu
>
> Its a complete NOP.

The annotation serves for sparse checking. .... If you do not care about
those checks then you can simply pass a percpu pointer in the same form as
a regular pointer.

> > > But we can provide this_cpu_write_seqcount_{begin|end}()
> >
> > No we cannot do hat. this_cpu ops are for per cpu data and not for locking
> > values shared between processors. We have a mechanism for passing per cpu
> > pointers with a corresponding annotation.
>
> -enoparse, its not locking anything, is a per-cpu sequence count.

seqlocks are for synchronization of objects on different processors.

Seems that you do not have that use case in mind. So a seqlock restricted
to a single processor? If so then you wont need any of those smp write
barriers mentioned earlier. A simple compiler barrier() is sufficient.




2010-12-10 20:07:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 13:51 -0600, Christoph Lameter wrote:
> On Fri, 10 Dec 2010, Peter Zijlstra wrote:
>
> > > > gcc wont be able to do this yet (%fs/%gs selectors)
> > >
> > > The kernel can do that using the __percpu annotation.
> >
> > That's not true:
> >
> > # define __percpu
> >
> > Its a complete NOP.
>
> The annotation serves for sparse checking. .... If you do not care about
> those checks then you can simply pass a percpu pointer in the same form as
> a regular pointer.

Its not about passing per-cpu pointers, its about passing long pointers.

When I write:

void foo(u64 *bla)
{
*bla++;
}

DEFINE_PER_CPU(u64, plop);

void bar(void)
{
foo(__this_cpu_ptr(plop));
}

I want gcc to emit the equivalent to:

__this_cpu_inc(plop); /* incq %fs:(%0) */

Now I guess the C type system will get in the way of this ever working,
since a long pointer would have a distinct type from a regular
pointer :/

The idea is to use 'regular' functions with the per-cpu data in a
transparent manner so as not to have to replicate all logic.

> > > > But we can provide this_cpu_write_seqcount_{begin|end}()
> > >
> > > No, we cannot do that. this_cpu ops are for per cpu data and not for locking
> > > values shared between processors. We have a mechanism for passing per cpu
> > > pointers with a corresponding annotation.
> >
> > -enoparse, its not locking anything, is a per-cpu sequence count.
>
> seqlocks are for synchronization of objects on different processors.
>
> Seems that you do not have that use case in mind. So a seqlock restricted
> to a single processor? If so then you wont need any of those smp write
> barriers mentioned earlier. A simple compiler barrier() is sufficient.

The seqcount is sometimes read by different CPUs, but I don't see why we
couldn't do what Eric suggested.

by Christoph Lameter

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 10 Dec 2010, Peter Zijlstra wrote:

> Its not about passing per-cpu pointers, its about passing long pointers.
>
> When I write:
>
> void foo(u64 *bla)
> {
> *bla++;
> }
>
> DEFINE_PER_CPU(u64, plop);
>
> void bar(void)
> {
> foo(__this_cpu_ptr(plop));
> }
>
> I want gcc to emit the equivalent to:
>
> __this_cpu_inc(plop); /* incq %fs:(%0) */
>
> Now I guess the C type system will get in the way of this ever working,
> since a long pointer would have a distinct type from a regular
> pointer :/
>
> The idea is to use 'regular' functions with the per-cpu data in a
> transparent manner so as not to have to replicate all logic.

That would mean you would have to pass information in the pointer at
runtime indicating that this particular pointer is a per cpu pointer.

Code for the Itanium arch can do that because it has per cpu virtual
mappings. So you define a virtual area for per cpu data and then map it
differently for each processor. If we would have a different page table
for each processor then we could avoid using segment register and do the
same on x86.

> > Seems that you do not have that use case in mind. So a seqlock restricted
> > to a single processor? If so then you wont need any of those smp write
> > barriers mentioned earlier. A simple compiler barrier() is sufficient.
>
> The seqcount is sometimes read by different CPUs, but I don't see why we
> couldn't do what Eric suggested.

But you would have to define a per cpu seqlock. Each cpu would have
its own seqlock. Then you could have this_cpu_read_seqcount_begin and
friends:


DEFINE_PER_CPU(seqcount, bla);




/* Start of read using pointer to a sequence counter only. */
static inline unsigned this_cpu_read_seqcount_begin(const seqcount_t __percpu *s)
{
unsigned ret;

/* No other processor can be using this lock since it is per cpu */
ret = this_cpu_read(s->sequence);
barrier();
return ret;
}

/*
* Test if reader processed invalid data because sequence number has changed.
*/
static inline int this_cpu_read_seqcount_retry(const seqcount_t __percpu *s, unsigned start)
{
barrier();
return this_cpu_read(s->sequence) != start;
}


/*
* Sequence counter only version assumes that callers are using their
* own mutexing.
*/
static inline void this_cpu_write_seqcount_begin(seqcount_t __percpu *s)
{
__this_cpu_inc(s->sequence);
barrier();
}

static inline void this_cpu_write_seqcount_end(seqcount_t __percpu *s)
{
barrier();
__this_cpu_inc(s->sequence);
}


Then you can do

this_cpu_read_seqcount_begin(&bla)

...


But then this seemed to be a discussion related to ARM. ARM does not have
optimized per cpu accesses.

2010-12-10 20:33:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 14:23 -0600, Christoph Lameter wrote:
> On Fri, 10 Dec 2010, Peter Zijlstra wrote:
>
> > Its not about passing per-cpu pointers, its about passing long pointers.
> >
> > When I write:
> >
> > void foo(u64 *bla)
> > {
> > *bla++;
> > }
> >
> > DEFINE_PER_CPU(u64, plop);
> >
> > void bar(void)
> > {
> > foo(__this_cpu_ptr(plop));
> > }
> >
> > I want gcc to emit the equivalent to:
> >
> > __this_cpu_inc(plop); /* incq %fs:(%0) */
> >
> > Now I guess the C type system will get in the way of this ever working,
> > since a long pointer would have a distinct type from a regular
> > pointer :/
> >
> > The idea is to use 'regular' functions with the per-cpu data in a
> > transparent manner so as not to have to replicate all logic.
>
> That would mean you would have to pass information in the pointer at
> runtime indicating that this particular pointer is a per cpu pointer.
>
> Code for the Itanium arch can do that because it has per cpu virtual
> mappings. So you define a virtual area for per cpu data and then map it
> differently for each processor. If we would have a different page table
> for each processor then we could avoid using segment register and do the
> same on x86.

I don't think its a runtime issue, its a compile time issue. At compile
time the compiler can see the argument is a long pointer:
%fs:(addr,idx,size), and could propagate this into the caller.

The above example will compute the effective address by doing something
like:

lea %fs:(addr,idx,size),%ebx

and will then do something like

inc (%ebx)

Where it could easily have optimized this into:

inc %fs:(addr,idx,size)

esp when foo would be inlined. If its an actual call-site you need
function overloading because a long pointer has a different signature
from a regular pointer, and that is something C doesn't do.

> > > Seems that you do not have that use case in mind. So a seqlock restricted
> > > to a single processor? If so then you wont need any of those smp write
> > > barriers mentioned earlier. A simple compiler barrier() is sufficient.
> >
> > The seqcount is sometimes read by different CPUs, but I don't see why we
> > couldn't do what Eric suggested.
>
> But you would have to define a per cpu seqlock. Each cpu would have
> its own seqlock. Then you could have this_cpu_read_seqcount_begin and
> friends:
>

> Then you can do
>
> this_cpu_read_seqcount_begin(&bla)
>

Which to me seems to be exactly what Eric proposed..

> But then this seemed to be a discussion related to ARM. ARM does not have
> optimized per cpu accesses.

Nah, there's multiple issues all nicely mangled into one thread ;-)

2010-12-10 20:39:56

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 14:23 -0600, Christoph Lameter wrote:
> On Fri, 10 Dec 2010, Peter Zijlstra wrote:
>
> > Its not about passing per-cpu pointers, its about passing long pointers.
> >
> > When I write:
> >
> > void foo(u64 *bla)
> > {
> > *bla++;
> > }
> >
> > DEFINE_PER_CPU(u64, plop);
> >
> > void bar(void)
> > {
> > foo(__this_cpu_ptr(plop));
> > }
> >
> > I want gcc to emit the equivalent to:
> >
> > __this_cpu_inc(plop); /* incq %fs:(%0) */
> >
> > Now I guess the C type system will get in the way of this ever working,
> > since a long pointer would have a distinct type from a regular
> > pointer :/
> >
> > The idea is to use 'regular' functions with the per-cpu data in a
> > transparent manner so as not to have to replicate all logic.
>
> That would mean you would have to pass information in the pointer at
> runtime indicating that this particular pointer is a per cpu pointer.
>
> Code for the Itanium arch can do that because it has per cpu virtual
> mappings. So you define a virtual area for per cpu data and then map it
> differently for each processor. If we would have a different page table
> for each processor then we could avoid using segment register and do the
> same on x86.
>
> > > Seems that you do not have that use case in mind. So a seqlock restricted
> > > to a single processor? If so then you wont need any of those smp write
> > > barriers mentioned earlier. A simple compiler barrier() is sufficient.
> >
> > The seqcount is sometimes read by different CPUs, but I don't see why we
> > couldn't do what Eric suggested.
>
> But you would have to define a per cpu seqlock. Each cpu would have
> its own seqlock. Then you could have this_cpu_read_seqcount_begin and
> friends:
>
>

Yes. It was the idea.

> DEFINE_PER_CPU(seqcount, bla);
>
>

This is in Peter patch :)

>
>
> /* Start of read using pointer to a sequence counter only. */
> static inline unsigned this_cpu_read_seqcount_begin(const seqcount_t __percpu *s)
> {
> /* No other processor can be using this lock since it is per cpu*/
> ret = this_cpu_read(s->sequence);
> barrier();
> return ret;
> }
>
> /*
> * Test if reader processed invalid data because sequence number has changed.
> */
> static inline int this_cpu_read_seqcount_retry(const seqcount_t __percpu *s, unsigned start)
> {
> barrier();
> return this_cpu_read(s->sequence) != start;
> }
>
>
> /*
> * Sequence counter only version assumes that callers are using their
> * own mutexing.
> */
> static inline void this_cpu_write_seqcount_begin(seqcount_t __percpu *s)
> {
> __this_cpu_inc(s->sequence);
> barrier();
> }
>
> static inline void this_cpuwrite_seqcount_end(seqcount_t __percpu *s)
> {
> __this_cpu_dec(s->sequence);
> barrier();
> }
>
>
> Then you can do
>
> this_cpu_read_seqcount_begin(&bla)
>
> ...

This was exactly my suggestion Christoph.

I am glad you understand it now.


2010-12-10 20:49:31

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 21:39 +0100, Eric Dumazet wrote:

> This was exactly my suggestion Christoph.
>
> I am glad you understand it now.
>
>

By the way, we need smp_wmb(), not barrier(), even if only the "owner cpu"
can write into its 'percpu' seqcount.

There is nothing special about a seqcount being percpu or a 'global'
one. We must have same memory barrier semantics.

this_cpu_write_seqcount_begin(&myseqcount);
this_cpu_add(mydata1, add1);
this_cpu_add(mydata2, add2);
this_cpu_inc(mydata3);
this_cpu_write_seqcount_end(&myseqcount);

We protect the data[1,2,3] set with a seqcount, so need smp_wmb() in
both _begin() and _end()
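
A minimal sketch of the pattern in question, with made-up names loosely modelled on include/linux/u64_stats_sync.h: the owner CPU updates a multi-word statistic bracketed by sequence increments, and a reader on another CPU retries until it sees a stable, even sequence. The barriers order the sequence and data stores for that remote reader; whether barrier() or smp_wmb() suffices there is exactly the point being debated.

	#include <stdint.h>

	/* stand-ins for the kernel barrier macros, for a userspace sketch */
	#define wmb_example()	__sync_synchronize()
	#define rmb_example()	__sync_synchronize()

	struct stats_example {
		unsigned seq;
		uint64_t bytes;
		uint64_t packets;
	};

	static void owner_cpu_update(struct stats_example *s, uint64_t b)
	{
		s->seq++;		/* begin: sequence becomes odd */
		wmb_example();
		s->bytes += b;
		s->packets++;
		wmb_example();
		s->seq++;		/* end: sequence becomes even again */
	}

	static void remote_cpu_read(struct stats_example *s, uint64_t *b, uint64_t *p)
	{
		unsigned start;

		do {
			start = s->seq;
			rmb_example();
			*b = s->bytes;
			*p = s->packets;
			rmb_example();
		} while ((start & 1) || s->seq != start);
	}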


by Christoph Lameter

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 10 Dec 2010, Eric Dumazet wrote:

>
> By the way, we need smp_wmb(), not barrier(), even if only the "owner cpu"
> can write into its 'percpu' seqcount.
>
> There is nothing special about a seqcount being percpu or a 'global'
> one. We must have same memory barrier semantics.

There is certainly a major difference in that execution of a stream of
instructions on the same cpu is guaranteed to have a coherent view of
the data. That is not affected by interrupts etc.

>
> this_cpu_write_seqcount_begin(&myseqcount);
> this_cpu_add(mydata1, add1);
> this_cpu_add(mydata2, add2);
> this_cpu_inc(mydata3);
> this_cpu_write_seqcount_end(&myseqcount);
>
> We protect the data[1,2,3] set with a seqcount, so need smp_wmb() in
> both _begin() and _end()

There is nothing to protect there since processing is on the same cpu. The
data coherency guarantees of the processor will not allow anything out of
sequence to affect execution. An interrupt f.e. will not cause updates to
mydata1 to get lost.

2010-12-10 21:22:52

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 2010-12-10 at 15:09 -0600, Christoph Lameter wrote:
> On Fri, 10 Dec 2010, Eric Dumazet wrote:
>
> >
> > By the way, we need smp_wmb(), not barrier(), even only the "owner cpu"
> > can write into its 'percpu' seqcount.
> >
> > There is nothing special about a seqcount being percpu or a 'global'
> > one. We must have same memory barrier semantics.
>
> There is certainly a major difference in that execution of a stream of
> instructions on the same cpu is guaranteed to have a coherent view of
> the data. That is not affected by interrupts etc.
>

We don't care about interrupts. We care about doing a transaction over a
complex set of data, which cannot be done using an atomic op (or we would need
a spinlock/mutex/rwlock), and should not be, because of performance.

> >
> > this_cpu_write_seqcount_begin(&myseqcount);
> > this_cpu_add(mydata1, add1);
> > this_cpu_add(mydata2, add2);
> > this_cpu_inc(mydata3);
> > this_cpu_write_seqcount_end(&myseqcount);
> >
> > We protect the data[1,2,3] set with a seqcount, so need smp_wmb() in
> > both _begin() and _end()
>
> There is nothing to protect there since processing is on the same cpu. The
> data coherency guarantees of the processor will not allow anything out of
> sequence to affect execution. An interrupt f.e. will not cause updates to
> mydata1 to get lost.
>

Please take a look at include/linux/u64_stats_sync.h, maybe you'll
understand the concern about using a seqcount to protect a set of data,
for example a 256 bit counter increment.


by Christoph Lameter

Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Fri, 10 Dec 2010, Eric Dumazet wrote:

> > There is certainly a major difference in that execution of a stream of
> > instructions on the same cpu is guaranteed to have a coherent view of
> > the data. That is not affected by interrupts etc.
> >
>
> We don't care about interrupts. We care about doing a transaction over a
> complex set of data, which cannot be done using an atomic op (or we would need
> a spinlock/mutex/rwlock), and should not be, because of performance.

So what? The cpu gives you incoherent view of data somewhere when only
processing data from a single cpu?

If you have remote data accesses (loop summing the data?) and you have to
be concerned about data coherence then you CANNOT use this_cpu_ops since
they are not guaranteed to be coherent to other processors.

2010-12-13 15:03:18

by Jack Daniel

[permalink] [raw]
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Thu, Dec 9, 2010 at 11:41 PM, Venkatesh Pallipadi <[email protected]> wrote:
> On Thu, Dec 9, 2010 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>> On Thu, 2010-12-09 at 09:43 -0800, Venkatesh Pallipadi wrote:
>>>
>>> The same problem will be there with below code, with irq_delta >
>>> delta, clock_task can go backwards which is not good.
>>> + delta -= irq_delta;
>>> + rq->clock_task += delta;
>>>
>>> The reason for this is rq->clock and irqtime updates kind of happen
>>> independently and specifically, if a rq->clock update happens while we
>>> are in a softirq, we may have this case of going backwards on the next
>>> update.
>>
>> But how can irq_delta > delta?, we measure it using the same clock.
>>
>
> This would be mostly a corner case like:
> - softirq start time t1
> - rq->clock updated at t2 and rq->clock_task updated at t2 without
> accounting for current softirq
> - softirq end time t3
> - cpu spends most time here in softirq or hardirq
> - next rq->clock update at t4 and rq->clock_task update, with delta =
> t4-t2 and irq_delta ~= t4 - t1
^^^
I was curious on this line. Shouldn't this be irq_delta ~= t3 - t1 ?
If so the time going backwards shouldn't happen. If it does happen
then wouldn't it be better to program irq_time_cpu(cpu_of(rq)) to
return t3 instead of t4?

Thanks,
Jack
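
For concreteness, made-up numbers for the t1..t4 scenario quoted above: with delta taken as t4 - t2 and irq_delta as roughly t4 - t1, irq_delta exceeds delta whenever t1 < t2, which is why update_rq_clock_task() in the patch earlier in the thread clamps irq_delta to delta before advancing clock_task.

	#include <stdio.h>

	int main(void)
	{
		/* illustrative timestamps, arbitrary units */
		unsigned long long t1 = 100;	/* softirq entry */
		unsigned long long t2 = 110;	/* rq->clock / clock_task updated mid-softirq */
		unsigned long long t4 = 200;	/* next update_rq_clock() */

		unsigned long long delta = t4 - t2;	/*  90 */
		unsigned long long irq_delta = t4 - t1;	/* 100 */

		if (irq_delta > delta)
			irq_delta = delta;	/* the clamp from update_rq_clock_task() */

		/* clock_task advances by 0 here, but never moves backwards */
		printf("clock_task advance: %llu\n", delta - irq_delta);
		return 0;
	}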