2003-08-17 12:12:19

by Måns Rullgård

Subject: [BUG] Serious scheduler starvation


I'm reposting this, since I got no response last time.

First the machine details. It's a Pentium4 running at 2 GHz. Linux
version 2.6.0-test3 + O16int + softrr.

I just experienced something that might be a scheduler problem. I was
working in XEmacs, when suddenly the machine became very
unresponsive. The mouse pointer in X moved sporadically. I could
switch to a text console and log in, though typing lagged tens of
seconds. Switching between text consoles was fast, though. I killed
xemacs, and the system was back to normal. Further investigation
showed that xemacs was stuck in a nasty regexp match. If I was quick
enough, I could interrupt it with C-g.

With X and the window manager reniced to -10, they seem to be able to
get their job done. This leads me to believe that maybe xemacs is
considered interactive, and given too high priority when it suddenly
starts burning the cpu.

I'll try it later with other kernel versions, but right now I don't
want to reboot.

What can I do to collect more information about the problem?

--
Måns Rullgård
[email protected]


2003-08-17 12:46:46

by Con Kolivas

Subject: Re: [BUG] Serious scheduler starvation

On Sun, 17 Aug 2003 22:11, Måns Rullgård wrote:
> I'm reposting this, since I got no response last time.
>
> First the machine details. It's a Pentium4 running at 2 GHz. Linux
> version 2.6.0-test3 + O16int + softrr.

Softrr? Which patch? Davide's? No one has tried to make them compatible
(yet?). Even so, this may be unrelated to softrr.

> I just experienced something that might be a scheduler problem. I was

Almost certainly is.

> working in XEmacs, when suddenly the machine became very
> unresponsive. The mouse pointer in X moved sporadically. I could
> switch to a text console and log in, though typing lagged tens of
> seconds. Switching between text consoles was fast, though. I killed
> xemacs, and the system was back to normal. Further investigation
> showed that xemacs was stuck in a nasty regexp match. If I was quick
> enough, I could interrupt it with C-g.
>
> With X and the window manager reniced to -10, they seem to be able to
> get their job done. This leads me to believe that maybe xemacs is
> considered interactive, and given too high priority when it suddenly
> starts burning the cpu.
>
> I'll try it later with other kernel versions, but right now I don't
> want to reboot.
>
> What can I do to collect more information about the problem?

Run top in batch mode as root, reniced to -11 so it doesn't get preempted, and
capture it happening before you kill XEmacs. Then try running XEmacs niced
+10 and see whether it still happens. Also, if you were lucky enough to have
booted with profiling enabled, you could profile it, but top will tell whether
it's a simple scheduler starvation error.

Con
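
The two steps suggested above (capture top in batch mode at elevated
priority, then rerun XEmacs niced +10) could be scripted roughly as follows.
This is only a sketch: the helper names, the 60-sample count, and the
1-second delay are illustrative assumptions, and starting top with
"nice -n -11" stands in for renicing an already running top. Only the
standard top flags -b, -d, and -n are used.

```python
def top_capture_cmd(samples=60, delay=1, niceness=-11):
    """Build the argument list for a batch-mode top run.

    A negative niceness requires root. The sample count and delay are
    illustrative defaults, not values taken from the thread.
    """
    return ["nice", "-n", str(niceness),
            "top", "-b", "-d", str(delay), "-n", str(samples)]


def niced_cmd(program, niceness=10):
    """Build the argument list for starting a program at lowered priority."""
    return ["nice", "-n", str(niceness), program]


if __name__ == "__main__":
    # Print the command lines rather than running them, so they can be
    # reviewed first (the top capture has to be run as root).
    print(" ".join(top_capture_cmd()))
    print(" ".join(niced_cmd("xemacs")))
```

Redirecting the first command's output to a file while the stall is
reproduced gives a log that can be posted to the list.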

2003-08-17 12:52:37

by Daniel Phillips

Subject: Re: [BUG] Serious scheduler starvation

On Sunday 17 August 2003 14:11, Måns Rullgård wrote:
> What can I do to collect more information about the problem?

Look in top's "PRI" column, which is where you see the effects of the dynamic
priority adjustment.

Regards,

Daniel
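
To follow the priority column across a captured batch-mode log, a small
parser along these lines might help. It is a sketch under assumptions: the
function name is hypothetical, the column is looked up by header name
because some top versions label it "PR" rather than "PRI", and the sample
rows in the usage part are made up for illustration, not taken from the
affected machine.

```python
def priority_by_command(top_text, column_names=("PR", "PRI")):
    """Map each command name to the priority values top reported for it.

    Values are kept as strings because top may print non-numeric
    entries in this column for some tasks.
    """
    result = {}
    prio_col = cmd_col = None
    for line in top_text.splitlines():
        fields = line.split()
        if "PID" in fields and "COMMAND" in fields:
            # Header row: locate the priority and command columns by name.
            prio_col = next(
                (fields.index(n) for n in column_names if n in fields), None)
            cmd_col = fields.index("COMMAND")
            continue
        # Skip summary lines and anything seen before a usable header.
        if prio_col is None or len(fields) <= cmd_col or not fields[0].isdigit():
            continue
        result.setdefault(fields[cmd_col], []).append(fields[prio_col])
    return result


if __name__ == "__main__":
    # Made-up sample in the general shape of top's batch output.
    sample = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 1234 mru       15   0 30000 9000 4000 R 99.0  1.8   1:23.45 xemacs
  900 root      16   0 50000 20000 3000 S  0.3  4.0   0:10.00 X
"""
    print(priority_by_command(sample))
```

Comparing the values logged for xemacs and X over successive samples shows
how the dynamic priorities move while the regexp match is running.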

2003-08-17 13:17:20

by Måns Rullgård

Subject: Re: [BUG] Serious scheduler starvation

Con Kolivas <[email protected]> writes:

>> First the machine details. It's a Pentium4 running at 2 GHz. Linux
>> version 2.6.0-test3 + O16int + softrr.
>
> Softrr? Which patch? Davide's? No one has tried to make them compatible
> (yet?). Even so, this may be unrelated to softrr.

Is there more than one? I'm using something off xmailserver.org.
Anyhow, no softrr tasks were running at the time.

>> What can I do to collect more information about the problem?
>
> Run top in batch mode as root, reniced to -11 so it doesn't get preempted, and
> capture it happening before you kill XEmacs. Then try running XEmacs niced
> +10 and see whether it still happens. Also, if you were lucky enough to have
> booted with profiling enabled, you could profile it, but top will tell whether
> it's a simple scheduler starvation error.

I'll do that. It's easily reproducible, at least.

--
Måns Rullgård
[email protected]

2003-08-17 13:43:39

by Con Kolivas

Subject: Re: [BUG] Serious scheduler starvation

On Sun, 17 Aug 2003 23:43, Daniel Phillips wrote:
> On Sunday 17 August 2003 14:52, Con Kolivas wrote:
> > On Sun, 17 Aug 2003 22:11, Måns Rullgård wrote:
> > > First the machine details. It's a Pentium4 running at 2 GHz. Linux
> > > version 2.6.0-test3 + O16int + softrr.
> >
> > Softrr? Which patch? Davide's? No one has tried to make them compatible
> > (yet?). Even so, this may be unrelated to softrr.
>
> Almost certainly unrelated, since there is no effect unless he runs
> SCHED_RR applications as non-root.
>
> Obviously, he should back the patches out one by one when he does get time
> to reboot.

No need, there is a known issue in my patches that can cause it. Check the
email I just sent about a similar issue.

Con

2003-08-17 13:40:28

by Daniel Phillips

Subject: Re: [BUG] Serious scheduler starvation

On Sunday 17 August 2003 14:52, Con Kolivas wrote:
> On Sun, 17 Aug 2003 22:11, Måns Rullgård wrote:
> > First the machine details. It's a Pentium4 running at 2 GHz. Linux
> > version 2.6.0-test3 + O16int + softrr.
>
> Softrr? Which patch? Davide's? No one has tried to make them compatible
> (yet?). Even so, this may be unrelated to softrr.

Almost certainly unrelated, since there is no effect unless he runs SCHED_RR
applications as non-root.

Obviously, he should back the patches out one by one when he does get time to
reboot.

Regards,

Daniel