2003-08-17 13:30:56

by Con Kolivas

[permalink] [raw]
Subject: [RFC] Re: Blender profiling-1 O16.2int

I've investigated this starvation issue now that I've been given a testcase.
First some background. Mats reported that he could elicit starvation in
blender so I downloaded that and have been thrashing it out trying to find
out what happens. Here are relevant excerpts from the discussion.

On Sun, 17 Aug 2003 15:38, Voluspa wrote:
> Ah, the freezes has been there with all 2.6s, just shorter (1/2 - 2
> seconds). Where I didn't see them was in the older blender version of
> 2.23. I'll investigate it more thorough.

This struck me as interesting because the starvation was there in the vanilla
scheduler anyway, but for a shorter period. Even more interesting was the
suggestion that it didn't happen with previous versions of blender.

Reproducing it was:

> Voluspa <[email protected]> wrote:
>Thusly: Start program, move mouse to get rid of splash screen. Click and
>hold down 3rd mouse button (unless you use simulate 3 buttons in X, then
>it's left-right-mousebutton simultaniously) anywhere on the white grid
>(first view is top-down) and move mouse around... That is how you rotate
>the world plane around an axes. Doing this quickly, or slowly for a
>longer period, I freeze.

Which I tried for some time and eventually managed to get locally. In the
meantime,

> Voluspa <[email protected]> wrote:
> But not in 2.23, not even with this O16.2! Don't see any freezes with
> that older Blender version.

So I had my suspicions about what was happening already based on that
information. Anyway I ran top in batch mode, reproduced the starvation and
got this (just X and 3[blender] at one second intervals):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command
782 root 16 0 109m 39m 77m R 80.5 7.9 2:37.17 X
12553 con 15 0 12648 6252 9072 S 19.9 1.2 0:07.98 3

782 root 16 0 109m 39m 77m R 73.6 7.9 2:37.91 X
12553 con 15 0 12648 6252 9072 R 23.9 1.2 0:08.22 3

782 root 16 0 109m 39m 77m R 79.5 7.9 2:38.71 X
12553 con 15 0 12648 6252 9072 S 20.9 1.2 0:08.43 3

Starvation starts here:
782 root 16 0 109m 39m 77m R 95.4 7.9 2:39.67 X
12553 con 15 0 12648 6252 9072 S 3.0 1.2 0:08.46 3

12553 con 16 0 12648 6252 9072 R 73.6 1.2 0:09.20 3
782 root 17 0 109m 39m 77m R 25.8 7.9 2:39.93 X

12553 con 16 0 12648 6252 9072 R 98.4 1.2 0:10.19 3
782 root 17 0 109m 39m 77m R 0.0 7.9 2:39.93 X

12553 con 16 0 12648 6252 9072 R 99.4 1.2 0:11.19 3
782 root 17 0 109m 39m 77m R 0.0 7.9 2:39.93 X

12553 con 16 0 12648 6252 9072 R 99.4 1.2 0:12.19 3
782 root 17 0 109m 39m 77m R 0.0 7.9 2:39.93 X

12553 con 16 0 12648 6252 9072 R 99.9 1.2 0:13.20 3
782 root 17 0 109m 39m 77m R 0.0 7.9 2:39.93 X

[snip some more seconds]
and ends here:
12553 con 15 0 12648 6252 9072 S 39.8 1.2 0:23.59 3
782 root 16 0 109m 39m 77m S 16.9 7.9 2:40.10 X

782 root 15 0 109m 39m 77m S 8.9 7.9 2:40.19 X
12553 con 15 0 12964 6576 9072 S 2.0 1.3 0:23.61 3

Now for those that can't see what this is, blender and X are interactive tasks
and getting high priority (PRI < 18) which makes sense. During heavy usage of
blender, it is X that gets pegged for cpu usage (ie it is doing the work for
blender), and eventually it gets expired onto the expired array for being
naughty and stops doing anything till all other tasks have finished working
on the active array. Now normally, blender should just sleep and wait till X
comes alive again before it does anything. However here it shows clearly that
it is spinning madly looking for something from X, and poor X can't do
anything. This is the busy on wait I've described. Meanwhile, since blender
was seen as an interactive task (which it is), it preempts everything lower
priority than it till it also gets booted.

Now in O16.2 I managed to minimise this happening by not allowing tasks to
preempt their waker, but this has no effect if X gets expired. The evidence I
really needed was that this also happened to the vanilla scheduler to a
lesser degree and that it is a software flaw. This same thing happens to
2.6.0-test3 for 1-2 seconds, and doesn't happen at all with a previous
version of blender.

So where does this leave us?

Well first I can remove entirely the not preempting the waker in O16s since
that is faulty for smp, and doesn't really fix the problem.

Second, any applications that exhibit this should be fixed since it is a bug.

Finally I still need to find a way to prevent this from transiently starving
the system without undoing the interactive improvements to normal
applications; I certainly don't intend to make them "work nicely" just for
the sake of it.

I do have some ideas about how to progress with this (some Mike has suggested
to me previously), but so far most of them involve some detriment to the
interactive performance on other apps. So I'm appealing to others for ideas.

Comments?

Con


2003-08-17 16:28:16

by Con Kolivas

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

Addendum:

KernProfile during starvation:
435 get_offset_tsc 18.9130
273 do_softirq 1.8079
263 sys_gettimeofday 1.3350
121 do_gettimeofday 0.9167
72 fget 1.1077
68 schedule 0.0465
20 do_sys_settimeofday 0.0943
20 do_settimeofday 0.0627
14 fput 0.7000
14 add_wait_queue 0.1474
11 __wake_up 0.1250
11 remove_wait_queue 0.0815
11 do_anonymous_page 0.0199
9 pipe_poll 0.0726
9 link_path_walk 0.0040
8 free_hot_cold_page 0.0311

vmstat during starvation
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 99716 56348 193256 0 0 0 0 1687 5962 92 8 0 0
1 0 0 99716 56376 193256 0 0 0 144 1691 5976 81 19 0 0
1 0 0 99748 56376 193256 0 0 0 0 1656 6057 87 13 0 0
2 0 0 99748 56376 193256 0 0 0 0 1636 5477 85 15 0 0
11 0 0 99748 56376 193256 0 0 0 0 1430 445 51 49 0 0
12 0 0 99748 56376 193256 0 0 0 0 1459 362 52 48 0 0
12 0 0 99764 56392 193256 0 0 0 92 1474 400 51 49 0 0
12 0 0 99764 56392 193256 0 0 0 0 1463 356 53 47 0 0
12 0 0 99764 56392 193256 0 0 0 0 1466 379 50 50 0 0

Note drop in ctx, rise in load (although in top only blender is actually
getting cpu at 99%) and massive rise in sys time.

Con

2003-08-18 00:28:59

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

On Sun, Aug 17, 2003 at 11:36:42PM +1000, Con Kolivas wrote:
> Now for those that can't see what this is, blender and X are
> interactive tasks and getting high priority (PRI < 18) which makes
> sense. During heavy usage of blender, it is X that gets pegged for
> cpu usage (ie it is doing the work for blender), and eventually it
> gets expired onto the expired array for being naughty and stops doing
> anything till all other tasks have finished working on the active
> array. Now normally, blender should just sleep and wait till X comes
> alive again before it does anything. However here it shows clearly that
> it is spinning madly looking for something from X, and poor X can't do
> anything. This is the busy on wait I've described. Meanwhile, since blender
> was seen as an interactive task (which it is), it preempts everything lower
> priority than it till it also gets booted.

Ugh, priority inversion.


-- wli

2003-08-18 12:58:41

by Con Kolivas

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

On Mon, 18 Aug 2003 22:52, Max Hailperin wrote:
> ... I do have some ideas about how to progress with this (some Mike
> has suggested to me previously), but so far most of them involve
> some detriment to the interactive performance on other apps. So I'm
> appealing to others for ideas.

> I suggest you put a small limit on the number of times that a task can
> be requeued onto the active array before it needs to go to the expired
> array. -max

Thanks; I appreciate your comments. That's what's currently in place at the
moment, and the lower the number of requeues the less it starves...
Unfortunately, this also means that the shorter it takes a real interactive
task to expire as well. This is the theme I'm currently working on though, as
the starver gets rescheduled ahead of anything else and I've developed a way
of letting other tasks sneak in ahead if it starts doing that. It needs more
tuning but seems to work so far.

Cheers,
Con

2003-08-18 12:52:50

by Max Hailperin

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

From: Con Kolivas <[email protected]>
To: Voluspa <[email protected]>
Subject: [RFC] Re: Blender profiling-1 O16.2int
Date: Sun, 17 Aug 2003 23:36:42 +1000
Cc: linux kernel mailing list <[email protected]>,
Mike Galbraith <[email protected]>, Andrew Morton <[email protected]>,
William Lee Irwin III <[email protected]>

... I do have some ideas about how to progress with this (some Mike
has suggested to me previously), but so far most of them involve
some detriment to the interactive performance on other apps. So I'm
appealing to others for ideas.

Comments?

Con

I suggest you put a small limit on the number of times that a task can
be requeued onto the active array before it needs to go to the expired
array. -max

2003-08-18 14:41:54

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

From: Con Kolivas <[email protected]>
> ... I do have some ideas about how to progress with this (some Mike
> has suggested to me previously), but so far most of them involve
> some detriment to the interactive performance on other apps. So I'm
> appealing to others for ideas.

On Mon, Aug 18, 2003 at 07:52:35AM -0500, Max Hailperin wrote:
> I suggest you put a small limit on the number of times that a task can
> be requeued onto the active array before it needs to go to the expired
> array. -max

Well, first off this undoes the effect of code already in place (the
interactivity heuristics), and second it's defeated by high arrival
rates spread across many tasks. The second is fatal but there isn't
much that can be done about it in the current design, and I don't
have enough subtlety to see how to fiddle with the bits necessary to
recover anything out of the first. AFAICT no fixed limit will ever
do; any given limit will be too many for some tasks and not enough
for others, and there's no way to guess based on the task (if what said
it was interactive was enough in the first place, you wouldn't have to
resort to this), so you've been hanged just for trying.

-- wli

2003-08-19 00:13:58

by Wes Janzen

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

That makes sense, I was running a program that I found on IBM's website
that's supposed to test context switching speed this weekend. It has 1
free lock and passes them around the group. If I put it up to 32
threads or so with one spare lock, I can start to see the starvation.
When running vmstat, it's apparent when the the starvation occurs as the
context switching sky-rockets. I was going to add to the example code
to check for how many times a thread wakes up waiting for the lock and
can't get it after reading that message about locks in the list. I
guess I won't have to do that now. Anyway, that'll bring my system to a
halt when the thread count gets up over 256. Still, it's usuable as
long as I'm not doing something else that makes heavy use of the processor.

I also had a nasty experience with a nautilus crash today that caused
this problem with X, I'm sure much similar to the results from Blender.
Obviously, X keeps getting expired because I was forced to reset after
waiting 1 1/2 hours for the system to shutdown (i.e. after the runlevel
changed to 6 to the actual rebooting of the system). Even after I
managed to get out of X (though I don't know if the process was able
ever to actually exit) the system took 5 minutes to get from "Stopping
sound services" to the next message. After that, I got a cursor that
didn't flash, then a blank screen but no reboot. I don't think X ever
exited since it never took me back to VT7, and the XDM shutdown comes
shortly before killing cron. The last message I saw indicated it was
shutting down service at... At that point, I could no longer change
between terminals though I could see continuing activity on my hard
drive. I waited 30 minutes from when I had exited X until I finally
reset. I had just clicked on the debug information button in bug buddy
when this started.

I think this problem is exacerbated when another app is competing for
the processor. The machine just pauses unless I'm also doing something
else, in this case compiling XINE. Once something is competing, it
looks like X takes an extraordinarily long time to come back into the
running queue.

Is there a way to figure out when a process is spinning on a wait and
exponentially decrease it's bonus if they are consecutive? I should
probably read through the source and some of these posts before I make
suggestions though, because I don't currently know much about how all
that works.

Wes

Con Kolivas wrote:

>Second, any applications that exhibit this should be fixed since it is a bug.
>
>Finally I still need to find a way to prevent this from transiently starving
>the system without undoing the interactive improvements to normal
>applications; I certainly don't intend to make them "work nicely" just for
>the sake of it.
>
>I do have some ideas about how to progress with this (some Mike has suggested
>to me previously), but so far most of them involve some detriment to the
>interactive performance on other apps. So I'm appealing to others for ideas.
>
>Comments?
>
>Con
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>

2003-08-19 00:21:33

by Con Kolivas

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

On Tue, 19 Aug 2003 10:14, Wes Janzen wrote:
> I think this problem is exacerbated when another app is competing for
> the processor. The machine just pauses unless I'm also doing something
> else, in this case compiling XINE. Once something is competing, it
> looks like X takes an extraordinarily long time to come back into the
> running queue.

Yes that is correct behaviour and to be expected.

> Is there a way to figure out when a process is spinning on a wait and

That's the trick isn't it? No there isn't or else I'd fix it in a jiffy. If
someone can think of a way I'd love to know.

> exponentially decrease it's bonus if they are consecutive? I should
> probably read through the source and some of these posts before I make
> suggestions though, because I don't currently know much about how all
> that works.

If you get a chance could you try the prerelease O17 patch against O16.3 in
http://kernel.kolivas.org/2.5/experimental to see if that helps? It's purely
an anti-starvation patch.

Thanks for comments.

Cheers,
Con

2003-08-19 00:30:08

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

On Mon, Aug 18, 2003 at 07:14:12PM -0500, Wes Janzen wrote:
> That makes sense, I was running a program that I found on IBM's website
> that's supposed to test context switching speed this weekend. It has 1
> free lock and passes them around the group. If I put it up to 32
> threads or so with one spare lock, I can start to see the starvation.
> When running vmstat, it's apparent when the the starvation occurs as the
> context switching sky-rockets. I was going to add to the example code
> to check for how many times a thread wakes up waiting for the lock and
> can't get it after reading that message about locks in the list. I
> guess I won't have to do that now. Anyway, that'll bring my system to a
> halt when the thread count gets up over 256. Still, it's usuable as
> long as I'm not doing something else that makes heavy use of the processor.

Could you give a URL to that benchmark?

Thanks.


-- wli

2003-08-19 00:40:05

by Richard A Nelson

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

On Tue, 19 Aug 2003, Con Kolivas wrote:

> > Is there a way to figure out when a process is spinning on a wait and
>
> That's the trick isn't it? No there isn't or else I'd fix it in a jiffy. If
> someone can think of a way I'd love to know.

Have you considered instrumenting waits/locks ? I had to do that for a
schedular/locks/events (not Linux based), by extending the structures
to include current ownership and waiters... At a problem point, you'd
then have to use something like sysreq to trigger a scan of the relevant
data areas. I had to cause an error and do a postmortem via paper dump
:(

--
Rick Nelson
> : Any porters out there should feel happier knowing that DEC is shipping
> : me an AlphaPC that I intend to try getting linux running on: this will
> : definitely help flush out some of the most flagrant unportable stuff.
> : The Alpha is much more different from the i386 than the 68k stuff is, so
> : it's likely to get most of the stuff fixed.
>
> It's posts like this that almost convince us non-believers that there
> really is a god.
-- Anthony Lovell, to Linus's remarks about porting

2003-08-19 00:57:53

by Wes Janzen

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

Those benchmarks/tests I'm referring to are at:

I had to correct some things to get them to compile, but it was very
simple. I think one of the versions actually checks to see if it has
been woken up more than once.

cspipe2a.cpp/csfast5.cpp:
-->Article with links to source:
http://www-106.ibm.com/developerworks/linux/library/l-rt9/

csfast5a.cpp/fract5.cpp:
-->Original site (doesn't have the source, but explains the program):

http://www-106.ibm.com/developerworks/linux/library/l-rt10/?t=gr,lnxw01=ConSwiP2
-->Google search found me this site which has the source linked:

http://www-900.ibm.com/developerWorks/cn/linux/sdk/rt/part10/index.shtml#resources

Wes

2003-08-19 14:57:48

by Rob Landley

[permalink] [raw]
Subject: Re: [RFC] Re: Blender profiling-1 O16.2int

On Monday 18 August 2003 20:28, Con Kolivas wrote:
> On Tue, 19 Aug 2003 10:14, Wes Janzen wrote:
> > I think this problem is exacerbated when another app is competing for
> > the processor. The machine just pauses unless I'm also doing something
> > else, in this case compiling XINE. Once something is competing, it
> > looks like X takes an extraordinarily long time to come back into the
> > running queue.
>
> Yes that is correct behaviour and to be expected.
>
> > Is there a way to figure out when a process is spinning on a wait and
>
> That's the trick isn't it? No there isn't or else I'd fix it in a jiffy. If
> someone can think of a way I'd love to know.

Well, in theory you could fiddle with its memory map and force it to
soft-fault all its pages back in, and count the soft faults. But I know just
enough about that to suspect it's a rast nest of evil complexity and loss of
orthogonality.

Another thing is if it's calling the same syscalls in sequence with the same
arguments... (What syscalls IS it calling?)

Looking at the profile list you gave earlier, schedule is showing up. If the
sucker's calling schedule in the loop, that should be a heck of a hint that
it's busy-waiting, dontcha think?

As long as you've got a queue for each priority, how about having an "I called
schedule" queue? (Or can you re-queue it somewhere lower than its actual
priority? How about one queue lower than the queue was on when it called
schedule, without necessarily changing its dynamic priority...)

What DOES calling "schedule" from userspace mean with the current scheduler?

Rob