2003-06-01 04:11:32

by Tom Sightler

Subject: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

I'm looking for help in identifying what might be causing some very
strange issues I have recently noticed with my Dell laptop running
recent 2.5.69/70 kernels.

The symptoms are a little strange, at least to me, but I'll try to
describe them as briefly and completely as possible. I first noticed
an issue while testing a demo version of Crossover Plugin with some
web sites using heavy Shockwave content. The first noticeable symptom
was that any sound event that coincided with screen updates would pop
and crackle. At first I thought it was a problem with my sound card,
but as I began to look at the issue I noticed that the
problem seemed to be caused by the fact that the pluginserver (wine) was
using 100% of the CPU. I simply reniced this process to -10 and
everything started working fine. Upon looking a little further it
seemed that the kernel was dynamically boosting the priority of the
process much higher than it probably should be, in the end, not leaving
enough CPU for playing the sounds without skipping.

After doing some other research I found several other programs that
cause what appear to be the same basic symptoms. For example, viewing a
PDF file from within Mozilla using the Acrobat plugin causes my X
server (don't know what X) to get a boost and suddenly it takes 100% of
the CPU.

VMware 4 also seems to cause a similar problem, where lots of processes
get boosts leaving very little left for simple things like the
occasional sound.

Would these issues be explained by the scheduler starvation issues that
others have seen? I thought those had been mostly fixed.

I'm not 100% sure, but I don't remember seeing these problems with
2.5.68-mm2. I have since tried 2.5.69-mm4, and today 2.5.70-mm3 as
well as 2.5.70, and they all show this same symptom.

Booting the system into 2.4.20 makes all of these symptoms go away.

It doesn't seem reasonable that I should have to play with nice values
and priorities to get things running right. Is there anything I should
look at tuning? Other things I may be doing wrong?

I've tried with preemption both enabled and disabled with no effect.

Any help or suggestions would be greatly appreciated. Overall this
system still works well, and none of these issues keep the system from
being usable. Overall performance on my laptop is much smoother and
snappier than anything I have seen with 2.4.x, but having to play with
nice levels to get these programs cooperating seems wrong as they're
pretty basic functionality.

Later,
Tom



2003-06-01 04:31:42

by Andrew Morton

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

Tom Sightler <[email protected]> wrote:
>
> I simply reniced this process to -10 and
> everything started working fine. Upon looking a little further it
> seemed that the kernel was dynamically boosting the priority of the
> process much higher than it probably should be, in the end, not leaving
> enough CPU for playing the sounds without skipping.

Yes, it seems that too many real-world applications are accidentally
triggering this problem.

Could you please run an strace of the boosted process, find out what it is
doing to get itself boosted in this manner? Wait until things are in
steady state and the process is boosted, then run `strace -tt <pid>' so we
see the timing info.


Thanks.

2003-06-01 17:24:22

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Sun, 2003-06-01 at 00:45, Andrew Morton wrote:
> Tom Sightler <[email protected]> wrote:
> >
> > I simply reniced this process to -10 and
> > everything started working fine. Upon looking a little further it
> > seemed that the kernel was dynamically boosting the priority of the
> > process much higher than it probably should be, in the end, not leaving
> > enough CPU for playing the sounds without skipping.
>
> Yes, it seems that too many real-world applications are accidentally
> triggering this problem.
>
> Could you please run an strace of the boosted process, find out what it is
> doing to get itself boosted in this manner? Wait until things are in
> steady state and the process is boosted, then run `strace -tt <pid>' so we
> see the timing info.

The strace was quite large so I have uploaded it to
http://tuxyturvy.com/strace-pluginserver.gz

Please let me know if you need more info or if I can help in other ways.

Thanks,
Tom




2003-06-01 17:31:14

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Sun, 2003-06-01 at 02:54, Mike Galbraith wrote:
>
> Hi,
>
> Wine is the shockwave plugin server right? You reniced _wine_ and the
> problem went away?
>
> -Mike
>

Yes, this is correct. It shows up as pluginserver in the 'ps ax' output
but I've since noticed that it is simply a symlink to wine. Of the two
wine processes, wine and wineserver, it was the wine frontend process
that was getting all of the CPU, showing 100% utilization. Renicing the
wine process made the problem go away.

Running the exact same config on a 2.4.20 kernel uses only a few % of
the CPU.

Later,
Tom


2003-06-01 19:53:42

by Andrew Morton

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

Tom Sightler <[email protected]> wrote:
>
> On Sun, 2003-06-01 at 00:45, Andrew Morton wrote:
> > Tom Sightler <[email protected]> wrote:
> > >
> > > I simply reniced this process to -10 and
> > > everything started working fine. Upon looking a little further it
> > > seemed that the kernel was dynamically boosting the priority of the
> > > process much higher than it probably should be, in the end, not leaving
> > > enough CPU for playing the sounds without skipping.
> >
> > Yes, it seems that too many real-world applications are accidentally
> > triggering this problem.
> >
> > Could you please run an strace of the boosted process, find out what it is
> > doing to get itself boosted in this manner? Wait until things are in
> > steady state and the process is boosted, then run `strace -tt <pid>' so we
> > see the timing info.
>
> The strace was quite large so I have uploaded it to
> http://tuxyturvy.com/strace-pluginserver.gz
>

Seems to be doing lots of small reads and writes. Maybe to a pipe. What
is the system context switch rate while this is happening? From `vmstat
1'?
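Andrew's guess that the small reads and writes go over a pipe suggests why the context-switch rate climbs: when two processes trade single bytes over a pair of pipes, each exchange typically forces the kernel to switch from one process to the other and back. The sketch below is a hypothetical illustration of that pattern, not wine's actual protocol; `pingpong()` is an illustrative name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical illustration: parent and child bounce one byte back and
 * forth over two pipes. Each blocking read() puts the reader to sleep
 * until the other side writes, so on a uniprocessor every round trip
 * typically costs two context switches. Returns the number of completed
 * round trips, or -1 on setup failure. */
int pingpong(int rounds)
{
    int to_child[2], to_parent[2];
    char byte = 'x';
    int done = 0;

    assert(rounds >= 0);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: echo every byte until the parent closes its write end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1)
            if (write(to_parent[1], &byte, 1) != 1)
                break;
        _exit(0);
    }

    /* Parent: send a byte, then block until the child echoes it back. */
    close(to_child[0]);
    close(to_parent[1]);
    for (int i = 0; i < rounds; i++) {
        if (write(to_child[1], &byte, 1) != 1 ||
            read(to_parent[0], &byte, 1) != 1)
            break;
        done++;
    }
    close(to_child[1]);   /* child sees EOF and exits */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return done;
}
```

Running something like `pingpong(100000)` while watching `vmstat 1` would be expected to push the `cs` column up noticeably; the exact figure depends on the machine.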

2003-06-02 01:47:53

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Sun, 2003-06-01 at 16:07, Andrew Morton wrote:
> Tom Sightler <[email protected]> wrote:
> >
> > On Sun, 2003-06-01 at 00:45, Andrew Morton wrote:
> > > Tom Sightler <[email protected]> wrote:
> > > >
> > > > I simply reniced this process to -10 and
> > > > everything started working fine. Upon looking a little further it
> > > > seemed that the kernel was dynamically boosting the priority of the
> > > > process much higher than it probably should be, in the end, not leaving
> > > > enough CPU for playing the sounds without skipping.
> > >
> > > Yes, it seems that too many real-world applications are accidentally
> > > triggering this problem.
> > >
> > > Could you please run an strace of the boosted process, find out what it is
> > > doing to get itself boosted in this manner? Wait until things are in
> > > steady state and the process is boosted, then run `strace -tt <pid>' so we
> > > see the timing info.
> >
> > The strace was quite large so I have uploaded it to
> > http://tuxyturvy.com/strace-pluginserver.gz
> >
>
> Seems to be doing lots of small reads and writes. Maybe to a pipe. What
> is the system context switch rate while this is happening? From `vmstat
> 1'?
>

I just did a 10 second run, average was about 2000/sec, with a minimum
of around 1500/sec and a peak of 3700/sec.

With the rest of the system the same (same programs running, etc) but on
a page that doesn't use that particular plugin, the context switch rate
sits around 250/sec.

Would you like the actual full output from vmstat?

Later,
Tom
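The `cs` column Tom is reading from `vmstat 1` comes from the cumulative `ctxt` counter in /proc/stat, sampled once per second. A minimal sketch of the same measurement in C, assuming the Linux /proc/stat format with a `ctxt <count>` line; `parse_ctxt()`, `read_ctxt()` and `ctxt_rate()` are illustrative names, not part of any existing tool.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Find the "ctxt <count>" line in /proc/stat-style text.
 * Returns 0 and stores the counter in *out, or -1 if it is missing. */
int parse_ctxt(const char *buf, unsigned long long *out)
{
    assert(buf != NULL && out != NULL);
    for (const char *line = buf; line != NULL && *line != '\0'; ) {
        if (strncmp(line, "ctxt ", 5) == 0)
            return sscanf(line + 5, "%llu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;               /* move to the start of the next line */
    }
    return -1;
}

/* Read the current cumulative context-switch count from /proc/stat. */
static int read_ctxt(unsigned long long *out)
{
    FILE *f = fopen("/proc/stat", "r");
    char *line = NULL;
    size_t cap = 0;
    int rc = -1;

    if (f == NULL)
        return -1;
    while (getline(&line, &cap, f) != -1) {
        if (parse_ctxt(line, out) == 0) {
            rc = 0;
            break;
        }
    }
    free(line);
    fclose(f);
    return rc;
}

/* Context switches during a one-second interval, or -1 on failure. */
long long ctxt_rate(void)
{
    unsigned long long before, after;

    if (read_ctxt(&before) < 0)
        return -1;
    sleep(1);
    if (read_ctxt(&after) < 0)
        return -1;
    return (long long)(after - before);
}
```

A single `ctxt_rate()` call should roughly match one `cs` sample from `vmstat 1`.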


2003-06-02 07:23:24

by Ingo Molnar

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.


On Sat, 31 May 2003, Andrew Morton wrote:

> > [...] Upon looking a little further it
> > seemed that the kernel was dynamically boosting the priority of the
> > process much higher than it probably should be, in the end, not leaving
> > enough CPU for playing the sounds without skipping.
>
> Yes, it seems that too many real-world applications are accidentally
> triggering this problem.

no, the problem is exactly the opposite. Here's the key observation:

> the problem seemed to be caused by the fact that the pluginserver (wine)
> was using 100% of the CPU. I simply reniced this process to -10 and
> everything started working fine.

the kernel has detected this process to be a CPU-hog - and indeed the
traces and the above description all say that it really is a CPU hog.

by renicing it to -10 it gets super-attention from the scheduler, so it
can be a CPU hog _and_ create sound.

this is analogous to the 'game problem'. Games too tend to be CPU hogs,
and if anything else is running on the system, they might be hurt and see
longer latencies in getting scheduled. By renicing them the user
(sysadmin) signals towards the kernel that despite these applications' CPU
usage pattern, they should get extra attention.

Ingo
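Ingo's point is that the 2.5 O(1) scheduler rewards tasks that sleep and penalizes tasks that run, but only within a small window around their nice level. A simplified sketch of that mapping in C, modelled on the 2.5-era `effective_prio()` in kernel/sched.c as recalled; the constants (a bonus window of plus or minus 5 levels, static priority 120 + nice) are assumptions for illustration, and the sleep average is expressed here as thousandths of its maximum rather than in jiffies.

```c
#include <assert.h>

/* Constants modelled on the 2.5-era O(1) scheduler (assumed, for illustration). */
#define MAX_RT_PRIO      100                        /* below this: real-time */
#define MAX_PRIO         140                        /* one past the worst priority */
#define NICE_TO_PRIO(n)  (MAX_RT_PRIO + 20 + (n))   /* nice 0 -> 120 */
#define MAX_USER_PRIO    40
#define PRIO_BONUS_RATIO 25                         /* 25% of 40 -> +/-5 levels */

/* Dynamic priority (lower is better) for a task at nice level `nice` whose
 * sleep average is `sleep_permille` thousandths of the maximum. Tasks that
 * mostly sleep get up to 5 levels of boost; CPU hogs get up to 5 levels of
 * penalty. The result is clamped to the non-real-time range. */
int effective_prio(int nice, int sleep_permille)
{
    assert(nice >= -20 && nice <= 19);
    assert(sleep_permille >= 0 && sleep_permille <= 1000);

    int bonus = MAX_USER_PRIO * PRIO_BONUS_RATIO * sleep_permille / 1000 / 100
              - MAX_USER_PRIO * PRIO_BONUS_RATIO / 100 / 2;
    int prio = NICE_TO_PRIO(nice) - bonus;

    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

Under this model a fully interactive nice-0 task lands at 115 and a pure CPU hog at nice 0 lands at 125, while a hog reniced to 10 sits at 135.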

2003-06-02 07:26:39

by Ingo Molnar

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.


On 1 Jun 2003, Tom Sightler wrote:

> Yes, this is correct. It shows up as pluginserver in the 'ps ax' output
> but I've since noticed that it is simply a symlink to wine. Of the two
> wine processes, wine and wineserver, it was the wine frontend process
> that was getting all of the CPU, showing 100% utilization. Renicing the
> wine process made the problem go away.
>
> Running the exact same config on a 2.4.20 kernel uses only a few % of
> the CPU.

could you apply the attached patch to 2.5.70 and check whether wine still
uses up 100% CPU time? This might be an artifact introduced by the
different HZ values of 2.4 and 2.5.

Ingo

--- include/asm-i386/param.h.orig
+++ include/asm-i386/param.h
@@ -2,7 +2,7 @@
#define _ASMi386_PARAM_H

#ifdef __KERNEL__
-# define HZ 1000 /* Internal kernel timer frequency */
+# define HZ 100 /* Internal kernel timer frequency */
# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
#endif
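One way a HZ change could show up as extra CPU use: if a program polls with short sleeps and each sleep is rounded up to whole timer ticks, raising HZ from 100 to 1000 lets the loop wake up roughly ten times as often. A simplified model in C, assuming sleeps are rounded up to a whole number of ticks (real kernels may add a further tick); `wakeups_per_sec()` is a hypothetical helper, not kernel code.

```c
#include <assert.h>

/* Simplified model: a loop that repeatedly sleeps for `req_us` microseconds,
 * where each sleep is rounded up to whole ticks of 1/hz seconds. Returns how
 * many times per second the loop wakes up (ignoring time spent running). */
long wakeups_per_sec(long req_us, long hz)
{
    assert(req_us > 0 && hz > 0);
    long tick_us = 1000000L / hz;                  /* 10000 us at HZ=100, 1000 us at HZ=1000 */
    long ticks = (req_us + tick_us - 1) / tick_us; /* round up to whole ticks */
    return 1000000L / (ticks * tick_us);
}
```

With a 1 ms sleep request the model gives 100 wakeups per second at HZ=100 and 1000 at HZ=1000, the kind of difference Ingo's patch is meant to test for.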

2003-06-02 13:23:50

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 03:36, Ingo Molnar wrote:
> On Sat, 31 May 2003, Andrew Morton wrote:
>
> > > [...] Upon looking a little further it
> > > seemed that the kernel was dynamically boosting the priority of the
> > > process much higher than it probably should be, in the end, not leaving
> > > enough CPU for playing the sounds without skipping.
> >
> > Yes, it seems that too many real-world applications are accidentally
> > triggering this problem.
>
> no, the problem is exactly the opposite. Here's the key observation:
>
> > the problem seemed to be caused by the fact that the pluginserver (wine)
> > was using 100% of the CPU. I simply reniced this process to -10 and
> > everything started working fine.
>
> the kernel has detected this process to be a CPU-hog - and indeed the
> traces and the above description all say that it really is a CPU hog.
>
> by renicing it to -10 it gets super-attention from the scheduler, so it
> can be a CPU hog _and_ create sound.

Sorry, this is my fault, I'm actually renicing the process to '10' not
'-10' that's a typo. I tested this again this morning to make sure.
I'm renicing this as a regular user, I don't think that a regular user
is allowed to renice to a negative value.

Later,
Tom


2003-06-02 14:18:07

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 03:39, Ingo Molnar wrote:
> On 1 Jun 2003, Tom Sightler wrote:
>
> > Yes, this is correct. It shows up as pluginserver in the 'ps ax' output
> > but I've since noticed that it is simply a symlink to wine. Of the two
> > wine processes, wine and wineserver, it was the wine frontend process
> > that was getting all of the CPU, showing 100% utilization. Renicing the
> > wine process made the problem go away.
> >
> > Running the exact same config on a 2.4.20 kernel uses only a few % of
> > the CPU.
>
> could you apply the attached patch to 2.5.70 and check whether wine still
> uses up 100% CPU time? This might be an artifact introduced by the
> different HZ values of 2.4 and 2.5.

This made no difference. I suppose it's important to note that this
problem is much easier to reproduce on pages which have multiple flash
objects. Actually, the http://www.disney.com home page (as pointed out by my
daughter) shows the problem better than any page I've found so far.

This page has at least two very busy Flash objects, including menus
which pop up and play sounds as you mouse around the page. This is the
page on which wine uses nearly 100% of the CPU.

Most other pages, even fairly busy ones, don't seem to make wine use
this much CPU, although there are other pages which show the problem,
just much less.

I'm almost positive that wine doesn't consume that much CPU under 2.4,
but I'm off to run some tests to prove or disprove that right now.

Later,
Tom


2003-06-02 15:01:24

by Ingo Molnar

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.


On 2 Jun 2003, Tom Sightler wrote:

> Sorry, this is my fault, I'm actually renicing the process to '10' not
> '-10' that's a typo. I tested this again this morning to make sure.
> I'm renicing this as a regular user, I don't think that a regular user
> is allowed to renice to a negative value.

hm. Which process is generating the sound? But yes, if a positive renicing
for the wine process solved the audio problem then this is bad.

Ingo

2003-06-02 15:11:29

by Arjan van de Ven

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 17:14, Ingo Molnar wrote:
> On 2 Jun 2003, Tom Sightler wrote:
>
> > Sorry, this is my fault, I'm actually renicing the process to '10' not
> > '-10' that's a typo. I tested this again this morning to make sure.
> > I'm renicing this as a regular user, I don't think that a regular user
> > is allowed to renice to a negative value.
>
> hm. Which process is generating the sound? But yes, if a positive renicing
> for the wine process solved the audio problem then this is bad.

given that audio mixing also happens in userspace it doesn't sound that
weird..... nicing wine gives the userspace sound mixer more cpu time :)



2003-06-02 15:13:03

by Ingo Molnar

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.


On 2 Jun 2003, Arjan van de Ven wrote:

> given that audio mixing also happens in userspace it doesn't sound that
> weird..... nicing wine gives the userspace sound mixer more cpu time :)

well, this depends on the circumstances. Normally the mixing shouldnt take
all that much CPU time, and thus the audio server thread should in theory
be quite interactive.

Ingo

2003-06-02 15:35:40

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 11:24, Tom Sightler wrote:
> I'm not sure why it's worse
> under 2.5 though, I still wonder if maybe it's because it's getting a
> priority boost, it almost seems it should get a penalty for being a CPU
> hog as Ingo pointed out. I can easily fix this in userspace so maybe
> this is a non-issue. If so, I apologize for bringing it to the list.

In trying to figure out why this might be worse under 2.5 I took some
simple vmstat numbers under 2.4 and 2.5; the biggest difference is the
number of context switches. Under 2.4, with the page loaded, but
otherwise idle, the system averages around 700/sec, and when I mouse
around the page I get 2000-3000/sec.

However, under 2.5, as I reported previously, I get 2000/sec all the
time, and 3000-4000 as I mouse around the page.

Would this be expected behavior? Does 2.5 do something that would cause
more context switches than 2.4? I have no idea if this would have any
impact at all, but it was the only difference I could observe in my
fairly simple testing of the two kernels.

Later,
Tom


2003-06-02 15:40:37

by William Lee Irwin III

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, Jun 02, 2003 at 11:47:56AM -0400, Tom Sightler wrote:
> In trying to figure out why this might be worse under 2.5 I took some
> simple vmstat numbers under 2.4 and 2.5; the biggest difference is the
> number of context switches. Under 2.4, with the page loaded, but
> otherwise idle, the system averages around 700/sec, and when I mouse
> around the page I get 2000-3000/sec.
> However, under 2.5, as I reported previously, I get 2000/sec all the
> time, and 3000-4000 as I mouse around the page.
> Would this be expected behavior? Does 2.5 do something that would cause
> more context switches than 2.4? I have no idea if this would have any
> impact at all, but it was the only difference I could observe in my
> fairly simple testing of the two kernels.

A quick patch to register profile hit counts for codepaths calling
schedule() and/or yield() appears to be in order for such occasions.


-- wli

2003-06-02 16:01:05

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 10:30, Tom Sightler wrote:
> I'm almost positive that wine doesn't consume that much CPU under 2.4,
> but I'm off to run some tests to prove or disprove that right now.

Well, I'm going to have to eat these words. The problem with wine does
seem to show up with 2.4 as well, although not as bad, and seems
slightly harder to trigger. That http://www.disney.com page does indeed show
the problem on both 2.4 and 2.5 kernels. I'm not sure why it's worse
under 2.5 though, I still wonder if maybe it's because it's getting a
priority boost, it almost seems it should get a penalty for being a CPU
hog as Ingo pointed out. I can easily fix this in userspace so maybe
this is a non-issue. If so, I apologize for bringing it to the list.

The only other case that is obviously worse under 2.5 than with 2.4 is
VMware 4 (interestingly, VMware 3.2 seems to be exactly the opposite);
however, I believe that this may indeed be related more to the HZ change
than to scheduling. I'm doing more testing on that now.

Later,
Tom




2003-06-02 17:15:42

by Ingo Molnar

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.


On 2 Jun 2003, Tom Sightler wrote:

> I think this may be because wine uses a client/server model. There is
> the wine client which runs the actual applications, but they seem to
> share the core wineserver process which seems to be responsible for
> actually mixing and generating the sound output. Renicing the 'wine'
> (frontend) process gives the 'wineserver' (backend) process more CPU time
> to actually get the sound out.

yes, this is an accurate description of the wineserver model.

to prove this point, could you try and renice wineserver to -10 (as root)
- does that fix the latency issues still?

(if this doesnt then it could be the foreground process starving yet
another process - we have to find out which one.)

Ingo

2003-06-02 17:13:19

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 11:25, Ingo Molnar wrote:
> On 2 Jun 2003, Arjan van de Ven wrote:
>
> > given that audio mixing also happens in userspace it doesn't sound that
> > weird..... niceing wine gives the userspace sound mixer more cpu time :)
>
> well, this depends on the circumstances. Normally the mixing shouldnt take
> all that much CPU time, and thus the audio server thread should in theory
> be quite interactive.

I think this may be because wine uses a client/server model. There is
the wine client which runs the actual applications, but they seem to
share the core wineserver process which seems to be responsible for
actually mixing and generating the sound output. Renicing the 'wine'
(frontend) process gives the 'wineserver' (backend) process more CPU time
to actually get the sound out.

I don't know much about WINE, so all of that is only a guess as to how
it appears to be working to me. I guess I should go read up on it.

Later,
Tom


2003-06-02 19:15:05

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 13:28, Ingo Molnar wrote:
> to prove this point, could you try and renice wineserver to -10 (as root)
> - does that fix the latency issues still?
>
> (if this doesnt then it could be the foreground process starving yet
> another process - we have to find out which one.)

Yes, I thought the same thing, and I did just that, but no, it doesn't
fix the latency issue. This system has very little running, I made sure
that there were no sound servers such as esd or arts running, nothing.
Basically, a plain KDE (with artsd disabled), mozilla, and Crossover
wine plugin. Even though I couldn't see how it would affect anything I
tried bumping up the priorities of other processes such as mozilla
itself, X, etc. Nothing fixed the problem except for lowering the
priority of the wine process.

Could this process be starving the kernel itself so that it simply
doesn't have time to service the sound correctly?

Later,
Tom




2003-06-02 20:41:16

by Andreas Boman

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 2003-06-02 at 03:36, Ingo Molnar wrote:
> On Sat, 31 May 2003, Andrew Morton wrote:
>
> > > [...] Upon looking a little further it
> > > seemed that the kernel was dynamically boosting the priority of the
> > > process much higher than it probably should be, in the end, not leaving
> > > enough CPU for playing the sounds without skipping.
> >
> > Yes, it seems that too many real-world applications are accidentally
> > triggering this problem.

Not sure if what I'm seeing is the same thing: I get it without using
wine, weird mozilla plugins, or a userspace mixer/sound server (Audigy
with hardware mixing).

During the first few seconds of playing a song in xmms, the sound will
skip if I so much as load or refresh a page in mozilla. After the song
has been playing for a few seconds it stops skipping for the rest of
that song, until the next song in the playlist starts playing. Running
renice -5 <pid of xmms>, or renice 5 <pid of mozilla>, 'solves' the
issue. Of course, if mozilla is the reniced process, xmms may still skip
when switching desktops; again, that is much more likely to occur during
the first few seconds of playtime than later in the song. I can't recall
xmms ever skipping after it has been reniced to -5, even if I also have
a real CPU hog running (oggenc or similar).

Also, if nothing has been reniced and I switch desktops around like mad
for a while, xmms will stop skipping even when switching to the next song
in the playlist. I suppose X/the wm/mozilla/etc. have all been scheduled
off as CPU hogs by the excessive desktop switching.

I can't remember when this became really noticeable, but it was probably
in the 2.5.68 timeframe. I am running 2.5.69-mm8+schedB0 at the moment
(software raid1 keeps me from going to 2.5.70-*).

a strace of mozilla shows that even when it is just sitting there
without a page loaded this stuff is just looping:

15:37:35.808520 ioctl(3, FIONREAD, [0]) = 0
15:37:35.808584 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN, revents=POLLIN}], 2, -1) = 1
15:37:36.308127 gettimeofday({1054582656, 308165}, NULL) = 0
15:37:36.308311 read(4, "\372", 1) = 1
15:37:36.308396 write(3, "F\4\5\0l\0\240\1\332\0\240\1\355\0)\0\1\0\20\0", 20) = 20

The `vmstat 1` run below was started just before xmms -p. I then
switched desktops for a while until xmms stopped skipping:

 procs             memory             swap       io         system       cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo    in    cs  us  sy  id
 2  0  0  10416  47560  18728 230860   2   1    68     1    66    58   6   2  11
 0  0  0  10416  47424  18728 230860   0   0     0     0  1371  1185  12  11  77
 0  0  0  10416  47424  18728 230860   0   0     0    33  1372   906   5   6  89
 1  0  0  10416  47352  18728 230860   0   0     0     0  1413  1984  21   9  70
 1  0  0  10416  47464  18728 230860   0   0     0     0  1927  1925  55  12  33
 2  0  0  10416  47168  18728 230860   0   0     0     0  1185  1011  74  11  15
 3  0  0  10416  44416  18728 230860   0   0     0     0  1182  1516  33   6  61
 1  0  0  10416  47348  18728 230860   0   0     0     0  1171  1041  80  16   4
 3  0  0  10416  45044  18728 230860   0   0     0     0  1195  1327  44  13  44
 3  0  0  10416  47348  18728 230860   0   0     0     0  1173  1049  54  10  36
 2  0  0  10416  46712  18728 230860   0   0     0     0  1186  1206  65  11  24
 2  0  0  10416  30264  18728 230860   0   0     0     0  1190  1152  46  10  44
 1  0  0  10416  47212  18728 230860   0   0     0     0  1168  1058  75  10  15
 2  0  0  10416  47212  18728 230860   0   0     0     0  1184  1463   3   5  92
 3  0  0  10416  36900  18728 231020   0   0     0     0  1183  1114  65  11  24
 1  0  0  10416  47268  18728 230860   0   0     0     0  1176  1370  23   5  72
 2  0  0  10416  24360  18728 230860   0   0     0     0  1192  1467  30   7  63
 1  0  0  10416  47220  18728 230860   0   0     0     0  1189  1163  63  11  26
 1  0  0  10416  47220  18728 230860   0   0     0     0  1184  1624   2   5  93
 4  0  0  10416  43828  18728 230860   0   0     0     0  1193  1202  31   5  64
 1  0  0  10416  47224  18728 230860   0   0     0     0  1173  1190  57  11  32
 procs             memory             swap       io         system       cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo    in    cs  us  sy  id
 3  0  0  10416  47224  18728 230860   0   0     0     0  1184  1513   1   1  98
 3  0  0  10416  17620  18728 244360   0   0     0     0  1195   893  66  11  23
 3  0  0  10416  37304  18728 230860   0   0     0     0  1194  1277  45  10  45
 2  0  0  10416  47224  18728 230860   0   0     0     0  1189  1147  84  16   0
 2  0  0  10416  42552  18728 230860   0   0     0     0  1194  1867  84  15   1
 2  0  0  10416  47224  18728 230860   0   0     0     0  1485  2230  20   5  75
 1  0  0  10416  47232  18728 230860   0   0     0     0  1325  1629   1   2  97
 0  0  0  10416  47232  18728 230860   0   0     0     0  1373  1168  13   8  79
 1  0  0  10416  47232  18728 230860   0   0     0     0  1163   551   1   2  97


> no, the problem is exactly the opposite. Here's the key observation:
>
> > the problem seemed to be caused by the fact that the pluginserver (wine)
> > was using 100% of the CPU. I simply reniced this process to -10 and
> > everything started working fine.
>
> the kernel has detected this process to be a CPU-hog - and indeed the
> traces and the above description all say that it really is a CPU hog.

What I am seeing happens whether the box has a load avg of 0.1 or 2.0,
in pretty much identical fashion.

Andreas

2003-06-02 22:33:37

by Rob Landley

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Monday 02 June 2003 15:27, Tom Sightler wrote:
> On Mon, 2003-06-02 at 13:28, Ingo Molnar wrote:
> > to prove this point, could you try and renice wineserver to -10 (as root)
> > - does that fix the latency issues still?
> >
> > (if this doesnt then it could be the foreground process starving yet
> > another process - we have to find out which one.)
>
> Yes, I thought the same thing, and I did just that, but no, it doesn't
> fix the latency issue. This system has very little running, I made sure
> that there were no sound servers such as esd or arts running, nothing.
> Basically, a plain KDE (with artsd disabled), mozilla, and Crossover
> wine plugin. Even though I couldn't see how it would affect anything I
> tried bumping up the priorities of other processes such as mozilla
> itself, X, etc. Nothing fixed the problem except for lowering the
> priority of the wine process.

Back around March there was a discussion of sharing interactivity bonus with
the server an interactive process was waiting for. It was mostly about
XFree86 not getting batch scheduled and making mouse movement unusable so
easily, but this sounds eerily similar...

In this case, it seems like the wine client either isn't accumulating an
interactivity bonus (busy-waiting?), or else it's not transmitting it to the
wine server (going through the network stack)?

I've been a bit out of touch since then (old ISP blew up, then I got busy).
Just resurfacing now. Maybe it's old news, but assuming the patch I'm
thinking of wasn't backed out while I was away, it may be relevant. The
thread about it started here:

http://lists.insecure.org/lists/linux-kernel/2003/Mar/1244.html

> Could this process be starving the kernel itself so that it simply
> doesn't have time to service the sound correctly?

Unlikely. Interrupts don't depend on the scheduler. (Neither did bottom
halves or tasklets. I don't think work queues do either, but I'm a bit
behind...)

Rob

2003-06-02 22:57:06

by Valdis Klētnieks

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Mon, 02 Jun 2003 16:53:48 EDT, Andreas Boman said:
> playing. renicing -5 <pid of xmms>, or 5 <pid of mozilla> 'solves' the
> issue. Of course, if mozilla is the reniced process, xmms may still skip
> when switching desktops, again that is much more likely to occur during

Just as another datapoint, my Dell Latitude C840 would do the 'xmms skips
on desktop switch' fairly regularly under 2.5.69 and 2.5.70. It's gotten
a lot more rock-solid in 2.5.70-mm3.



2003-06-04 07:56:14

by Mike Galbraith

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

At 03:27 PM 6/2/2003 -0400, Tom Sightler wrote:
>On Mon, 2003-06-02 at 13:28, Ingo Molnar wrote:
> > to prove this point, could you try and renice wineserver to -10 (as root)
> > - does that fix the latency issues still?
> >
> > (if this doesnt then it could be the foreground process starving yet
> > another process - we have to find out which one.)
>
>Yes, I thought the same thing, and I did just that, but no, it doesn't
>fix the latency issue. This system has very little running, I made sure
>that there were no sound servers such as esd or arts running, nothing.
>Basically, a plain KDE (with artsd disabled), mozilla, and Crossover
>wine plugin. Even though I couldn't see how it would affect anything I
>tried bumping up the priorities of other processes such as mozilla
>itself, X, etc. Nothing fixed the problem except for lowering the
>priority of the wine process.

Feel like trying something else for grins? If it's thud.c type starvation
you're seeing, the attached club should beat it into submission.

-Mike


Attachments:
xx.diff (1.13 kB)

2003-06-04 14:56:25

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.


> >Yes, I thought the same thing, and I did just that, but no, it doesn't
> >fix the latency issue. This system has very little running, I made sure
> >that there were no sound servers such as esd or arts running, nothing.
> >Basically, a plain KDE (with artsd disabled), mozilla, and Crossover
> >wine plugin. Even though I couldn't see how it would affect anything I
> >tried bumping up the priorities of other processes such as mozilla
> >itself, X, etc. Nothing fixed the problem except for lowering the
> >priority of the wine process.
>
> Feel like trying something else for grins? If it's thud.c type starvation
> you're seeing, the attached club should beat it into submission.

I gave this a try this morning and it still doesn't seem to solve my
issue. I have no idea what is going on with this particular scenario.
For now I've fixed it with a simple wrapper script that start up wine
with a '15' nice level.

I did do some playing: the problem mostly goes away right around nice
level 8. Below that I see no noticeable difference; above it the problem
seems completely gone. Would there be anything special about that range?
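The question can be made concrete by computing which dynamic priorities each nice level can reach. This sketch assumes the constants of the 2.5-era O(1) scheduler: static priority is 120 + nice, the interactivity bonus moves a task by up to 5 levels either way, and lower numbers are scheduled first. The function names are illustrative, not kernel API. The `bonus` argument stands in for the kernel's sleep_avg-based calculation.

```python
# Constants assumed from the 2.5/2.6-era O(1) scheduler (kernel/sched.c).
MAX_RT_PRIO = 100                      # priorities 0..99 are real-time
MAX_PRIO = MAX_RT_PRIO + 40            # 140; normal tasks use 100..139
MAX_USER_PRIO = 40
PRIO_BONUS_RATIO = 25
MAX_BONUS = MAX_USER_PRIO * PRIO_BONUS_RATIO // 100   # 10


def static_prio(nice: int) -> int:
    """Map nice -20..19 to static priority 100..139."""
    return MAX_RT_PRIO + 20 + nice


def effective_prio(nice: int, bonus: int) -> int:
    """Dynamic priority for an interactivity bonus in 0..MAX_BONUS.

    bonus == 0 models a pure CPU hog (penalised by 5 levels);
    bonus == MAX_BONUS models a task that mostly sleeps (boosted by 5).
    """
    prio = static_prio(nice) - (bonus - MAX_BONUS // 2)
    return max(MAX_RT_PRIO, min(prio, MAX_PRIO - 1))


def prio_range(nice: int) -> tuple[int, int]:
    """(best, worst) dynamic priority reachable at this nice level."""
    return effective_prio(nice, MAX_BONUS), effective_prio(nice, 0)


for n in (0, 5, 8, 10, 19):
    print(n, prio_range(n))
```

Under these assumptions, a fully boosted nice-8 task sits at 123. It is ahead of, or level with, a nice-0 task only when that task has drifted into the most penalised part of its range (123 to 125). This is arithmetic only, not a diagnosis of the skipping.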

I do have one theory, probably wildly incorrect. Most of the problems
I'm seeing seem to occur when a fairly CPU- and graphics-intensive
application is running. In this case flash has lots of glitzy stuff
happening, interactive menus popping up using lots of graphics and
sound, etc., while in the meantime wine is using lots of CPU to keep
all of this working. It almost seems that it's the combination of the
two of them that leaves too little time for sound to be played
correctly.

As a test of this idea I simply reniced the X server to 19, and the
problem did get a LOT better, although it did not go away completely. I
could make the problem go away completely with the X server niced at 19
and wine niced at 5. With X at its normal nice level of 0, I had to renice
wine to 8 before the problem was corrected.

This seems to match the issue some people have noted where their
XMMS skips during virtual desktop switches, etc.

I'm not sure if any of that really helps, or if there is a "correct"
fix for this, or even if this behavior would be considered broken.

I've corrected it for now with some simple wrapper scripts to set nice
levels on the offending processes. So far this works great.

I'll gladly test any other patches or suggestions.

Thanks,
Tom


2003-06-04 14:59:27

by Ingo Molnar

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.


a question - which process in your system is responsible for the sound
output?

Ingo

2003-06-04 15:48:59

by Tom Sightler

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

On Wed, 2003-06-04 at 11:12, Ingo Molnar wrote:
> a question - which process in your system is responsible for the sound
> output?

In this simple test scenario it is wine itself (the actual process I am
renicing). I verified this with lsof while sound was playing.

This is why I keep saying that it almost seems as if it is the kernel
itself that is being starved (perhaps starved is the wrong word here).

If the environment is as simple as:

pluginserver-->/dev/dsp-->hardware

and renicing the pluginserver process to a lower priority is what makes
the sound stop skipping, then what else can it be?

I am using ALSA with OSS emulation, but I did try the OSS driver as
well. My primary testing machine is a laptop with a Maestro3 sound card,
which is known to be buggy sometimes (although I've actually had very
good success with it, especially with ALSA), but I have reproduced this
issue on my desktop at home, an Athlon 950 MHz with a Sound Blaster
Live!, although it is less noticeable there.

It's been suggested I may be tripping some type of hardware quirk here,
perhaps a shared interrupt issue, or simple PCI bandwidth or latency. I
suppose this is possible and I'm looking at these kinds of tweaks, but I
think they are unlikely: why would renicing a userspace process make
this type of problem go away?

Later,
Tom

2003-06-04 15:47:32

by Mike Galbraith

Subject: Re: Strange load issues with 2.5.69/70 in both -mm and -bk trees.

At 11:08 AM 6/4/2003 -0400, Tom Sightler wrote:

> > >Yes, I thought the same thing, and I did just that, but no, it doesn't
> > >fix the latency issue. This system has very little running, I made sure
> > >that there were no sound servers such as esd or arts running, nothing.
> > >Basically, a plain KDE (with artsd disabled), mozilla, and Crossover
> > >wine plugin. Even though I couldn't see how it would affect anything I
> > >tried bumping up the priorities of other processes such as mozilla
> > >itself, X, etc. Nothing fixed the problem except for lowering the
> > >priority of the wine process.
> >
> > Feel like trying something else for grins? If it's thud.c type starvation
> > you're seeing, the attached club should beat it into submission.
>
>I gave this a try this morning and it still doesn't seem to solve my
>issue. I have no idea what is going on with this particular scenario.
>For now I've fixed it with a simple wrapper script that starts up wine
>at nice level 15.

Ok, at least we know what it's not. (_slightly_ better than complete unknown;)

>I did do some playing: the problem mostly goes away right around nice
>level 8. Below that I see no noticeable difference; above it the problem
>seems completely gone. Would there be anything special about that range?

A queue's a queue. What likely matters is what's above there.

>I do have one theory, probably wildly incorrect. Most of the problems
>I'm seeing seem to occur when a fairly CPU- and graphics-intensive
>application is running. In this case flash has lots of glitzy stuff
>happening, interactive menus popping up using lots of graphics and
>sound, etc., while in the meantime wine is using lots of CPU to keep
>all of this working. It almost seems that it's the combination of the
>two of them that leaves too little time for sound to be played
>correctly.
>
>As a test of this idea I simply reniced the X server to 19, and the
>problem did get a LOT better, although it did not go away completely. I
>could make the problem go away completely with the X server niced at 19
>and wine niced at 5. With X at its normal nice level of 0, I had to renice
>wine to 8 before the problem was corrected.

Can you send me (offline) some top output while it's bust and working to
ponder?

>This seems to match the issue some people have noted where their
>XMMS skips during virtual desktop switches, etc.

Hmm. I thought the TIMESLICE_GRANULARITY change that's in mm fixed those.

>I'm not sure if any of that really helps, or if there is a "correct"
>fix for this, or even if this behavior would be considered broken.
>
>I've corrected it for now with some simple wrapper scripts to set nice
>levels on the offending processes. So far this works great.

That's good, but you shouldn't need to.

>I'll gladly test any other patches or suggestions.

Good. I do have another rock I'd like to throw at it, but I need to play
with the idea some more first. I'll drop you a line privately if I can
convince it to work as intended.

-Mike