2003-05-23 12:48:30

by Christian Klose

Subject: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

Hello all :-)

I have had a problem since Linux kernel 2.4.19. Copying a huge amount of data
gives me pauses: disk I/O stalls, the keyboard does not accept input and the
mouse won't move. It varies; sometimes the pauses last 1 to 2 seconds,
sometimes up to 15 seconds, during which I cannot do anything with my Linux
box but wait :-(

For the past weeks I've searched Google a lot and found almost identical
reports, but IMHO no real fixes. I've tried many different kernel patch sets
that I found while searching (2.4-aa, 2.4-ck, 2.4-wolk, 2.4-rmap), but
none of them fixes the problem I've experienced. The 2.4-ck and 2.4-wolk
patch sets are very good: the pauses are almost gone, but throughput is also
horribly decreased. Yesterday, mcp from #kernelnewbies told me that this is
caused by nr_requests being lowered to 32 (or maybe 4? I don't remember
exactly). I've also tried the 2.4-aa patch because I read about its
low-latency elevator, which should fix these pauses. Unfortunately the pauses
are still there, along with a decrease in throughput :-(

I've switched my desktop machine back from 2.4.20 to 2.4.18 because these
pauses really annoy me. I wonder what changes were made in 2.4.19 that
cause them. Please don't get me wrong, but it seems that the Linux kernel is
not ready for the desktop yet, and I even wonder about guaranteed I/O
throughput for server usage (please read further below).

This is also not a problem with my hardware. I've tried the same scenario on
almost 20 different machines in my company, ranging from a small 500 MHz CPU
with an Intel UDMA/100 controller up to a 2.4 GHz Pentium 4 with an Adaptec
U160 SCSI controller and U160 SCSI disks in software RAID-0 and RAID-1.

Carl-Daniel Hailfinger was very helpful yesterday on #kernelnewbies,
trying to track this behaviour down. He advised me to press SysRq-T while
the I/O pauses are happening. Maybe he will find out what's going on
there :-)



Besides that, I've also noticed that there is no guaranteed I/O throughput
while copying data in 2.4.18 up to 2.4.21-rc3. My machine has 512MB of memory
and 512MB of swap. Right after Linux boots, throughput is steady until the
memory is almost completely used (by buffers or cache? ... /proc/meminfo
tells me so) and Linux starts to swap. After that, copying data starts at
30MB per second, drops very quickly to 1MB per second or, even worse, to
~250KB per second, then goes back up to 10MB per second and so on, so it
varies a lot.


Anyway, I've also read that kernel 2.5 should fix everything I've mentioned
above. After reading all these great opinions about kernel 2.5, I tried it
out last week, and I have to say that I cannot see any advantage, at least
not for the two cases I've mentioned :-(((

Is it just me or are there many others noticing this too?


Please excuse my bad English, but I hope everyone understands me :-)


PS: Should I CC this to Marcelo Tosatti and Linus Torvalds too? I haven't
done so yet, but it may help because they are the maintainers of 2.4 and
2.5 (at least that's what I've found on Google). Sorry, I have been using
Linux for only about 1 1/2 years and my knowledge of the Linux kernel is not
that big.


Thank you so much and have a nice weekend :-)

bye, Chris



2003-05-23 13:37:55

by Christian Klose

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

On Friday 23 May 2003 15:00, Christian Klose wrote:

Hello all again :-)

> I have had a problem since Linux kernel 2.4.19. Copying a huge amount of
> data gives me pauses: disk I/O stalls, the keyboard does not accept input
> and the mouse won't move. It varies; sometimes the pauses last 1 to 2
> seconds, sometimes up to 15 seconds, during which I cannot do anything
> with my Linux box but wait :-(
I forgot to mention that this is filesystem independent: ext2, ext3, reiserfs;
always the same problem.

bye, Chris

2003-05-23 14:23:13

by Marc-Christian Petersen

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

On Friday 23 May 2003 15:46, Christian Klose wrote:

Moin Christian,

> > I have had a problem since Linux kernel 2.4.19. Copying a huge amount of
> > data gives me pauses: disk I/O stalls, the keyboard does not accept input
> > and the mouse won't move. It varies; sometimes the pauses last 1 to 2
> > seconds, sometimes up to 15 seconds, during which I cannot do anything
> > with my Linux box but wait :-(
> I forgot to mention that this is filesystem independent: ext2, ext3,
> reiserfs; always the same problem.
It _seems_ the offending diff is the attached one. It went into .19-pre5.

More to come.

ciao, Marc


Attachments:
(No filename) (620.00 B)
pre4-to-pre5-ll_rw_blk.c (4.75 kB)

2003-05-24 14:07:07

by Marc-Christian Petersen

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

On Friday 23 May 2003 15:46, Christian Klose wrote:

Hi Christian,

> > I have had a problem since Linux kernel 2.4.19. Copying a huge amount of
> > data gives me pauses: disk I/O stalls, the keyboard does not accept input
> > and the mouse won't move. It varies; sometimes the pauses last 1 to 2
> > seconds, sometimes up to 15 seconds, during which I cannot do anything
> > with my Linux box but wait :-(
> I forgot to mention that this is filesystem independent: ext2, ext3,
> reiserfs; always the same problem.
You've mentioned that you also tried 2.5 and don't see any difference
there. I am very interested to hear which version of 2.5 you tried.

I am using 2.5.69 + some patches and it does not suck or behave brain dead
like 2.4 does. As for me, I'm done with 2.4 from now on.

2.5 still has many things to do / to fix, but 2.5 is _actively_ developed and
fixes _are_ going into 2.5, whereas 2.4 does not accept real bug fixes.

If you want to give 2.5.69 a try on your desktop, please apply the attached
patch. It gives you more interactivity.

ciao, Marc


Attachments:
(No filename) (1.06 kB)
sched-interactivity.patch (368.00 B)

2003-05-24 14:15:13

by William Lee Irwin III

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

On Sat, May 24, 2003 at 04:19:40PM +0200, Marc-Christian Petersen wrote:
> --- old/kernel/sched.c 2003-05-24 14:45:57.000000000 +0200
> +++ 2.5-mcp/kernel/sched.c 2003-05-24 16:18:42.000000000 +0200
> @@ -65,7 +65,7 @@
> * they expire.
> */
> #define MIN_TIMESLICE ( 10 * HZ / 1000)
> -#define MAX_TIMESLICE (200 * HZ / 1000)
> +#define MAX_TIMESLICE ( 10 * HZ / 1000)
> #define CHILD_PENALTY 50
> #define PARENT_PENALTY 100
> #define EXIT_WEIGHT 3


This looks highly suspicious as it essentially removes dynamic timeslice
sizing. If this fixes something, then dynamic timeslice heuristics are
going wrong somewhere that should be properly described and handled, not
this kind of shenanigan.


-- wli

2003-05-25 00:30:58

by Christian Klose

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

On Saturday 24 May 2003 16:28, William Lee Irwin III wrote:

Hi wli,

> > --- old/kernel/sched.c 2003-05-24 14:45:57.000000000 +0200
> > +++ 2.5-mcp/kernel/sched.c 2003-05-24 16:18:42.000000000 +0200
> > @@ -65,7 +65,7 @@
> > * they expire.
> > */
> > #define MIN_TIMESLICE ( 10 * HZ / 1000)
> > -#define MAX_TIMESLICE (200 * HZ / 1000)
> > +#define MAX_TIMESLICE ( 10 * HZ / 1000)
> > #define CHILD_PENALTY 50
> > #define PARENT_PENALTY 100
> > #define EXIT_WEIGHT 3

> This looks highly suspicious as it essentially removes dynamic timeslice
> sizing. If this fixes something, then dynamic timeslice heuristics are
> going wrong somewhere that should be properly described and handled, not
> this kind of shenanigan.
I somewhat agree with you, but the "properly described" part is already
there: all the bug reports on lkml containing "bad interactivity in 2.5, cpu
starving in 2.5" and such...

This isn't a shenanigan, at least not as far as desktop interactivity goes.
It is a workaround for users who are complaining about bad interactivity in
2.5!

ciao, Marc


2003-05-25 00:34:40

by Marc-Christian Petersen

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

Hi Christian,

> I somewhat agree with you, but the "properly described" part is already
> there: all the bug reports on lkml containing "bad interactivity in 2.5, cpu
> starving in 2.5" and such...
>
> This isn't a shenanigan, at least not as far as desktop interactivity goes.
> It is a workaround for users who are complaining about bad interactivity in
> 2.5!
>
> ciao, Marc
Err, could you please tell me why you are signing with "ciao, Marc", like I
do in my emails, if you are pretending to be Christian Klose??

2003-05-25 01:12:44

by Con Kolivas

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, 25 May 2003 10:43, Christian Klose wrote:
> On Saturday 24 May 2003 16:28, William Lee Irwin III wrote:
>
> Hi wli,
>
> > > --- old/kernel/sched.c 2003-05-24 14:45:57.000000000 +0200
> > > +++ 2.5-mcp/kernel/sched.c 2003-05-24 16:18:42.000000000 +0200
> > > @@ -65,7 +65,7 @@
> > > * they expire.
> > > */
> > > #define MIN_TIMESLICE ( 10 * HZ / 1000)
> > > -#define MAX_TIMESLICE (200 * HZ / 1000)
> > > +#define MAX_TIMESLICE ( 10 * HZ / 1000)
> > > #define CHILD_PENALTY 50
> > > #define PARENT_PENALTY 100
> > > #define EXIT_WEIGHT 3
> >
> > This looks highly suspicious as it essentially removes dynamic timeslice
> > sizing. If this fixes something, then dynamic timeslice heuristics are
> > going wrong somewhere that should be properly described and handled, not
> > this kind of shenanigan.
>
> I somewhat agree with you, but the "properly described" part is already
> there: all the bug reports on lkml containing "bad interactivity in 2.5, cpu
> starving in 2.5" and such...
>
> This isn't a shenanigan, at least not as far as desktop interactivity goes.
> It is a workaround for users who are complaining about bad interactivity in
> 2.5!
>
> ciao, Marc

Even though you're not Marc I do agree with you. The problem is well described
as either poor interactivity (the window wiggle test) or starvation in the
presence of certain scheduler hogs (for whatever reason) since the
interactivity patch from mingo. Dropping the max timeslice is a bandaid but
destroys priority based timeslice scheduling. Dropping the min timeslice will
bring this back, but at some point the timeslice will be so low that low
priority cpu intensive tasks will spend most of their time cache thrashing.

A reasonable compromise for the desktop would be
min 5
max 25
but some granularity will be lost in the different sizes of timeslices at
different priorities.

Is there any point in having timeslices longer than this?

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+0Bv6F6dfvkL3i1gRAhpfAKCG3fjkK02lYbAs1p3978rSL/PYAQCcCeK7
gHqR6bgrITE3CSjKCqntw+g=
=rq1o
-----END PGP SIGNATURE-----

2003-05-25 04:15:10

by William Lee Irwin III

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

On Sun, May 25, 2003 at 11:27:20AM +1000, Con Kolivas wrote:
> Even though you're not Marc I do agree with you. The problem is well
> described as either poor interactivity (the window wiggle test) or
> starvation in the presence of certain scheduler hogs (for whatever
> reason) since the interactivity patch from mingo. Dropping the max
> timeslice is a bandaid but destroys priority based timeslice
> scheduling. Dropping the min timeslice will bring this back, but at
> some point the timeslice will be so low that low priority cpu
> intensive tasks will spend most of their time cache thrashing.

The fact that it's a "bandaid" and that it "destroys priority-based
timeslice scheduling" makes it a shenanigan. If you're having problems
solved by capping timeslices, you have someone's timeslice and/or
priority growing too large for some reason.

It'd be far better to help figure out what went wrong.


-- wli

2003-05-25 04:23:02

by Con Kolivas

Subject: Re: I/O problems in 2.4.19/2.4.20/2.4.21-rc3

On Sun, 25 May 2003 14:28, William Lee Irwin III wrote:
> On Sun, May 25, 2003 at 11:27:20AM +1000, Con Kolivas wrote:
> > Even though you're not Marc I do agree with you. The problem is well
> > described as either poor interactivity (the window wiggle test) or
> > starvation in the presence of certain scheduler hogs (for whatever
> > reason) since the interactivity patch from mingo. Dropping the max
> > timeslice is a bandaid but destroys priority based timeslice
> > scheduling. Dropping the min timeslice will bring this back, but at
> > some point the timeslice will be so low that low priority cpu
> > intensive tasks will spend most of their time cache thrashing.
>
> The fact that it's a "bandaid" and that it "destroys priority-based
> timeslice scheduling" makes it a shenanigan. If you're having problems

I don't disagree it's a shenanigan.

> solved by capping timeslices, you have someone's timeslice and/or
> priority growing too large for some reason.

So there is a benefit to timeslices being as large as 200ms? I'll take your
word for it.

> It'd be far better to help figure out what went wrong.

Love to help. No idea where to begin. All we can do is report what helps the
symptoms and hope those in the know can decipher it from that.

Con