On a 768 MB SMP box (2x466 MHz Celeron), I see some weird problems with
interactive performance on 2.4.15pre{1,2}. A good example of this is the
following scenario:

- copy a large file (e.g. an iso image file) to a directory on the same
  (reiserfs in this case) filesystem, or...
- do a file comparison between a CD and the original file (with cmp
  /mnt/cdrom/<filename> /mnt/reiserfs/1/data/<original_file_location>, using a
  PLEXTOR Model: CD-ROM PX-40TS SCSI CD-ROM drive),

- and THEN (while the copy or comparison runs) try any simple command (like
  'ls /mnt/reiserfs/1/data' or 'top' or anything else...).

Response time is abysmal: a simple 'ls /some/dir' takes tens of seconds to
start. Once the command is running, performance is normal. Try this while a
cdrecord session is running and you'll get a buffer underrun.

The box has 768 MB of RAM and 512 MB of swap. There is no significant load on
the system (according to an already running copy of top), either before or
during the test. Try tab-completing a command in a terminal, and that terminal
freezes for tens of seconds, usually until after the filesystem load has gone
down.

In a few words, heavy filesystem activity seems to wreak havoc on the system,
and not by loading the CPU (it hardly breaks a sweat at 178% idle (SMP...)).

Turning off swap (swapoff -a) does not change the observed behaviour.

Anyone else seen something like this?
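
For anyone who wants to try the scenario above on their own box, it can be
sketched as a small shell function. This is only a rough sketch and not the
exact procedure used in this report: the function name probe_under_load, the
scratch file name and the use of dd to generate the write load are all
assumptions, and the directory, size and probe command are placeholders.

```shell
#!/bin/bash
# Rough reproduction sketch; names, paths and sizes are illustrative only.

# probe_under_load DIR SIZE_MB COMMAND [ARGS...]
# Writes SIZE_MB megabytes of zeros to a scratch file in DIR in the
# background, and times COMMAND repeatedly (about once per second) until
# the write has finished. The scratch file is removed afterwards.
probe_under_load() {
    local dir=$1 size_mb=$2
    shift 2
    local scratch="$dir/latency-probe.$$"

    # Background writer: this stands in for the large file copy.
    dd if=/dev/zero of="$scratch" bs=1024k count="$size_mb" 2>/dev/null &
    local writer=$!

    # While the writer is still alive, time the probe command.
    # A large gap between 'real' and user+sys is the symptom reported here.
    while kill -0 "$writer" 2>/dev/null; do
        time "$@" >/dev/null
        sleep 1
    done

    wait "$writer"
    rm -f "$scratch"
}
```

Something like 'probe_under_load /mnt/reiserfs/1/data 1024 ls /mnt/reiserfs/1/data'
would then write a 1 GB scratch file and time 'ls' while the write is in
progress. With a small size the write may finish before the first probe runs,
so a large file is needed to see anything.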
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ [email protected] /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
On Mon, 12 Nov 2001, Frank de Lange wrote:
> On a 768 MB SMP box (2x466 MHz Celeron), I see some weird problems with
> interactive performance on 2.4.15pre{1,2}. A good example of this is the
> following scenario:
>
> - copy a large file (eg. an iso image file) to a directory on the same
> (reiserfs in this case) filesystem, or...
> - do a filesystem comparison between a CD and the original file (with cmp
> /mnt/cdrom/<filename> /mnt/reiserfs/1/data/<original_file_location>, using a
> PLEXTOR Model: CD-ROM PX-40TS SCSI CD-ROM drive),
>
> - and THEN (while the copy or comparison runs) try any simple command (like
> 'ls /mnt/reiserfs/1/data' or 'top' or anything else...).
>
> Response time is abysmal, a simple 'ls /some/dir' takes tens of seconds to
> start. Once the command is running, performance is normal. Try this when a
> cdrecord session is running and you'll get a buffer underrun.
>
> The box has 768 MB of RAM, 512 MB of swap. There is no significant load on the
> system (according to an already running copy of top) neither before nor during
> the test. Try tab-completing a command in a terminal, and that terminal freezes
> for tens of seconds, usually until after the file system load has gone down.
>
> In a few words, heavy filesystem activity seems to wreak havoc on the system.
> Not by loading the CPU (it hardly breaks out a sweat at 178% idle (SMP...)).
>
> Turning off swap (swapoff -a) does not change the observed behaviour.
>
> Anyone else seen something like this?
I have 2.4.10-ac12 here with reiserfs filesystems, and I have to say that
performance is terrible when doing anything disk-intensive. It seems like
disk scheduling is very broken for my IDE disk, or it's a reiserfs
problem. Maybe it pushes a _lot_ into the disk scheduler at once?
I've heard that read-ahead for IDE has been broken for a while, but that it's
fixed in -ac and in Andre Hedrick's IDE patches.
I'm going to upgrade to a more recent kernel and see how it behaves.
/Martin
Never argue with an idiot. They drag you down to their level, then beat you with experience.
Frank de Lange wrote:
>
> On a 768 MB SMP box (2x466 MHz Celeron), I see some weird problems with
> interactive performance on 2.4.15pre{1,2}. A good example of this is the
> following scenario:
>
> - copy a large file (eg. an iso image file) to a directory on the same
> (reiserfs in this case) filesystem, or...
> - do a filesystem comparison between a CD and the original file (with cmp
> /mnt/cdrom/<filename> /mnt/reiserfs/1/data/<original_file_location>, using a
> PLEXTOR Model: CD-ROM PX-40TS SCSI CD-ROM drive),
>
> - and THEN (while the copy or comparison runs) try any simple command (like
> 'ls /mnt/reiserfs/1/data' or 'top' or anything else...).
>
> Response time is abysmal, a simple 'ls /some/dir' takes tens of seconds to
> start. Once the command is running, performance is normal. Try this when a
> cdrecord session is running and you'll get a buffer underrun.
Can you try 2.4.13ac6 (not 7/8), and 2.2.20, and post a comparison?
--
Jeff Garzik | Only so many songs can be sung
Building 1024 | with two lips, two lungs, and one tongue.
MandrakeSoft | - nomeansno
I have tried both 2.4.13ac6 and 2.4.15-pre2, and am getting
the same behaviour in 2.4.15-pre2, but not in the ac kernel.
mike
--
My Operat~1 System supports long filena~1, does yours?
On Mon, 12 Nov 2001, Jeff Garzik wrote:
> Frank de Lange wrote:
> >
> > On a 768 MB SMP box (2x466 MHz Celeron), I see some weird problems with
> > interactive performance on 2.4.15pre{1,2}. A good example of this is the
> > following scenario:
> >
> > - copy a large file (eg. an iso image file) to a directory on the same
> > (reiserfs in this case) filesystem, or...
> > - do a filesystem comparison between a CD and the original file (with cmp
> > /mnt/cdrom/<filename> /mnt/reiserfs/1/data/<original_file_location>, using a
> > PLEXTOR Model: CD-ROM PX-40TS SCSI CD-ROM drive),
> >
> > - and THEN (while the copy or comparison runs) try any simple command (like
> > 'ls /mnt/reiserfs/1/data' or 'top' or anything else...).
> >
> > Response time is abysmal, a simple 'ls /some/dir' takes tens of seconds to
> > start. Once the command is running, performance is normal. Try this when a
> > cdrecord session is running and you'll get a buffer underrun.
>
> Can you try 2.4.13ac6 (not 7/8), and 2.2.20, and post a comparison?
>
> --
> Jeff Garzik | Only so many songs can be sung
> Building 1024 | with two lips, two lungs, and one tongue.
> MandrakeSoft | - nomeansno
Also sprach Jeff Garzik:
>Can you try 2.4.13ac6 (not 7/8), and 2.2.20, and post a comparison?
Well, as the system is partly reiserfs-based, 2.2.20 is a bit difficult. I've
run the -ac series before trying the 'linus' kernels, and noticed similar
performance problems (but not as severe). I don't have any hard numbers yet,
will try to get some if at all possible (it would be like comparing apples to
reiserfs-oranges).
[ given the nature of the problem - weird delays during file system activity -
those numbers will be more or less meaningless. This is not a matter of
'better' performance on a given hardware platform, but one of 'abysmal'
performance versus 'normal' performance. It doesn't really matter how long
the observed delays are, the problem lies in the fact that those delays are
there in the first place... ]
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ [email protected] /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
On Mon, Nov 12, 2001 at 03:05:56PM -0500, Jeff Garzik wrote:
> Can you try 2.4.13ac6 (not 7/8), and 2.2.20, and post a comparison?
Here's the results from some tests I did:

2.2.20
======
without filesystem activity
no slowdowns observed
time ls -al /usr/|sort -k 5 -n
real 0m0.121s
user 0m0.000s
sys 0m0.090s

with filesystem activity on ext2
no slowdowns observed
time ls -al /opt/|sort -k 5 -n
real 0m0.079s
user 0m0.010s
sys 0m0.100s

2.4.13-ac5
==========
no slowdowns observed
without filesystem activity
time ls -al /usr/|sort -k 5 -n
real 0m0.142s
user 0m0.000s
sys 0m0.000s

with filesystem activity on ext2
no slowdowns observed
time ls -al /opt/|sort -k 5 -n
real 0m0.022s
user 0m0.020s
sys 0m0.010s

with filesystem activity on reiserfs
 - it took 31 seconds to just open this small ( < 1 kb) text file (which
   resides in my home directory, on an ext2 filesystem) in vi...
time ls -al /usr/|sort -k 5 -n
real 0m6.136s
user 0m0.020s
sys 0m0.020s

2.4.15-pre4
===========
without filesystem activity
no slowdowns observed
time ls -al /usr/|sort -k 5 -n
real 0m0.081s
user 0m0.010s
sys 0m0.010s

with filesystem activity on ext2
no slowdowns observed
time ls -al /usr/|sort -k 5 -n
real 0m0.146s
user 0m0.000s
sys 0m0.020s

with filesystem activity on reiserfs
system behaviour erratic, some slowdowns
time ls -al /opt|sort -k5 -n
real 0m13.232s
user 0m0.020s
sys 0m0.010s

Seems that reiserfs is the common factor here, at least on my box. This is a 35
GB reiserfs filesystem, approx. 80% used, with both large and small files.
As said in my previous message, the numbers themselves don't mean squat. It is
the large delays (the fact that user+sys <<< real) which are the problem here.
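
To make the "user+sys <<< real" point concrete, the time spent waiting rather
than computing can be worked out from the figures above. A small sketch; the
helper name blocked_seconds is made up for illustration, and awk is used only
because the shell itself has no floating point arithmetic.

```shell
#!/bin/sh
# blocked_seconds REAL USER SYS
# Prints the wall-clock time (in seconds) that is not accounted for by
# CPU time, i.e. real - (user + sys).
blocked_seconds() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.3f\n", r - (u + s) }'
}

blocked_seconds 13.232 0.020 0.010   # 2.4.15-pre4, reiserfs load -> 13.202
blocked_seconds 6.136 0.020 0.020    # 2.4.13-ac5, reiserfs load  -> 6.096
blocked_seconds 0.146 0.000 0.020    # 2.4.15-pre4, ext2 load     -> 0.126
```

On the reiserfs runs practically all of the wall-clock time is spent waiting,
while the CPU time stays at a few hundredths of a second in every case.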
Any other magic anyone wants me to perform? Hans, you reading this?
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ [email protected] /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
On Mon, Nov 12, 2001 at 11:56:42PM +0100, Frank de Lange wrote:
[snip]
> Seems that reiserfs is the common factor here, at least on my box. This is a 35
> GB reiserfs filesystem, app 80% used, both large and small files.
>
> As said in my previous message, the numbers themselves don't mean squat. It is
> the large delays (the fact that user+sys <<< real) which are the problem here.
>
> Any other magic anyone wants me to perform? Hans, you reading this?
>
Do you see/hear a lot of seeking happening during the delays?
If so, your Reiser partition is probably fragmented to hell...
IIRC this problem is being looked at; check the archives of lkml or the
reiserfs list...
Mike
On Mon, 12 Nov 2001, Mike Fedyk wrote:
> On Mon, Nov 12, 2001 at 11:56:42PM +0100, Frank de Lange wrote:
> [snip]
> > Seems that reiserfs is the common factor here, at least on my box. This is a 35
> > GB reiserfs filesystem, app 80% used, both large and small files.
> >
> > As said in my previous message, the numbers themselves don't mean squat. It is
> > the large delays (the fact that user+sys <<< real) which are the problem here.
> >
> > Any other magic anyone wants me to perform? Hans, you reading this?
> >
>
> Do you see/hear a lot of seeking happing during the delays?
Yup, this is probably what's happening to me. I didn't think a hard drive
could do so many seeks so fast :)
> If so, your Reiser partition is probably fragmented to hell...
:(
> IIRC this problem is being looked at, check some archives of lkml or reiser...
Ok, thanks
/Martin
Never argue with an idiot. They drag you down to their level, then beat you with experience.
In mlist.linux-kernel, you wrote:
> Seems that reiserfs is the common factor here, at least on my box. This is a 35
> GB reiserfs filesystem, app 80% used, both large and small files.
>
> As said in my previous message, the numbers themselves don't mean squat. It is
> the large delays (the fact that user+sys <<< real) which are the problem here.
As another data point, I'm seeing the exact same thing. I haven't tried
any non-Linus kernels, though. But recent 2.4.x (x >= 10?) linus kernels
with reiserfs have these several-second delays during moderate-to-heavy
disk i/o, exactly as you've described. I've seen this on both an SMP
PIII system and a UP Athlon.
--
Jason Lunz Trellis Network Security
[email protected] http://www.trellisinc.com/
On Tue, Nov 13, 2001 at 12:27:00AM +0100, Martin Josefsson wrote:
> On Mon, 12 Nov 2001, Mike Fedyk wrote:
>
> > On Mon, Nov 12, 2001 at 11:56:42PM +0100, Frank de Lange wrote:
> > [snip]
> > > Seems that reiserfs is the common factor here, at least on my box. This is a 35
> > > GB reiserfs filesystem, app 80% used, both large and small files.
> > >
> > > As said in my previous message, the numbers themselves don't mean squat. It is
> > > the large delays (the fact that user+sys <<< real) which are the problem here.
> > >
> > > Any other magic anyone wants me to perform? Hans, you reading this?
> > >
> >
> > Do you see/hear a lot of seeking happing during the delays?
>
> Yup this is probably what's happening to me. I didn't think a harddrive
> could do so many seeks so fast :)
>
Check out the thread "reiserfs performance loss" from Oct 12 and 13...
Mike
>
> Seems that reiserfs is the common factor here, at least on my box. This is a 35
> GB reiserfs filesystem, app 80% used, both large and small files.
>
> As said in my previous message, the numbers themselves don't mean squat. It is
> the large delays (the fact that user+sys <<< real) which are the problem here.
This was also reported as
"Suspected bug - System slowdown under unexplained excessive disk I/O - 2.4.13",
with huge delays during compiles (Sasha Pachev) or MySQL benchmarks (me).
By now, though, I don't think this is reiserfs-specific; it also seems to
happen with ext3.
But, as you wrote, not with ext2. I see that there is more disk activity
due to journaling in both cases, but waiting 30 seconds for simple tasks,
or for the screen to wake up from APM, does not seem right.
Oktay Akbal
Frank de Lange wrote:
>On Mon, Nov 12, 2001 at 03:05:56PM -0500, Jeff Garzik wrote:
>
>>Can you try 2.4.13ac6 (not 7/8), and 2.2.20, and post a comparison?
>>
>
>Here's the results from some tests I did:
>
>[snip]
>
>Seems that reiserfs is the common factor here, at least on my box. This is a 35
>GB reiserfs filesystem, app 80% used, both large and small files.
>
>As said in my previous message, the numbers themselves don't mean squat. It is
>the large delays (the fact that user+sys <<< real) which are the problem here.
>
>Any other magic anyone wants me to perform? Hans, you reading this?
>
>Cheers//Frank
>
Yura, see if you can reproduce this and analyze the cause. If I
understand correctly, he is saying the problem is not throughput but
latency. Is that correct Frank? Once Yura reproduces it, I will
speculate as to the cause.
Hans
On Wed, Nov 21, 2001 at 11:51:21AM +0300, Hans Reiser wrote:
> Yura, see if you can reproduce this and analyze the cause. If I
> understand correctly, he is saying the problem is not throughput but
> latency. Is that correct Frank? Once Yura reproduces it, I will
> speculate as to the cause.
Correct.
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ [email protected] /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
Hans Reiser wrote:
> Frank de Lange wrote:
>
> >On Mon, Nov 12, 2001 at 03:05:56PM -0500, Jeff Garzik wrote:
> >
> >>Can you try 2.4.13ac6 (not 7/8), and 2.2.20, and post a comparison?
> >>
> >
> >Here's the results from some tests I did:
> >
> >[snip]
> >
> >Seems that reiserfs is the common factor here, at least on my box. This is a 35
> >GB reiserfs filesystem, app 80% used, both large and small files.
> >
> >As said in my previous message, the numbers themselves don't mean squat. It is
> >the large delays (the fact that user+sys <<< real) which are the problem here.
> >
> >Any other magic anyone wants me to perform? Hans, you reading this?
> >
> >Cheers//Frank
> >
> Yura, see if you can reproduce this and analyze the cause. If I
> understand correctly, he is saying the problem is not throughput but
> latency. Is that correct Frank? Once Yura reproduces it, I will
> speculate as to the cause.
>
> Hans
Hello,
Yes, the latency problem exists. I used the "dd" and "cp" commands
to create and copy a 1 GB file as the "filesystem activity".
In both cases the command
"time ls -al /opt|sort -k5 -n" shows the same delay.
One way to improve the situation is to use the patch below,
suggested by Chris Mason:
--- linux/fs/buffer.c Fri, 16 Nov 2001 10:58:28 -0500
+++ linux/fs/buffer.c Sun, 18 Nov 2001 12:44:40 -0500
@@ -1020,9 +1020,10 @@
 		struct buffer_head * bh;
 
 		bh = get_hash_table(dev, block, size);
-		if (bh)
+		if (bh) {
+			touch_buffer(bh) ;
 			return bh;
-
+		}
 		if (!grow_buffers(dev, block, size))
 			free_more_memory();
 	}
Thanks,
Yura.