hi all
After testing this for a while, I'm quite sure there's some kind of bug
that locks up I/O under heavy traffic.
Hardware configuration:
1 x Athlon 1133 MHz
1 GB RAM
1 x 20 GB boot disk
2 x 120 GB IDE drives on a Promise ATA133 (20269) controller
Kernel: vanilla 2.4.16 + tux-D0
/etc/raidtab:
raiddev /dev/md0
raid-level 0
nr-raid-disks 2
persistent-superblock 0
chunk-size 4096
device /dev/hde
raid-disk 0
device /dev/hdg
raid-disk 1
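For reference, here is a minimal sketch of how md RAID-0 spreads /dev/md0 across the two drives with this raidtab. It assumes both drives are the same size, so there is a single striping zone, and that chunk-size is given in KiB as raidtab(5) specifies. With chunk-size 4096, each 4 MiB chunk alternates between /dev/hde and /dev/hdg, so one sequential reader stays on a single drive for 4 MiB at a time. The helper name disk_for_offset is hypothetical.

```shell
#!/bin/sh
# Sketch (assumptions: equal-size members, one RAID-0 zone, chunk-size in KiB).
# Maps a byte offset on /dev/md0 to the member disk that stores it.

CHUNK_KB=4096   # chunk-size from /etc/raidtab
NDISKS=2        # nr-raid-disks from /etc/raidtab

# Hypothetical helper: print the raid-disk index for byte offset $1
# (0 = /dev/hde, 1 = /dev/hdg in this raidtab).
disk_for_offset() {
    chunk_bytes=$((CHUNK_KB * 1024))
    # Chunk number = offset / chunk size; chunks are dealt round-robin.
    echo $(( ($1 / chunk_bytes) % NDISKS ))
}
```

For example, `disk_for_offset 5000000` prints 1, because byte 5000000 falls in the second 4 MiB chunk.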
IDE readahead setting:
echo file_readahead:1024 > /proc/ide/hd[eg]/settings
(I've tried down to 256 with no change.)
File system: makes no difference. I've tried both xfs and ext2 and get the same
result.
Testing:
I create about 100 files, each ~1 GB, and start ~100 wget processes to
retrieve them from http://localhost/file-nnnn. Each process retrieves
a separate file, to simulate the application. Usually this works fine at
first, but after a while everything locks up: the [TUX worker]
(mother) process stops sending any data and starts using 100% system
time. If I restart tux, I can retrieve data for a while, but
then it locks up again. It's easily reproducible: just start, say, 50
wget processes, killall wget, and then restart the 50 wget processes.
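The start/kill/restart cycle can be scripted roughly as follows. This is a minimal sketch, not the attached simplewebbench.sh: the four-digit file-0001, file-0002, ... naming, the 30-second settle time, the script name repro.sh, and the helper names are all assumptions. It kills only the wget processes it started rather than using killall wget.

```shell
#!/bin/sh
# Sketch of the reproduction cycle: start N readers, kill them, restart them.
# Assumptions (not from the report): files are named file-0001, file-0002, ...
# and the first batch runs for SETTLE seconds before being killed.

BASE_URL=${BASE_URL:-http://localhost}
N=${N:-50}
SETTLE=${SETTLE:-30}

# Hypothetical helper: print the URL for stream number $1.
url_for() {
    printf '%s/file-%04d\n' "$BASE_URL" "$1"
}

# Start N background wget readers, one file each, discarding the data.
# Their PIDs are collected in PIDS so only these readers are killed later.
start_streams() {
    PIDS=""
    i=1
    while [ "$i" -le "$N" ]; do
        wget -q -O /dev/null "$(url_for "$i")" &
        PIDS="$PIDS $!"
        i=$((i + 1))
    done
}

# Kill the readers started by start_streams and reap them.
stop_streams() {
    # PIDS is deliberately unquoted so it splits into separate arguments.
    kill $PIDS 2>/dev/null
    wait
}

# Only run the cycle when invoked as: sh repro.sh run
if [ "${1:-}" = "run" ]; then
    start_streams
    sleep "$SETTLE"
    stop_streams
    start_streams
    wait
fi
```

Run it as `sh repro.sh run` and watch `vmstat 1` in another terminal while the second batch of readers is active.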
Thanks for all help
regards
roy
--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA
Computers are like air conditioners.
They stop working when you open Windows.
> > After testing this for a while, I'm quite sure there's some kind of bug
> > that locks up I/O under heavy traffic.
>
> there's definitely no problem with heavy load on one stream,
> or with multistream load and default readahead settings.
> (I certainly have tested the former, and the latter is tested
> by all the dbench scores you see here.) I'm guessing you'd
> see no lockup if you removed the readahead. Though it's also worth
> asking: have you memtest86'd the cpu/ram? And can you cause the
> lock with single-threaded bonnie? Also, do you have highmem on?
I have highmem turned off.
Using the default readahead (124), I reproduced the problem by starting
the 50 streams, killing them and restarting them. I rebooted the server
before this test.
I haven't memtest86'd the hardware yet, but I will. I still believe this is
an OS problem, not a hardware one.
> there is no problem with single-stream IO load. period.
I'm aware of that.
> there is also no problem with multi-stream mixed (realistic) IO loads,
> since dbench is widely run. I'm guessing there's something specific
> to your all-read, many-stream load that no one else has ever tested,
> for obvious reasons.
Well. Someone had to do it. I hope this can be fixed. I really really want
to replace the M$ boxes we use for this today.
hi
Just wanted to say I've reproduced the error in tux-D1.
thanks for any help
roy
On Thu, 13 Dec 2001, Roy Sigurd Karlsbakk wrote:
> [snip]
> > > > Just wanted to say I've reproduced the error in tux-D1.
> > > how about without tux?
> > >
> >
> > I don't know how to do an intensive read like that without tux.
> > Perhaps you've got an idea?
>
> why not apache?
I tried that first, but it all somehow fucked up.
So I tried again, with better luck this time. Same result as with Tux, except
that the system is idle. All the apache processes go defunct, and the total
I/O reported by vmstat peaks at around 1000 blocks per second. Not a
lot, really...
So sorry, Ingo and you other tux guys, it wasn't your fault.
roy
A few more lines of info...
The following files are attached:
lspci-vvv (lspci -vvv)
simplewebbench.sh (my simple script)
.config (current kernel config)
I had modified quite a few /proc/sys parameters, but setting them back to
the defaults (by rebooting with an empty sysctl.conf) didn't make any
difference.
As said before, I've tried this config without RAID, and it works,
although I can't get much speed out of it (max 30 MB per second). With
RAID-0, I get up to some 50 or 60 MB per second, but then the I/O hangs
and nothing more happens. Reading processes go defunct, and it's all
silent...
thanks
roy