2002-02-13 01:24:45

by Mukund Ingle

Subject: Quick question on Software RAID support.


1) Does the Software RAID-5 support automatic detection
of a drive failure? How?

2) Has Linux Software RAID-5 been used in the Enterprise environment
to support redundancy by any real-world networking company
or is this just a tool used by individuals to provide redundancy on
their own PCs in the labs and at home?

Thanks a lot!
Mukund


2002-02-13 01:32:17

by Alan

Subject: Re: Quick question on Software RAID support.

> 1) Does the Software RAID-5 support automatic detection
> of a drive failure? How?

It sees the commands failing on the underlying controller. Set up a software
raid 5 and just yank a drive out of a bay if you want to test it
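
A rough way to exercise the same failure/rebuild path without pulling
hardware - assuming a raidtools-0.90 install that ships raidsetfaulty, and
with made-up device names:

    # watch the array state while testing
    cat /proc/mdstat
    # mark one member failed, then drop it from the array
    raidsetfaulty /dev/md0 /dev/sdc1
    raidhotremove /dev/md0 /dev/sdc1
    # re-add the (replaced) disk and let the rebuild run
    raidhotadd /dev/md0 /dev/sdc1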

> 2) Has Linux Software RAID-5 been used in the Enterprise environment
> to support redundancy by any real-world networking company
> or is this just a tool used by individuals to provide redundancy on
> their own PCs in the labs and at home?

Dunno about that. I just hack code 8)

2002-02-13 01:51:49

by Chris Chabot

Subject: Re: Quick question on Software RAID support.

Alan Cox wrote:
>>1) Does the Software RAID-5 support automatic detection
>> of a drive failure? How?
>>
>
> It sees the commands failing on the underlying controller. Set up a software
> raid 5 and just yank a drive out of a bay if you want to test it

This is also why software raid 5 + IDE is a bad combo. It has a high
chance of locking up the IDE controller and requiring you to power down
& fix the system before reconstruction can commence. However, with SCSI
hot-swappable solutions, on-the-fly reconstruction after failure works
perfectly.


>>2) Has Linux Software RAID-5 been used in the Enterprise environment
>> to support redundancy by any real-world networking company
>> or is this just a tool used by individuals to provide redundancy on
>> their own PCs in the labs and at home?
>>
>
> Dunno about that. I just hack code 8)

I am using software raid 5 on several Dell PowerEdge 2550 servers
(since the hardware raid was too slow for some heavy IO operations), with
great results. We have had 5 separate disk failures so far, and no
problems whatsoever. Either the spare disk kicked right in, or, after
adding the new drive, reconstruction worked perfectly.


I don't know if 20 PE2550 servers qualifies as an 'enterprise' solution,
but it works great for the kinds of things we are doing.

--Chris

2002-02-13 02:19:34

by Rob Landley

Subject: Re: Quick question on Software RAID support.

On Tuesday 12 February 2002 08:45 pm, Alan Cox wrote:
> > 1) Does the Software RAID-5 support automatic detection
> > of a drive failure? How?
>
> It sees the commands failing on the underlying controller. Set up a
> software raid 5 and just yank a drive out of a bay if you want to test it
>
> > 2) Has Linux Software RAID-5 been used in the Enterprise environment
> > to support redundancy by any real-world networking company
> > or is this just a tool used by individuals to provide redundancy on
> > their own PCs in the labs and at home?
>
> Dunno about that. I just hack code 8)

I've seen a 20-way Linux software raid used to capture uncompressed HDTV
video in realtime, as part of an HDTV video editing system for which I
believe the client was billed six figures.

That was SCSI. (Well, dual QLogic Fibre Channel controllers that pretended
to be SCSI.) I've also encountered a couple of companies selling 14-drive
enclosures (IDE, they rackmount in a 3U or 4U) that are turned into big
software raid systems for data hosting.

And of course, you might want to talk to IBM about their global file system
stuff, and their implementation of the logical volume management stuff last
year (which was not the one that Linus eventually went with, I believe...)

Does this count?

(I kind of doubt IBM, HP, or Sun are interested in tools for individual
end-users...)

Rob

2002-02-13 04:12:19

by Bill Davidsen

Subject: Re: Quick question on Software RAID support.

On Wed, 13 Feb 2002, Chris Chabot wrote:

> Alan Cox wrote:
> >>1) Does the Software RAID-5 support automatic detection
> >> of a drive failure? How?
> >>
> >
> > It sees the commands failing on the underlying controller. Set up a software
> > raid 5 and just yank a drive out of a bay if you want to test it
>
> This is also why software raid 5 + IDE is a bad combo. It has a high
> chance of locking up the IDE controller, and requiring you to power down
> & fix the system before reconstruction can commence. However with SCSI
> hot-swapable solutions, on-the-fly reconstruction after failure works
> perfectly.

From personal experience software RAID is quite fast, and very reliable
regarding failures while running. If a disk fails the system drops back to
recovery, and after a new drive is added and `raidhotadd' is run it is
rebuilt.

The dark side of the force is that if a drive fails on boot, I have had
problems getting the system to boot (even when it's not the boot drive). The
system doesn't always recognize that there is a failed drive, and I've had
to build a new raid config with "failed disk" entries to get the system
up. Later versions may be better at that (comments, please); I have not had
to address this in over a year, since most of my systems are not taken down
unless they fall down.
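
For reference, that workaround looks roughly like this in /etc/raidtab - a
sketch in raidtools-0.90 syntax with made-up device names, marking the dead
member with a failed-disk entry so the array can be started degraded:

    raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        persistent-superblock   1
        chunk-size              64
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1
        device                  /dev/sdc1
        failed-disk             2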

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-02-13 09:26:53

by Francois Romieu

Subject: Re: Quick question on Software RAID support.

Greetings,

Bill Davidsen <[email protected]> :
[...]
> From personal experience software RAID is quite fast, and very reliable
> regarding failures while running. If a disk fails the system drops back to
> recovery, and after a new drive is added and `raidhotadd' is run it is
> rebuilt.
>
> The dark side of the force is that if a drive fails on boot, I have had

(raid1)
- planned reboot;
- spontaneous fsck;
- rarely accessed part of a disk isn't happy;
- is it normal for a SCSI error to take more than 10 minutes?
- LRB;
- removal of faulty drive;
- reboot;
- spontaneous fsck;
-> now there's a nice fs with 3-month-old content.

Interesting experience for an otherwise usual Sunday.

Btw, this log entry is a bit terse:

http://www.kernel.org/pub/linux/kernel/v2.4/testing/patch-2.4.18.log
[...]
- Fix rare data loss case with RAID-1

--
Ueimor

2002-02-13 10:34:13

by Marco Colombo

Subject: Re: Quick question on Software RAID support.

On Wed, 13 Feb 2002, Alan Cox wrote:

> > 1) Does the Software RAID-5 support automatic detection
> > of a drive failure? How?
>
> It sees the commands failing on the underlying controller. Set up a software
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Is it supposed to detect a failed disk and *stop* using it?

I had a raid1 IDE system, and it was continuously raising hard errors on
hdc (the disk was dead, not just some bad blocks): the net result was that
it was unusable - too slow, too busy on IDE errors (a lot of them - even
syslog wasn't happy).

Ok, all it took was replacing the disk, partitioning it and raidhotadd'ing
the devices. Yet it needed manual intervention. I wish it performed a
raidhotremove automagically so as to run with decent performance...
even if in "degraded mode". It was RH 2.2.19, so things may have changed
in the meantime.
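
Failing that, even a crude cron'd watchdog could at least flag the situation
(a sketch, not part of the raid tools; on 2.2/2.4 kernels the bracketed
status in /proc/mdstat shows a failed member as "_"):

    #!/bin/sh
    # mail root whenever any md array reports a failed member ([U_] etc.)
    if egrep -q '\[[U_]*_[U_]*\]' /proc/mdstat; then
        mail -s "md array degraded on `hostname`" root < /proc/mdstat
    fi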

BTW, given a 2-disk IDE raid1 setup (hda / hdc), does it pay to put a
third disk in (say hdb) and configure it as "spare disk"? I've got
concerns about the slave not actually being able to operate if the
master (hda) fails badly.

TIA,
.TM.

2002-02-13 11:03:03

by Alan

Subject: Re: Quick question on Software RAID support.

> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Is it supposed to detect a failed disk and *stop* using it?

Yes, it will stop using it and if appropriate try and do a rebuild

> I had a raid1 IDE system, and it was continuously raising hard errors on
> hdc (the disk was dead, not just some bad blocks): the net result was that
> it was unusable - too slow, too busy on IDE errors (a lot of them - even
> syslog wasn't happy).

Don't try and do "hot pluggable" IDE raid; it really doesn't work out. With
SCSI the impact of a sulking drive is minimal unless you get unlucky
(I have here a failed SCSI SCA drive that hangs the entire bus merely by
being present - I use it to terrify HA people 8))

> BTW, given a 2-disk IDE raid1 setup (hda / hdc), does it pay to put a
> third disk in (say hdb) and configure it as "spare disk"? I've got
> concerns about the slave not actually being able to operate if the
> master (hda) fails badly.

Well-placed concerns. I don't know what Andre thinks, but IMHO spend the
extra $20 to put an extra HighPoint controller in the machine for the third
IDE bus.
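
With the third bus in, the raidtab entry for the spare might look like this
(a sketch in raidtools-0.90 syntax; hde assumed to be the first channel of
the add-in card):

    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          1
        persistent-superblock   1
        chunk-size              4
        device                  /dev/hda1
        raid-disk               0
        device                  /dev/hdc1
        raid-disk               1
        device                  /dev/hde1
        spare-disk              0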

Alan

2002-02-13 14:26:14

by Marco Colombo

Subject: Re: Quick question on Software RAID support.

On Wed, 13 Feb 2002, Alan Cox wrote:

> > Is it supposed to detect a failed disk and *stop* using it?
>
> Yes, it will stop using it and if appropriate try and do a rebuild

So I guess something went wrong in my case.
The disk died badly. You could hear (from outside the rack the PC was
mounted in) a repeated sound of "Ziing -> TOC!". There's no way now to make
a BIOS (or a DOS test tool) detect it.

> > I had a raid1 IDE system, and it was continuously raising hard errors on
> > hdc (the disk was dead, not just some bad blocks): the net result was that
> > it was unusable - too slow, too busy on IDE errors (a lot of them - even
> > syslog wasn't happy).
>
> Don't try and do "hot pluggable" IDE raid; it really doesn't work out. With
> SCSI the impact of a sulking drive is minimal unless you get unlucky

Does the above apply to ATA HW RAID controllers, too? I mean, is it
something strictly related to the electrical specs of the interface, or
is it possible to find workarounds? (Whether a vendor applies them with
success is another story - I wonder if it's possible in theory.)

Anyway, the problem is not replacing the disk, it is having the system
stop using it - automatically, without human action. If you say it is so,
then I must just have been unlucky.

> (I have here a failed SCSI SCA drive that hangs the entire bus merely by
> being present - I use it to terrify HA people 8))

the topic here is data safety, what do HA people know about it? B-)

Again, do HW RAID ATA controllers have any hope of handling a failure better
than the average IDE controllers you find integrated into a typical MB?

Right now, to implement a 2+ disk RAID (sw) with IDE/ATA, I'd put
one disk per channel, on some multi-channel controllers (i.e. some of the
HPTs you've mentioned below). I'm just curious if HW RAID support brings
something new into the game... Here I'm considering resilience, not
performance: I know (by experience) that putting 2 disks on the same
channel gains little performance-wise. I'm tempted to buy an HPT RocketRAID
133 (just to name one): it supports (on paper / web) "disk mirroring,
hot-spare options for automatic array-rebuilds, hot-swap support for
swapping failed disks on the fly [...], and disk failure notification".
I still think SW RAID is better, since I don't really like relying
on a black box (read: some unknown firmware).

> > BTW, given a 2 disks IDE raid1 setup (hda / hdc), does it pay to put a
> > third disk in (say hdb) and configure it as "spare disk"? I've got
> > concerns about the slave not actually beeing able to operate if the
> > master (hda) fails badly.
>
> Well placed concerns. I don't know what Andre thinks but IMHO spend the
> extra $20 to put an extra highpoint controller in the machine for the third
> IDE bus.
>
> Alan
>

TIA,
.TM.

2002-02-13 18:30:24

by Mark Cooke

Subject: Re: Quick question on Software RAID support.

Hi Alan,

Just a note that I have almost exactly the setup you outlined on a
KT7A-RAID, HPT370 onboard.

I have a single disk on each highpoint chain, and a 3rd (parity) on
one of the onboard 686B channels.

I have been seeing odd corruptions since I set up the system as RAID-5,
though. Have you seen any reports of 686B IDE corruption recently (or
RAID-5 for that matter)?

kernel 2.4.18pre6... just compiling pre9-ac3...
Athlon MP 1500+, mem=nopentium apm=off, NvAGP=0 in X-setup.

Mark

On Wed, 13 Feb 2002, Alan Cox wrote:

> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > Is it supposed to detect a failed disk and *stop* using it?
>
> Yes, it will stop using it and if appropriate try and do a rebuild
>
> > I had a raid1 IDE system, and it was continuously raising hard errors on
> > hdc (the disk was dead, not just some bad blocks): the net result was that
> > it was unusable - too slow, too busy on IDE errors (a lot of them - even
> > syslog wasn't happy).
>
> Don't try and do "hot pluggable" IDE raid; it really doesn't work out. With
> SCSI the impact of a sulking drive is minimal unless you get unlucky
> (I have here a failed SCSI SCA drive that hangs the entire bus merely by
> being present - I use it to terrify HA people 8))
>
> > BTW, given a 2-disk IDE raid1 setup (hda / hdc), does it pay to put a
> > third disk in (say hdb) and configure it as "spare disk"? I've got
> > concerns about the slave not actually being able to operate if the
> > master (hda) fails badly.
>
> Well-placed concerns. I don't know what Andre thinks, but IMHO spend the
> extra $20 to put an extra HighPoint controller in the machine for the third
> IDE bus.
>
> Alan

--
+-------------------------------------------------------------------------+
Mark Cooke The views expressed above are mine and are not
Systems Programmer necessarily representative of university policy
University Of Birmingham URL: http://www.sr.bham.ac.uk/~mpc/
+-------------------------------------------------------------------------+

2002-02-13 19:03:15

by Thomas Schenk

Subject: Re: Quick question on Software RAID support.

On Tue, 2002-02-12 at 19:34, Mukund Ingle wrote:
>
> 1) Does the Software RAID-5 support automatic detection
> of a drive failure? How?
>
> 2) Has Linux Software RAID-5 been used in the Enterprise environment
> to support redundancy by any real-world networking company
> or is this just a tool used by individuals to provide redundancy on
> their own PCs in the labs and at home?

I don't know if this qualifies as "in the Enterprise environment to
support redundancy by any real-world networking company", but when I
worked at Deja.com (aka Dejanews), we used software RAID on production
servers (database hosts mostly) and it worked fine. The only problems
we ever had with it were due to human error (running fsck on individual
drives in the arrays).

Tom S.

--

+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| Tom Schenk         | A positive attitude may not solve all your      |
| Online Ops, EA.COM | problems, but it will annoy enough people to    |
| [email protected]   | make it worth the effort. -- Herm Albright      |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

2002-02-13 21:34:23

by Ville Herva

Subject: Re: Quick question on Software RAID support.

On Wed, Feb 13, 2002 at 06:30:01PM +0000, you [Mark Cooke] wrote:
> Hi Alan,
>
> Just a note that I have almost exactly the setup you outlined on a
> KT7A-RAID, HPT370 onboard.
>
> I have a single disk on each highpoint chain, and a 3rd (parity) on
> one of the onboard 686B channels.
>
> I have been seeing odd corruptions since I set up the system as RAID-5,
> though. Have you seen any reports of 686B IDE corruption recently (or
> RAID-5 for that matter)?
>
> kernel 2.4.18pre6... just compiling pre9-ac3...
> Athlon MP 1500+, mem=nopentium apm=off, NvAGP=0 in X-setup.

After months of testing, we found that the KT7-RAID (we tested the KT7A-RAID as
well) is basically impossible to get working reliably. It *always* corrupted
data from the HPT370, no matter what we tried. It seemed to be a VIA PCI problem,
as things like the PCI slot of the NIC, network load, NIC model etc. greatly
affected the corruption rate. (The VIA 686B IDE never corrupted data, but then
again it's integrated in the south bridge and perhaps avoids the full PCI path.)
Our combination was software RAID0 (one disk each on ide2 and ide3 (the HPT370
channels)).

We ditched the board for good, took an Abit ST6-RAID (i815+HPT370) and have
had no problems since.

My position is that for heavy PCI load (additional IDE adapters etc.), stay
away from VIA.

BTW: I have a little program to stress the raid volume (or any disk device
for that matter) that I used to trigger the corruption. It is destructive
to the data, though. I can mail it to you, if you like.


-- v --

[email protected]

2002-02-13 21:56:42

by Vojtech Pavlik

Subject: Re: Quick question on Software RAID support.

On Wed, Feb 13, 2002 at 11:33:41PM +0200, Ville Herva wrote:
> On Wed, Feb 13, 2002 at 06:30:01PM +0000, you [Mark Cooke] wrote:
> > Hi Alan,
> >
> > Just a note that I have almost exactly the setup you outlined on a
> > KT7A-RAID, HPT370 onboard.
> >
> > I have a single disk on each highpoint chain, and a 3rd (parity) on
> > one of the onboard 686B channels.
> >
> > I have been seeing odd corruptions since I set up the system as RAID-5,
> > though. Have you seen any reports of 686B IDE corruption recently (or
> > RAID-5 for that matter)?
> >
> > kernel 2.4.18pre6... just compiling pre9-ac3...
> > Athlon MP 1500+, mem=nopentium apm=off, NvAGP=0 in X-setup.
>
> After months of testing, we found that the KT7-RAID (we tested the KT7A-RAID as
> well) is basically impossible to get working reliably. It *always* corrupted
> data from the HPT370, no matter what we tried. It seemed to be a VIA PCI problem,
> as things like the PCI slot of the NIC, network load, NIC model etc. greatly
> affected the corruption rate. (The VIA 686B IDE never corrupted data, but then
> again it's integrated in the south bridge and perhaps avoids the full PCI path.)
> Our combination was software RAID0 (one disk each on ide2 and ide3 (the HPT370
> channels)).
>
> We ditched the board for good, took an Abit ST6-RAID (i815+HPT370) and have
> had no problems since.
>
> My position is that for heavy PCI load (additional IDE adapters etc.), stay
> away from VIA.
>
> BTW: I have a little program to stress the raid volume (or any disk device
> for that matter) that I used to trigger the corruption. It is destructive
> to the data, though. I can mail it to you, if you like.

I'd like to try that, too, so if you can send me the program ...

--
Vojtech Pavlik
SuSE Labs

2002-02-13 22:13:11

by Ville Herva

Subject: Re: Quick question on Software RAID support.

> I'd like to try that, too, so if you can send me the program ...

I run it on /dev/md0 (which consists of one hd on each HPT370 channel).
You can also run it on /dev/hd{e,g} in parallel - the effects are pretty
much the same. To make it trigger more easily, try "ping -f -s 64000" in the
background and stress the SCSI system if you have one. I think any PCI load
affects it, but I found 3c905B network load by far the easiest way to
trigger the bug (I even got oopses if the 3c905B was in a certain slot while
doing that).
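
In other words, something along these lines (paths and numbers illustrative
- the size/block/iteration args are whatever suits your setup):

    # destructive to whatever lives on /dev/md0!
    ./wrchk /dev/md0 2000 64 0 &       # hammer the raid volume, loop forever
    ping -f -s 64000 otherbox &        # flood the NIC to load the PCI bus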

Oh, and please excuse the state of the code - it was meant as a quick hack
only...

--
Ville Herva [email protected] +358-50-5164500
Viasys Oy Hannuntie 6 FIN-02360 Espoo +358-9-2313-2160
PGP key available: http://www.iki.fi/v/pgp.html fax +358-9-2313-2250


Attachments:
wrchk.c (4.11 kB)

2002-02-14 20:50:43

by Pavel Machek

Subject: Re: Quick question on Software RAID support.

Hi!

> > I had a raid1 IDE system, and it was continuously raising hard errors on
> > hdc (the disk was dead, not just some bad blocks): the net result was that
> > it was unusable - too slow, too busy on IDE errors (a lot of them - even
> > syslog wasn't happy).
>
> Don't try and do "hot pluggable" IDE raid; it really doesn't work out. With
> SCSI the impact of a sulking drive is minimal unless you get unlucky
> (I have here a failed SCSI SCA drive that hangs the entire bus merely by
> being present - I use it to terrify HA people 8))

I could imagine a scenario where a disk would set itself on fire...

...which was the reason why the disks in the sun4/330 were separated by
steel, so a fire in the disks would not damage the mainboard ;-).
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

2002-02-15 23:49:26

by Mark Cooke

Subject: Re: Quick question on Software RAID support.

On Thu, 14 Feb 2002, Ville Herva wrote:

> I run it on /dev/md0 (which consists of one hd on each HPT370 channel).
> You can also run it on /dev/hd{e,g} in parallel - the effects are pretty
> much the same. To make it trigger more easily, try "ping -f -s 64000" in the
> background and stress the SCSI system if you have one. I think any PCI load
> affects it, but I found 3c905B network load by far the easiest way to
> trigger the bug (I even got oopses if the 3c905B was in a certain slot while
> doing that).

Hi Ville,

I've just been trying this here, with the following setup, and it's
(so far) been reliable... Just doing a 3rd pass.

hdc: seagate 80G, 1Gb partition (r5 parity)
hde: seagate 40G, 1Gb partition (r5 data)
hdg: seagate 40G, 1Gb partition (r5 data)

AGP currently disabled (NvAgp=0 in the Xserver config).

Running: ./w /dev/md2 2000 8 50
mplayer divx playback
gears (for accel gl stressing)
ping -f -s 64000
xawtv running for more traffic
xmms playing back mp3s

System's running pretty decently still (it's on pass 5 of the
partition blasting). Note, however, that I currently have all the disk
interfaces reset to only udma 3 as part of the startup scripts. I'll
pull out the exact pci-tweaking bios settings when I next restart.
As and when I get confidence in the system (and a bigger case fan) at
the current settings, I'll push up the transfer rates - though with
just a single disk on each chain, there's not that much to be gained
by it (though udma 3 is supposedly just shy of the maximum xfer rate
the Barracuda IVs can produce).

At least a large portion of my trouble appears to have gone since I
stopped using md2 (raid5) for a swap partition and just set up 3
independent swap areas instead. While doing this stress testing, I
currently have no swapfile set up at all. Kernel's 2.4.18pre9-ac4
now, and the VIA tweaking in there might be a factor too.
Hardware in machine/irq setup:

# cat /proc/interrupts
           CPU0
  0:    5995487    XT-PIC  timer
  1:     100561    XT-PIC  keyboard
  2:          0    XT-PIC  cascade
  8:    8509758    XT-PIC  rtc
  9:    1475133    XT-PIC  usb-uhci, usb-uhci, eth0, eth1
 10:    5322285    XT-PIC  bttv, nvidia
 11:    1117995    XT-PIC  ide2, ide3
 12:     793060    XT-PIC  Trident Audio
 14:       1407    XT-PIC  ide0
 15:     577645    XT-PIC  ide1

00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 1a)
00:07.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 1a)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
00:0b.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:0f.0 Multimedia audio controller: Trident Microsystems 4DWave DX (rev 02)
00:11.0 Multimedia video controller: Brooktree Corporation Bt848 TV with DMA push (rev 12)
00:13.0 Unknown mass storage controller: HighPoint Technologies, Inc. HPT366/370 UltraDMA 66/100 IDE Controller (rev 04)
01:00.0 VGA compatible controller: nVidia Corporation NV11 DDR (rev b2)


Cheers,

Mark

--
+-------------------------------------------------------------------------+
Mark Cooke The views expressed above are mine and are not
Systems Programmer necessarily representative of university policy
University Of Birmingham URL: http://www.sr.bham.ac.uk/~mpc/
+-------------------------------------------------------------------------+

2002-02-16 17:02:27

by Ville Herva

Subject: VIA KT133 (was: Re: Quick question on Software RAID support.)

On Fri, Feb 15, 2002 at 11:48:41PM +0000, you [Mark Cooke] wrote:
>
> Hi Ville,
>
> I've just been trying this here, with the following setup, and it's
> (so far) been reliable.... Just doing a 3rd pass..
>
> hdc: seagate 80G, 1Gb partition (r5 parity)
> hde: seagate 40G, 1Gb partition (r5 data)
> hdg: seagate 40G, 1Gb partition (r5 data)

Hm, doesn't this mean that you can't read hde+hdg faster than hdc gives
parity data? That'd mean hde+hdg are not maxing out the HPT and PCI channel...

You could perhaps run 2 separate wrchk's on hde and hdg (and one on hdc if
you please) - you can use it on a file as well as on a device.

> AGP currently disabled (NvAgp=0 in the Xserver config).
>
> Running: ./w /dev/md2 2000 8 50

I have usually used ~64MB blocks (*), but I don't think it matters.

(*) For those too lazy to read the source, the wrchk args are [1] device
or file [2] test file size [3] read/write block size [4] number of
iterations (0 for an infinite test) ;).

> ping -f -s 64000

Is this the RTL or the 905B? We had better success with the RTL8139 (but
corruption still happened), whereas the 3c905B would trigger corruption
almost instantly if it was attached to PCI slot 4. In slot 3 it behaved a
lot better, but the corruption eventually happened.

> xawtv running for more traffic
> mplayer divx playback
> gears (for accel gl stressing)

Hmm, we never ran X at all.

> xmms playing back mp3s
>
> System's running pretty decently still (it's on pass 5 of the
> partition blasting). Note, however, that I currently have all the disk
> interfaces reset to only udma 3 as part of the startup scripts. I'll
> pull out the exact pci-tweaking bios settings when I next restart.

Yep, I think the udma mode makes a difference. We did try UDMA33, though;
it didn't solve the problem for us.

> (though udma 3 is supposedly just shy of the maximum xfer rate the
> Barracuda IVs can produce)

Better verify that with hdparm -tT...

> While doing this stress testing, I
> currently have no swapfile set up at all.

Neither did we. We usually launched the kernel from a boot floppy and had
the rootfs on CD. That way it wasn't possible to destroy anything while
testing...

> Kernel's 2.4.18pre9-ac4
> now, and the VIA tweaking in there might be a factor too.

We tried 2.2.20+ide, 2.2.21pre2+ide, 2.4.15, 2.4.18pre-something+ide etc. It
didn't make a difference.


-- v --

[email protected]