I need to deploy some very resilient servers with hot-swappable drives.
I have always used DAC960-based hardware RAID for hot swapping in the past, but
SATA drives are so cheap compared to SCSI that I'm considering the Tyan GT24
server with 4 hot-swappable SATA II drives (nForce4 Pro controller):
http://www.tyan.com/products/html/gt24b2891.html
Before I place an order, I need to know whether SATA II hot swapping is up to
scratch in the Linux kernel, and whether it works nicely with Linux software
RAID (which I already use and am familiar with).
Any knowledge gratefully accepted :)
Andrew Walrond
Andrew Walrond wrote:
> I need to deploy some very resilient servers with hot-swappable drives.
>
> I have always used DAC960-based hardware RAID for hot swapping in the past, but
> SATA drives are so cheap compared to SCSI that I'm considering the Tyan GT24
> server with 4 hot-swappable SATA II drives (nForce4 Pro controller):
>
> http://www.tyan.com/products/html/gt24b2891.html
>
> Before I place an order, I need to know whether SATA II hot swapping is up to
> scratch in the Linux kernel, and whether it works nicely with Linux software
> RAID (which I already use and am familiar with).
>
> Any knowledge gratefully accepted :)
>
> Andrew Walrond
Unfortunately, SATA hotplug support is not ready yet. Preliminary
work is in progress though, and it will happen. Of course, I have
absolutely no idea how distant that future is. :-)
One more thing to note is that nVidia cannot supply information
regarding the SATA part (and, I think, the network part too) of its chipset
to the open source community. So it is possible that not everything will go
smoothly with nForce4 hotplug support even after the other pieces eventually
come together.
If you're looking for stability/resilience for a production machine,
IMHO libata isn't quite ready yet.
libata maintainer Jeff Garzik maintains the following status page you
might be interested in.
http://linux.yyz.us/sata/
--
tejun
Andrew Walrond wrote:
> I need to deploy some very resilient servers with hot-swappable drives.
[snip]
> Before I place an order, I need to know whether SATA II hot swapping is up to
> scratch in the Linux kernel, and whether it works nicely with Linux software
> RAID (which I already use and am familiar with).
>
> Any knowledge gratefully accepted :)
IDE hotswap has never worked (OOTB at least) in Linux, and based on my
experience it never will. It seems the IDE folks don't care a bit
about it. (No offence meant. Just keeping it real.)
So if you really need this, here's the opportunity to make a whole lot
of people happy by implementing it yourself. You'll probably need a
lot of time on your hands - there's a very real chance that the IDE
maintainers are too busy or whatever to answer any newbie questions
you might have about how to attack the IDE layer.
Tejun Heo wrote:
> If you're looking for stability/resilience for a production machine,
> IMHO libata isn't quite ready yet.
I disagree...
I've used it for TBs of data without any problems.
OTOH, with the regular ATA stuff I've experienced loads of IRQ
problems, crashes and hangups.
On Saturday 08 October 2005 15:26, Molle Bestefich wrote:
>
> IDE hotswap has never worked (OOTB at least) in Linux, and based on my
> experience it never will. It seems the IDE folks don't care a bit
> about it. (No offence meant. Just keeping it real.)
Fair enough. What about SCSI? Do any of the in-kernel SCSI drivers support
hotswap? And if so, how well do they cooperate with Linux RAID?
>
> Tejun Heo wrote:
> > If you're looking for stability/resilience for a production machine,
> > IMHO libata isn't quite ready yet.
>
> I disagree...
> I've used it for TBs of data without any problems.
Likewise. I've been using SATA exclusively with Linux RAID for quite a while
now, with great success. But for the super-resilient, zero-downtime servers I
now need to deploy, I must be able to swap dead drives without taking the
server down. Hence my query.
Off-list respondents have recommended 3ware hardware RAID products, but
throughput concerns on another thread here have really put me off that idea.
So unless Linux SCSI provides a useful solution, I'll stick with what seems
the only reliable solution out there: hardware SCSI RAID (= small, expensive
drives).
The lack of hot swapping does seem to be a serious weakness in Linux, at least
for resilient server applications. It would really complete the Linux RAID
picture, and make it quite compelling.
But I'm in no position to do it myself; I can only hope this thread inspires
some capable person to plug the gap :)
Thanks to all who responded.
Andrew Walrond
On 10/8/05, Andrew Walrond <[email protected]> wrote:
> The lack of hot swapping does seem to be a serious weakness in Linux, at least
> for resilient server applications. It would really complete the Linux RAID
> picture, and make it quite compelling.
>
> But I'm in no position to do it myself; I can only hope this thread inspires
> some capable person to plug the gap :)
Hey Andrew,
I've actually been working on implementing the core set of routines
that will allow for hot-swapping SATA drives in Linux. The core is
not quite ready yet, but you can expect the next iteration within the
week. Once the core is integrated, someone will have to implement
capturing hotswap events on the nForce4 SATA controller, and using the
core functions. I don't know how long that will take, but if the
Linux SATA maintainer, Jeff Garzik (CCed on this email) knows how to
do it, then it might be just a few weeks' time.
That said, if you want to use this for servers you might still want to
wait a bit before committing your resources to this :)
Luke Kosewski
On Sat, 8 Oct 2005, Andrew Walrond wrote:
> On Saturday 08 October 2005 15:26, Molle Bestefich wrote:
> >
> > IDE hotswap has never worked (OOTB at least) in Linux, and based on my
> > experience it never will. It seems the IDE folks don't care a bit
> > about it. (No offence meant. Just keeping it real.)
>
> Fair enough. What about SCSI? Do any of the in-kernel SCSI drivers support
> hotswap? And if so, how well do they cooperate with Linux RAID?
I've successfully hot-swapped SCSI drives in a live server, so yes, I
guess it does!
You have to fail the drive (if it's not failed already!) then remove it
(mdadm --fail /dev/mdX /dev/sdYZ, then mdadm --remove /dev/mdX /dev/sdYZ),
then use the runes:
echo "scsi remove-single-device 0 1 2 3" > /proc/scsi/scsi
where 0 1 2 3 represent the SCSI host, channel, device id and lun (get
these out of /proc/scsi/scsi if unsure),
then (assuming your hardware supports it), you can power down that drive
and unplug it, put a new one in, then do the opposite rune:
echo "scsi add-single-device 0 1 2 3" > /proc/scsi/scsi
make sure the kernel sees it (look in /var/log/kern.log, or wherever your
distribution puts this stuff), then mdadm --add ...
Then you can partition (if required) and add it back into the array with
the usual mdadm --add /dev/mdX /dev/sdYZ
If your drive is partitioned and each partition is part of a separate RAID
set, then you will have to fail each partition and remove it in turn. The
scsi remove-single-device command will only be successful once all the
partitions are out of use. (Similarly, you'll have to partition the new
drive and mdadm --add each partition.)
Ideally you want hardware that will power the drive down nicely before you
take it out (and power it up nicely after you plug it back in again) to
avoid any glitches on the SCSI bus, etc...
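To pull the above together, here's a rough sketch of the whole sequence for
a drive with two partitions in two arrays. The device names /dev/sdc,
/dev/md0, /dev/md1 and the 0 0 2 0 host/channel/id/lun are only placeholders
for illustration; check /proc/scsi/scsi and your own array layout first:

  # fail and remove every member partition of the dying drive
  mdadm --fail /dev/md0 /dev/sdc1
  mdadm --remove /dev/md0 /dev/sdc1
  mdadm --fail /dev/md1 /dev/sdc2
  mdadm --remove /dev/md1 /dev/sdc2

  # detach the device from the SCSI layer
  echo "scsi remove-single-device 0 0 2 0" > /proc/scsi/scsi

  # ... physically swap the drive ...

  # ask the kernel to probe the replacement at the same address
  echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi

  # clone the partition layout from a surviving disk (here sda, assuming
  # it matches), then re-add each partition to its array
  sfdisk -d /dev/sda | sfdisk /dev/sdc
  mdadm --add /dev/md0 /dev/sdc1
  mdadm --add /dev/md1 /dev/sdc2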
I've had to do this in a Dell and a home-made box, neither of which had
any facilities for soft powering the drives down or up - I got away with
it, so maybe I was lucky, but I'd do it again if I had to.
One thing to watch out for: if you reboot after taking the drive out, the
SCSI drive letters will be logically renumbered, so if you take out sda and
then reboot, what was sdb will become sda, and so on. If you then
subsequently hot-plug a drive in, it will still have the same SCSI host,
channel, id and lun numbers, but it will be the last device in the array
(e.g. it will be sdf if it was a 6-disk array). Reboot again and the
original numbering/lettering will be restored.
Good job the RAID code doesn't really care about this...
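If you want to double-check before acting (just a suggestion; the device
names below are placeholders), you can ask md which array a member actually
belongs to rather than trusting the drive letter:

  # show the array UUID and role recorded in a member's superblock
  mdadm --examine /dev/sdf1

  # compare against the running array and the kernel's overall view
  mdadm --detail /dev/md0
  cat /proc/mdstat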
Good luck!
Gordon
On Saturday 08 October 2005 16:01, Lukasz Kosewski wrote:
>
> I've actually been working on implementing the core set of routines
> that will allow for hot-swapping SATA drives in Linux. The core is
> not quite ready yet, but you can expect the next iteration within the
> week. Once the core is integrated, someone will have to implement
> capturing hotswap events on the nForce4 SATA controller, and using the
> core functions. I don't know how long that will take, but if the
> Linux SATA maintainer, Jeff Garzik (CCed on this email) knows how to
> do it, then it might be just a few weeks' time.
Good news! I'll be watching with great interest, and I'm sure I won't be
alone.
>
> That said, if you want to use this for servers you might still want to
> wait a bit before committing your resources to this :)
Yeah; I need these servers working by November, so I'll have to find another
solution for now. But perhaps the next cluster can use your work? Hope so!
Andrew Walrond
Hi Gordon,
On Saturday 08 October 2005 16:23, Gordon Henderson wrote:
> On Sat, 8 Oct 2005, Andrew Walrond wrote:
> > On Saturday 08 October 2005 15:26, Molle Bestefich wrote:
> > > IDE hotswap has never worked (OOTB at least) in Linux, and based on my
>
> Ideally you want hardware that will power the drive down nicely before you
> take it out (and power it up nicely after you plug it back in again) to
> avoid any glitches on the SCSI bus, etc...
Sounds hairy! Are you aware of any Linux SCSI drivers which support this
powering up/down, via /proc or some userspace tools perhaps?
>
> One thing to watch out for: if you reboot after taking the drive out, the
> SCSI drive letters will be logically renumbered, so if you take out sda and
> then reboot, what was sdb will become sda, and so on. If you then
> subsequently hot-plug a drive in, it will still have the same SCSI host,
> channel, id and lun numbers, but it will be the last device in the array
> (e.g. it will be sdf if it was a 6-disk array). Reboot again and the
> original numbering/lettering will be restored.
>
> Good job the RAID code doesn't really care about this...
Indeed. Linux RAID is very fine. If we can just fix up this hotplug
weakness, it will be peerless.
>
> Good luck!
>
Thanks, and good to hear from you ;)
Andrew
>>>>> "Andrew" == Andrew Walrond <[email protected]> writes:
Andrew> Likewise. I've been using SATA exclusively with Linux RAID for
Andrew> quite a while now, with great success. But for the super-resilient,
Andrew> zero-downtime servers I now need to deploy, I must be able to swap
Andrew> dead drives without taking the server down. Hence my query.
Andrew> Off-list respondents have recommended 3ware hardware RAID
Andrew> products, but throughput concerns on another thread here have
Andrew> really put me off that idea.
Hmm... I've been watching those 3ware discussions with interest as
well, but I haven't seen any comments on how well they work as JBOD
controllers, especially if you get smaller ones with fewer channels and
stripe/mirror between controllers. If you pair disks between
controllers, then that should limit the downtime, and also improve
performance.
I've been thinking of getting a pair of the older two- or four-channel
74xx series 3ware controllers and then striping across RAID-1 pairs built
between the controllers.
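Roughly speaking, something like this with md handling both layers (the
device names are purely illustrative; say sda/sdb sit on one controller and
sdc/sdd on the other):

  # mirror each disk against its partner on the other controller
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdd1

  # then stripe across the two mirrors
  mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1

That way losing a whole controller only degrades each mirror rather than
taking the stripe down.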
John
On Saturday 08 October 2005 17:08, John Stoffel wrote:
>
> Hmm... I've been watching those 3ware discussions with interest as
> well, but I haven't seen any comments on how well they work as JBOD
> controllers, especially if you get smaller ones with fewer channels and
> stripe/mirror between controllers. If you pair disks between
> controllers, then that should limit the downtime, and also improve
> performance.
My application has hundreds/thousands of threads doing simultaneous small
reads, with infrequent small writes. Any problems would probably be mitigated
by having loads of RAM for Linux to use as disk cache, but this does seem to
be an access pattern which the 3ware hardware is not good at handling. Of
course, it never hurts to remind them in a public forum; nothing focuses the
corporate mind better than bad press ;)
Andrew Walrond
Molle Bestefich wrote:
> IDE hotswap has never worked (OOTB at least) in Linux, and based on my
> experience it never will. It seems the IDE folks don't care a bit
> about it. (No offence meant. Just keeping it real.)
If you mean IDE as in PATA, it's not supported in the kernel because
PATA hardware does not generally support hotswap; the controllers and
drives are not designed for it.
SATA is very different with regard to hardware capabilities and kernel
support for hotswap.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
> One more thing to note is that nVidia cannot supply information
> regarding the SATA part (and, I think, the network part too) of its chipset
> to the open source community. So it is possible that not everything will go
> smoothly with nForce4 hotplug support even after the other pieces eventually
> come together.
The sata_nv libata driver has had full support for hotplug for a while.
When the rest of libata supports hotplug, nForce4 SATA hotplug should
just work.
-Allen
Hi Luke,
I'm interested in getting the SATA hot-swapping code/patch and trying it with
my Promise TX4 and VIA SATA controllers. Would you please show me where the
code is? I tried to download it from Jeff's folder on kernel.org, but somehow
I could not untar the file; it felt as if it might be encrypted. Please help
if possible.
Thanks,
Ken
Andrew Walrond wrote:
>I need to deploy some very resilient servers with hot-swappable drives.
>
>I have always used DAC960-based hardware RAID for hot swapping in the past, but
>SATA drives are so cheap compared to SCSI that I'm considering the Tyan GT24
>server with 4 hot-swappable SATA II drives (nForce4 Pro controller):
>
> http://www.tyan.com/products/html/gt24b2891.html
>
>Before I place an order, I need to know whether SATA II hot swapping is up to
>scratch in the Linux kernel, and whether it works nicely with Linux software
>RAID (which I already use and am familiar with).
>
>Any knowledge gratefully accepted :)
>
As others have noted, SATA is young and should not be used for hot-swap,
at least in a production manner. I suggest the IBM ServeRAID controller
as one solution for SCSI. I have a bunch of servers in various places
around the country, and these have been good to me, work pretty well
with typical failures, and IBM supports them.
I've deployed about 35 of these and am still happy, so you have a data
point. Most of my servers have 3-6TB.
--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
On Sat, 15 Oct 2005, Bill Davidsen wrote:
> As others have noted, SATA is young and should not be used for hot-swap,
> at least in a production manner. I suggest the IBM ServeRAID controller
> as one solution for SCSI. I have a bunch of servers in various places
> around the country, and these have been good to me, work pretty well
> with typical failures, and IBM supports them.
We have about 7 ServeRAID cards, from 4L to 5i. All of them are sitting on
a shelf. They are a pain to manage: the ipssend tool is weak and
ServerDirector is complicated. And they are slow; the Fusion MPT SCSI with
software RAID is significantly faster, as we measured with bonnie++. Even
the old aic7892 is faster (these are the built-in SCSI controllers on
xSeries motherboards).
For example: an xSeries 345 with ServeRAID 5i and RAID-5 built from 10k rpm
U160 disks does 30/12 MB/s read/write. The same machine and same disks with
Linux 2.6.x software RAID-1 do 40/26 MB/s. (The same machine with a QLA2340
FC HBA and EMC CX700 storage: 116/67 MB/s.)
For RAID-5, another x345 with Fusion MPT and 10k rpm U320 disks running
2.6.x software RAID-5 across 4 disks does 110/75 MB/s, and that machine puts
out 127/85 MB/s on the QLA2340 FC HBA against the EMC CX700, RAID-5 across
8 10k rpm FC disks.
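In case anyone wants to reproduce this kind of number, the measurements were
basically along these lines (device names and sizes are only examples here,
not exactly what we ran):

  # build the software RAID-5 across four disks and put a filesystem on it
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[bcde]1
  mkfs.ext3 /dev/md0
  mount /dev/md0 /mnt/test

  # sequential read/write throughput; -s should be well above RAM size
  bonnie++ -d /mnt/test -s 4096 -u nobody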
Besides this, SCSI hotswap works well in the xSeries.
Bye,
-=Lajbi=----------------------------------------------------------------
LAJBER Zoltan Szent Istvan Egyetem, Informatika Hivatal
Most of the time, if you think you are in trouble, crank that throttle!
Lajber Zoltan wrote:
> We have about 7 ServeRAID cards, from 4L to 5i. All of them are sitting on
> a shelf. They are a pain to manage: the ipssend tool is weak and
> ServerDirector is complicated. And they are slow; the Fusion MPT SCSI with
> software RAID is significantly faster, as we measured with bonnie++. Even
> the old aic7892 is faster (these are the built-in SCSI controllers on
> xSeries motherboards).
The 6i and 7 series of cards seem to have quite a bit better relative
speeds. Certainly the 4Lx cards can be outperformed in simple "hdparm"
tests by a 3ware SATA controller and disks at half the price.
Plus, software RAID can't provide good performance on many server/DB
applications without risking data loss in certain cases - for such
things one really wants something with a battery-backed cache on it.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Robert Hancock wrote:
> Lajber Zoltan wrote:
>
>> We have about 7 ServeRAID cards, from 4L to 5i. All of them are sitting
>> on a shelf. They are a pain to manage: the ipssend tool is weak and
>> ServerDirector is complicated. And they are slow; the Fusion MPT SCSI with
>> software RAID is significantly faster, as we measured with bonnie++. Even
>> the old aic7892 is faster (these are the built-in SCSI controllers on
>> xSeries motherboards).
>
>
> The 6i and 7 series of cards seem to have quite a bit better relative
> speeds. Certainly the 4Lx cards can be outperformed in simple "hdparm"
> tests by a 3ware SATA controller and disks at half the price.
>
> Plus, software RAID can't provide good performance on many server/DB
> applications without risking data loss in certain cases - for such
> things one really wants something with a battery-backed cache on it.
>
I just have a better feeling about hardware RAID when there are dozens of
multi-TB servers from NY to CA. If a server goes down, IBM fixes it instead
of someone trying to get it back up at the console.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me