I just plugged in a new RAID-1(+0, 2 2-disk stripe sets mirrored) to a
2.4.12-ac3 machine. The md code decided it was going to resync the mirror
at between 100KB/sec and 100000KB/sec. The actual rate was 100KB/sec,
while the device was otherwise idle. By increasing
/proc/.../speed_limit_min, I was able to crank the resync rate up to
20MB/sec, which is slightly more reasonable but still short of the
~60MB/sec this RAID is capable of.
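For reference, a minimal way to watch the resync and raise that floor, assuming the md sysctls live under /proc/sys/dev/raid/ as in stock 2.4 kernels (values in KB/sec):

    # watch the resync progress and current rate
    cat /proc/mdstat

    # raise the minimum resync rate from the default of 100 KB/sec
    echo 20000 > /proc/sys/dev/raid/speed_limit_min

    # the ceiling can be raised the same way if it becomes the limit
    echo 100000 > /proc/sys/dev/raid/speed_limit_max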
So, two things: there is something wrong with the resync code that makes
it run at the minimum rate even when the device is idle, and why is the
resync proceeding so slowly?
raid1d and raid1syncd are barely getting any CPU time on this otherwise
idle SMP system.
There must be some optimization to mostly skip the sync on an array of new
drives, ja?
-jwb
> raid1d and raid1syncd are barely getting any CPU time on this otherwise
> idle SMP system.
I noticed this too, on a uni, raid5 system;
the resync-throttling code doesn't seem to work well.
On Monday October 15, [email protected] wrote:
> > raid1d and raid1syncd are barely getting any CPU time on this otherwise
> > idle SMP system.
>
> I noticed this too, on a uni, raid5 system;
> the resync-throttling code doesn't seem to work well.
It works great for me...
What sort of drives do you have? SCSI? IDE? are you using both master
and slave on an IDE controller?
NeilBrown
On Monday October 15, [email protected] wrote:
> I just plugged in a new RAID-1(+0, 2 2-disk stripe sets mirrored) to a
> 2.4.12-ac3 machine. The md code decided it was going to resync the mirror
> at between 100KB/sec and 100000KB/sec. The actual rate was 100KB/sec,
> while the device was otherwise idle. By increasing
> /proc/.../speed_limit_min, I was able to crank the resync rate up to
> 20MB/sec, which is slightly more reasonable but still short of the
> ~60MB/sec this RAID is capable of.
>
> So, two things: there is something wrong with the resync code that makes
> it run at the minimum rate even when the device is idle, and why is the
> resync proceeding so slowly?
The way that it works out whether there is other activity on the drives
is a bit fragile. It works particularly badly when the underlying
devices are md devices.
I would recommend that instead of mirroring 2 stripe sets, you stripe
two mirrored pairs. The resync should be faster and the resilience to
failure is much better.
NeilBrown
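As an illustration of that layout, a hypothetical /etc/raidtab sketch for raidtools (device names are placeholders): two RAID-1 pairs, with a RAID-0 striped across them.

    raiddev /dev/md0
        raid-level            1
        nr-raid-disks         2
        persistent-superblock 1
        chunk-size            4
        device                /dev/sda1
        raid-disk             0
        device                /dev/sdb1
        raid-disk             1

    raiddev /dev/md1
        raid-level            1
        nr-raid-disks         2
        persistent-superblock 1
        chunk-size            4
        device                /dev/sdc1
        raid-disk             0
        device                /dev/sdd1
        raid-disk             1

    # the stripe goes on top of the two mirrors
    raiddev /dev/md2
        raid-level            0
        nr-raid-disks         2
        persistent-superblock 1
        chunk-size            64
        device                /dev/md0
        raid-disk             0
        device                /dev/md1
        raid-disk             1

With this arrangement, losing one drive means only its own mirror pair has to be resynced; the stripe on top never needs to be rebuilt.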
On Tue, 16 Oct 2001, Neil Brown wrote:
> On Monday October 15, [email protected] wrote:
> > I just plugged in a new RAID-1(+0, 2 2-disk stripe sets mirrored) to a
> > 2.4.12-ac3 machine. The md code decided it was going to resync the mirror
> > at between 100KB/sec and 100000KB/sec. The actual rate was 100KB/sec,
> > while the device was otherwise idle. By increasing
> > /proc/.../speed_limit_min, I was able to crank the resync rate up to
> > 20MB/sec, which is slightly more reasonable but still short of the
> > ~60MB/sec this RAID is capable of.
> >
> > So, two things: there is something wrong with the resync code that makes
> > it run at the minimum rate even when the device is idle, and why is the
> > resync proceeding so slowly?
>
> The way that it works out whether there is other activity on the drives
There wasn't any activity at all.
> is a bit fragile. It works particularly badly when the underlying
> devices are md devices.
Bummer.
> I would recommend that instead of mirroring 2 stripe sets, you stripe
> two mirrored pairs. The resync should be faster and the resilience to
> failure is much better.
I did eventually do it that way, but the sync speed was the same. I'm
very curious to know why you think striping mirrors is more reliable than
mirroring stripes. Either way, you can lose any one drive and some
combinations of two drives. Either way you can hot-swap the bad disk.
-jwb
On Tue, 16 Oct 2001, Neil Brown wrote:
> On Monday October 15, [email protected] wrote:
> > > raid1d and raid1syncd are barely getting any CPU time on this otherwise
> > > idle SMP system.
> >
> > I noticed this too, on a uni, raid5 system;
> > the resync-throttling code doesn't seem to work well.
>
> It works great for me...
> What sort of drives do you have? SCSI? IDE? are you using both master
> and slave on an IDE controller?
15,000 RPM SCSI u160 disks.
-jwb
On Monday October 15, [email protected] wrote:
> On Tue, 16 Oct 2001, Neil Brown wrote:
>
> > On Monday October 15, [email protected] wrote:
> > > I just plugged in a new RAID-1(+0, 2 2-disk stripe sets mirrored) to a
> > > 2.4.12-ac3 machine. The md code decided it was going to resync the mirror
> > > at between 100KB/sec and 100000KB/sec. The actual rate was 100KB/sec,
> > > while the device was otherwise idle. By increasing
> > > /proc/.../speed_limit_min, I was able to crank the resync rate up to
> > > 20MB/sec, which is slightly more reasonable but still short of the
> > > ~60MB/sec this RAID is capable of.
> > >
> > > So, two things: there is something wrong with the resync code that makes
> > > it run at the minimum rate even when the device is idle, and why is the
> > > resync proceeding so slowly?
> >
> > The way that it works out whether there is other activity on the drives
>
> There wasn't any activity at all.
See how fragile it is?
It only really works if the underlying devices are real drives with a
major number less than 16 that are among the first 16 real devices
with that major number (not counting different partitions on the same
device).
>
> > is a bit fragile. It works particularly badly when the underlying
> > devices are md devices.
>
> Bummer.
>
> > I would recommend that instead of mirroring 2 stripe sets, you stripe
> > two mirrored pairs. The resync should be faster and the resilience to
> > failure is much better.
>
> I did eventually do it that way, but the sync speed was the same. I'm
> very curious to know why you think striping mirrors is more reliable than
> mirroring stripes. Either way, you can lose any one drive and some
> combinations of two drives. Either way you can hot-swap the bad
> disk.
With striped mirrors there are more combinations of two drives that
can fail without data loss (with four drives, 4 of the 6 possible
two-drive failures are survivable, versus only 2 of 6 when mirroring
stripes), and less data needs to be copied during a resync, since only
the affected mirror pair has to be rebuilt. Also, hot-add is easier -
you don't have to build a raid0 and then hot-add that, you just
hot-add a drive.
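In raidtools terms the hot-add would look something like this (a sketch only; the device names are placeholders, and raidhotadd/raidhotremove are the raidtools helpers):

    # replace a failed member of one mirror pair
    raidhotremove /dev/md0 /dev/sdb1   # drop the failed disk from the pair
    raidhotadd    /dev/md0 /dev/sdb1   # add the replacement drive
    cat /proc/mdstat                   # only that one pair resyncs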
It is odd that you still aren't getting good rebuild speed. What
drives do you have, and how are they connected to which controllers?
NeilBrown
>
> -jwb
"A month of sundays ago Jeffrey W. Baker wrote:"
> I just plugged in a new RAID-1(+0, 2 2-disk stripe sets mirrored) to a
> 2.4.12-ac3 machine. The md code decided it was going to resync the mirror
> at between 100KB/sec and 100000KB/sec. The actual rate was 100KB/sec,
> while the device was otherwise idle. By increasing
> /proc/.../speed_limit_min, I was able to crank the resync rate up to
> 20MB/sec, which is slightly more reasonable but still short of the
> ~60MB/sec this RAID is capable of.
>
> So, two things: there is something wrong with the resync code that makes
> it run at the minimum rate even when the device is idle, and why is the
> resync proceeding so slowly?
This has been the trend throughout the 2.4 series. 2.4.0 was quite
snappy at resyncs, and the speed has generally dropped from version to
version. I recall seeing a speed halving somewhere early in the series
(2.4.2?).
> raid1d and raid1syncd are barely getting any CPU time on this otherwise
> idle SMP system.
>
> There must be some optimization to mostly skip the sync on an array of new
> drives, ja?
Not that I have seen. Raid resyncs are throttled via a braking mechanism
in the generic md code (I think it's called fooresyncbar). It attempts
to gauge the current resync speed, compares it with the min and max
values, and either calls for more resyncs or schedules. But even
removing this brake from the code doesn't speed things up, so I am
mystified as to where the throttling effect comes from. It must be
somewhere else in the structure of the code.
Another problem is that there seems to be some kind of state involved:
if the raid resync starts while the machine is under load, then it runs
slowly and continues at that rate even when the other load is removed.
Peter
On Monday October 15, [email protected] wrote:
> On Tue, 16 Oct 2001, Neil Brown wrote:
>
> > On Monday October 15, [email protected] wrote:
> > > > raid1d and raid1syncd are barely getting any CPU time on this otherwise
> > > > idle SMP system.
> > >
> > > I noticed this too, on a uni, raid5 system;
> > > the resync-throttling code doesn't seem to work well.
> >
> > It works great for me...
> > What sort of drives do you have? SCSI? IDE? are you using both master
> > and slave on an IDE controller?
>
> 15,000 RPM SCSI u160 disks.
Just like mine.....
I would expect around 30MB/sec when resyncing a single mirrored pair,
and slightly less than that on each if you are syncing two mirrored
pairs at once, as you would be getting close to the theoretical bus
maximum (each resyncing pair reads one drive and writes the other, so
to resync two pairs at once at 30MB/sec each you would need to be
pushing 120MB/sec over the bus, and I doubt that you would get that
from U160 in practice).
That's a bit more than the 20MB/sec that you report, but less than the
60MB/sec that you hoped for...
NeilBrown
>
> -jwb
On Tuesday October 16, [email protected] wrote:
> On Monday October 15, [email protected] wrote:
> > On Tue, 16 Oct 2001, Neil Brown wrote:
> >
> > > On Monday October 15, [email protected] wrote:
> > > > > raid1d and raid1syncd are barely getting any CPU time on this otherwise
> > > > > idle SMP system.
> > > >
> > > > I noticed this too, on a uni, raid5 system;
> > > > the resync-throttling code doesn't seem to work well.
> > >
> > > It works great for me...
> > > What sort of drives do you have? SCSI? IDE? are you using both master
> > > and slave on an IDE controller?
> >
> > 15,000 RPM SCSI u160 disks.
>
> Just like mine.....
>
> I would expect around 30MB/sec when resyncing a single mirrored pair,
> and slightly less than that on each if you are syncing two mirrored
> pairs at once, as you would be getting close to the theoretical bus
> maximum (each resyncing pair reads one drive and writes the other, so
> to resync two pairs at once at 30MB/sec each you would need to be
> pushing 120MB/sec over the bus, and I doubt that you would get that
> from U160 in practice).
>
Just to follow up on this. I did some testing on my test machine,
which has two U160 Adaptec SCSI chains (one on the motherboard, one on
a separate PCI card) with a bunch of Seagate ST318451LC 18Gig 15000rpm
(Cheetah X15) drives, which claim a transfer rate of 37.4 to 48.9
MB/sec.
That transfer rate is off a single track. This might be a bit
simplistic, but it takes 4ms to read a track and 0.7ms to step to the
next track, so that's roughly a 15% drop in throughput when writing
multiple consecutive tracks, and I would expect a maximum sustained
throughput of 31.8 to 41.6 MB/sec when writing.
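Working that estimate through with the figures above (the 4ms and 0.7ms values are the assumptions; the rest is arithmetic):

    # time per track = 4.0ms transfer + 0.7ms track-to-track step,
    # so only 4.0/4.7 (about 85%) of the time is spent transferring
    echo "scale=1; 37.4 * 4.0 / 4.7" | bc    # -> 31.8 MB/sec
    echo "scale=1; 48.9 * 4.0 / 4.7" | bc    # -> 41.6 MB/sec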
If I create a raid1 array with one drive on each SCSI chain, I get a
rebuild rate of about 40MB/sec, which is what I would expect.
If I create a second while the first is still building, the rates drop
to about 38MB/sec. I guess there is a bit more bus contention, as we
are now at 50% of bus utilisation.
A third one and the speeds drop to around 31MB/sec, which is 93MB/sec
on each bus.
If I create a raid5 array using 9 drives (4 on one channel, 5 on the
other), and create it with 8 working drives, one failed, and one
spare, then reconstruction starts on the spare at about 22MB/sec.
This sees 110MB/sec pass over one of the SCSI channels.
So the drives are not maxing out, and neither are the SCSI buses.
I'm curious to know where the speed loss is coming from, but I think
that on the whole, the raid layer is doing quite a good job of
keeping the drives busy.
Note that if I create a RAID5 array without any failed or spare
drives, then reconstruction speed is much lower. I get 13MB/sec.
This is because the "resync" process is optimised for an array that is
mostly in sync. "reconstruction" is a much more efficient way to
create a new raid5 array.
NeilBrown
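A hedged raidtab sketch of that trick (directive names as in raidtools; device names are hypothetical): mark one member as failed-disk and supply a spare-disk, so mkraid brings the array up degraded and reconstructs onto the spare rather than running the slower resync.

    raiddev /dev/md3
        raid-level            5
        nr-raid-disks         4
        nr-spare-disks        1
        persistent-superblock 1
        chunk-size            64
        device                /dev/sda1
        raid-disk             0
        device                /dev/sdb1
        raid-disk             1
        device                /dev/sdc1
        raid-disk             2
        # declared failed at creation time, so the array starts degraded
        device                /dev/sdd1
        failed-disk           3
        # reconstruction onto the spare begins as soon as the array is created
        device                /dev/sde1
        spare-disk            0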