2005-12-13 05:23:06

by CaT

Subject: anticipatory scheduler and raid rebuild

I'll be able to play a bit more with this later but for now I thought
I'd toss it into the wilderness.

I had just set up a nice little server with WD 10k drives and s/w raid 1.
The kernel is 2.6.14.3. The CPU is a P4 3GHz and it's an Intel 82875P
chipset. In order to test that it'll build ok with missing disks I
pulled one out, booted, shutdown, put it back in and rebooted. I then
went on to try and get one of the raids to rebuild with:

mdadm --manage -a /dev/md6 /dev/sdb8

And then the server slowed to a crawl. Well, not even that. It slowed to
the point of freezing and only occasionally stuttering with activity other
than the rebuild. I got a similar reaction when it was rebuilding it.
Essentially, whilst the rebuild was in progress no disk I/O, or even /sys
and /proc I/O, could be done; then there'd be a flurry of excitement and
then back to the freeze.
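
For anyone wanting to put numbers on a rebuild like this, here is a minimal sketch that pulls the recovery speed out of /proc/mdstat. It assumes the progress line carries a "speed=NNNNK/sec" field, which is how md normally reports it; the function name rebuild_speed is made up for illustration and is not part of mdadm or the kernel.

```shell
#!/bin/sh
# Sketch only: rebuild_speed is a hypothetical helper name.
# Print the resync/recovery speed (the number before "K/sec") found in
# mdstat-style text. Prints one line per array that is currently syncing,
# and nothing at all if no rebuild is in progress.
rebuild_speed() {
    # $1: file to read; defaults to /proc/mdstat
    sed -n 's/.*speed=\([0-9][0-9]*\)K\/sec.*/\1/p' "${1:-/proc/mdstat}"
}

# Example use while a rebuild is running (left commented out here):
#   while sleep 5; do rebuild_speed; done
```

Sampling it every few seconds, before and after a scheduler change, should give a rough comparison of rebuild throughput.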

I then remembered that I could change schedulers on the disks and so did
the following (having read cfq is a nice choice):

echo cfq >/sys/block/sda/queue/scheduler
echo cfq >/sys/block/sdb/queue/scheduler

And the system became usable right after the commands were allowed to
pass through. Whilst the load (when I could get a reading) was 8+ before
it now hangs around 0.8-1 and the system is very responsive. The rebuild
speed is slower now; 25k vs 40-50k before.
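
To confirm which scheduler a disk is actually using before and after a switch like the one above, the same sysfs file can be read back. The kernel lists the available schedulers on one line and puts the active one in square brackets. A minimal sketch follows; the helper name active_scheduler is an assumption for illustration, and the loop simply skips disks it cannot read.

```shell
#!/bin/sh
# Sketch only: active_scheduler is a hypothetical helper name.
# Print the active I/O scheduler named in a sysfs scheduler file.
# The file holds one line such as "noop anticipatory deadline [cfq]",
# where the bracketed entry is the scheduler currently in use.
active_scheduler() {
    # $1: path to a scheduler file, e.g. /sys/block/sda/queue/scheduler
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Report the active scheduler for every sd* disk that is present.
for f in /sys/block/sd*/queue/scheduler; do
    [ -r "$f" ] || continue          # unmatched glob or unreadable file
    dev=${f#/sys/block/}             # strip the leading path
    dev=${dev%%/*}                   # keep only the device name
    printf '%s: %s\n' "$dev" "$(active_scheduler "$f")"
done
```

Note that a scheduler set this way does not survive a reboot; it has to be set again at boot time.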

So, does my hardware suck and AS is pushing it beyond its limits or is
AS unsuitable for the task I am putting it through or is AS buggy and
all should be well with it?

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby


2005-12-18 23:37:11

by NeilBrown

Subject: Re: anticipatory scheduler and raid rebuild

On Tuesday December 13, CaT wrote:
> I'll be able to play a bit more with this later but for now I thought
> I'd toss it into the wilderness.

Thanks.

>
> I had just set up a nice little server with WD 10k drives and s/w raid 1.
> The kernel is 2.6.14.3. The CPU is a P4 3GHz and it's an Intel 82875P
> chipset. In order to test that it'll build ok with missing disks I
> pulled one out, booted, shutdown, put it back in and rebooted. I then
> went on to try and get one of the raids to rebuild with:
>
> mdadm --manage -a /dev/md6 /dev/sdb8
>
> And then the server slowed to a crawl. Well not even that. It slowed to
> the point of freezing and occasionally stuttering with activity other
> than the rebuild. I got a similar reaction when it was rebuilding
> it.

I've heard reports of this sort of thing before I think, but I'm
wondering why I never experience it.
What sort of drives do you have? What controller?
What filesystem are you running over the raid1?

>
> So, does my hardware suck and AS is pushing it beyond its limits or is
> AS unsuitable for the task I am putting it through or is AS buggy and
> all should be well with it?

I suspect it is an odd interaction between md/raid1/rebuild and AS.
AS tries to guess how a process is behaving and the raid1/rebuild
process probably is confusing it. But it is hard to say how until I
can reproduce it.

NeilBrown

2005-12-19 01:17:59

by CaT

Subject: Re: anticipatory scheduler and raid rebuild

On Mon, Dec 19, 2005 at 10:36:53AM +1100, Neil Brown wrote:
> > mdadm --manage -a /dev/md6 /dev/sdb8
> >
> > And then the server slowed to a crawl. Well not even that. It slowed to
> > the point of freezing and occasionally stuttering with activity other
> > than the rebuild. I got a similar reaction when it was rebuilding
> > it.
>
> I've heard reports of this sort of thing before I think, but I'm
> wondering why I never experience it.
> What sort of drives do you have? What controller?

2 x WD 10k RPM 74GB in this case. The other experience was on 36GB versions.
Controller is:

0000:00:1f.2 RAID bus controller: Intel Corp. 6300ESB SATA RAID Controller (rev 02)

RAID is turned off so it's acting as a straight SATA controller.

The server itself is an IBM x306 xServe.

> What filesystem are you running over the raid1?

ext3.

> > So, does my hardware suck and AS is pushing it beyond its limits or is
> > AS unsuitable for the task I am putting it through or is AS buggy and
> > all should be well with it?
>
> I suspect it is an odd interaction between md/raid1/rebuild and AS.
> AS tries to guess how a process is behaving and the raid1/rebuild
> process probably is confusing it. But it is hard to say how until I
> can reproduce it.

Well so far I've had a 100% reproduction rate. (joy? :)

I'm also experiencing something similar on another box. This time the
kernel is 2.6.8.1 (Mandrake kernel build) but the hardware (apart from
HDs) is the same. The HDs are 2 SATA drives (an 80GB Maxtor and a 160GB
Seagate) and a USB drive (200GB Seagate). The 80GB Maxtor is very busy
but the box is coasting along. When I do an rsync from the USB drive to
the 160GB Seagate all hell breaks loose. After a few seconds (maybe
15-30) load starts hitting 100 and the whole system begins to stutter.
Mail delivery on the 80GB almost stops and the queue shoots up. It takes
a long time for the system to recover. Even after I ^Z the rsync the
load remains absurdly high and the queue continues to build. Slowly
though things calm down and maybe half an hour later things are back to
'normal'. AS is in use.

I'm going to see if the same thing happens rsyncing from the 80GB Maxtor to
the 160GB Seagate. I'll also probably upgrade the kernel (it IS rather
old and I have a downtime scheduled soon so I should be able to do
this) to a recent 2.6.14. I figured I'd throw this into the mix as the
symptoms appear similar (although that could be coincidence).

Oh, and rsyncing from the 80GB Maxtor to the USB drive didn't seem to
hurt things anywhere near as much. I was able to rsync the entire 80GB
drive with the rsync being 2 mins on followed by 2 mins off (i.e. I let it
copy for 2 mins, driving the load and queue up, and then let it rest
for 2 mins of normal load, which let it recover fully most of the time).
That was with an older Mandrake build of 2.6.8.1 though.

If you need anything from me to help with this ask and I'll see what I
can do.

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby

2005-12-19 04:44:05

by CaT

Subject: Re: anticipatory scheduler and raid rebuild

On this part of it:

On Mon, Dec 19, 2005 at 12:18:00PM +1100, CaT wrote:
> I'm going to see if the same thing happens rsyncing from the 80GB Maxtor to
> the 160GB Seagate. I'll also probably upgrade the kernel (it IS rather

I did a cp instead.

Maxtor -> Seagate: manageable load (2min on/2min off)
USB -> Seagate: looks ok for about 15-30 seconds and then load starts
  climbing. I stopped the cp at a load of approx 50 and it kept on
  climbing until it hit 93 and then started backing off. It's been maybe
  5 minutes and the server still has not recovered though the load is
  still dropping (I can tell it has not recovered yet as the queue keeps
  on climbing - in the Maxtor -> Seagate case the queue begins dropping
  off almost immediately) - reads from disks are slow (10 seconds to
  read an 88 entry dir with ls -la and possibly longer to write a 1k
  text file that was just opened). About 4-5 minutes later still the
  queue finally starts to drop.
Maxtor -> USB: just fine.

Not sure if this is of interest. The kernel may well be too old to
make it so. It may be that the io scheduler is not even at issue, but the
behaviour looks so similar to that of the raid rebuild behaviour I
described that I figured I'd follow up.

If it's of no interest then I'll drop it.

> old and I have a downtime scheduled soon so I should be able to do
> this) to a recent 2.6.14. I figured I'd throw this into the mix as the

I don't think this'll happen as the next outage is too close to make
this of (much) use.

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby