2001-04-15 18:19:21

by linas

Subject: fsck, raid reconstruction & bad bad 2.4.3


Hi,
I want to report a trio of raid-related problems. The third one is
very serious, and effectively prevents 2.4.3 from being usable (by me).

First problem: In kernel-2.4.2 and earlier, if the machine is not cleanly
shut down, then upon reboot, RAID reconstruction is automatically started.
(For RAID-1, this more-or-less amounts to copying the entire contents
of one disk partition on one disk to another). The reconstruction
code seems to be clever: it will try to use the full bandwidth when
the system is idle, and it will throttle back when busy. It will
only throttle back so far: it tries to maintain at least a minimum amount
of work going, in order to guarantee forward progress even on a busy system.

The problem: this dramatically slows fsck after an unclean shut-down.
You can hear the drives machine-gunning. I haven't stop-watch timed it,
but it's on the order of 5x slower to fsck a raid partition when there's
reconstruction going on than when the raid thinks it's clean. This
makes unclean reboots quite painful.

(There is no config file to disable/alter this .. no work-around that I
know of ..)

--------
The second problem: parallelizing fsck doesn't realize that different
/dev/md raid volumes are on the same physical disks, and thus tries
to parallelize .... again slowing things down. There is a work-around,
modify /etc/fstab to set the order of fsck's. However, I doubt the HOWTO
really gets into this .... it would be nice to get fsck to 'do the
right thing'.

----------

Third problem:

I just tried booting 2.4.3 today (after an unclean shutdown). fsck runs
at a crawl on my RAID-1 volume. It would take all day (!! literally)
to fsck. The disk-drive activity light flashes about once a second,
maybe once every two seconds. (with a corresponding click from the
drive).

On 2.4.2 kernels, the disk activity light is constantly on... and the
fsck proceeds apace.

Whatever it is that changed in 2.4.3, it makes unclean reboots
impossible ...


--linas





2001-04-15 19:59:32

by Colonel

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

In list.kernel, linas wrote:
>
>First problem: In kernel-2.4.2 and earlier, if the machine is not cleanly
>shut down, then upon reboot, RAID reconstruction is automatically started.
>(For RAID-1, this more-or-less amounts to copying the entire contents
>of one disk partition on one disk to another). The reconstruction
>code seems to be clever: it will try to use the full bandwidth when
>the system is idle, and it will throttle back when busy. It will
>only throttle back so far: it tries to maintain at least a minimum amount
>of work going, in order to guarantee forward progress even on a busy system.

And it works great!

>The problem: this dramatically slows fsck after an unclean shut-down.
>You can hear the drives machine-gunning. I haven't stop-watch timed it,
>but it's on the order of 5x slower to fsck a raid partition when there's
>reconstruction going on than when the raid thinks it's clean. This
>makes unclean reboots quite painful.

Since the alternative is to sit there and do NOTHING until the
reconstruction is complete, ala Solaris 2.5, it's WONDERFUL the way it
is. This change was extensively discussed on the raid mailing list a
couple of years ago. You can look it up for review.

>(There is no config file to disable/alter this .. no work-around that I
>know of ..)

You can't be serious. Go sit down and think about what's going on.


>--------
>The second problem: parallelizing fsck doesn't realize that different
>/dev/md raid volumes are on the same physical disks, and thus tries
>to parallelize .... again slowing things down. There is a work-around,
>modify /etc/fstab to set the order of fsck's. However, I doubt the HOWTO
>really gets into this .... it would be nice to get fsck to 'do the
>right thing'.

You probably have your fstab set up incorrectly.

<snip>
#> In particular, how does fsck deal with md devices? It parallelizes
#> itself for multiple disks, but if the volumes are all actually striped
#> over the same disks, fsck will perform better if it's serial.
#
#The "pass" field in /etc/fstab is for exactly this: fsck -a will
#serialise devices with different pass numbers. Pass==1 is for root,
#pass==2 is for normal devices which fsck knows how to serialise. If you
#want to force serialisation on md devices, use larger pass numbers.
</snip>

Do a little work, it won't hurt you. Fsck should not (and may not be
able to) decode metadevice structures.
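
To make the quoted advice about the "pass" field concrete, here is a minimal Python sketch of how fstab entries fall into ordered batches by pass number. It only models the grouping described in the quoted text; it is not how fsck itself is implemented. The device names and mount points in EXAMPLE_FSTAB are made up for illustration, and fsck_batches is a hypothetical helper name.

```python
from collections import defaultdict


def fsck_batches(fstab_text):
    """Group fstab devices into batches keyed by the pass field (6th column).

    Batches are returned in ascending pass order. Devices that share a
    pass number land in the same batch, which is where a parallel check
    could happen; devices with different pass numbers are serialised.
    """
    by_pass = defaultdict(list)
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Columns: device, mount point, type, options, dump, pass.
        passno = int(fields[5]) if len(fields) >= 6 else 0
        if passno == 0:
            continue  # pass 0 means the filesystem is not checked
        by_pass[passno].append(fields[0])
    return [by_pass[p] for p in sorted(by_pass)]


# Hypothetical layout: three md volumes that share the same physical
# disks, each given its own pass number so they are checked one by one.
EXAMPLE_FSTAB = """\
/dev/md0  /      ext2  defaults  1 1
/dev/md1  /usr   ext2  defaults  1 2
/dev/md2  /home  ext2  defaults  1 3
proc      /proc  proc  defaults  0 0
"""

for number, batch in enumerate(fsck_batches(EXAMPLE_FSTAB), start=1):
    print(f"batch {number}: {batch}")
```

If /dev/md1 and /dev/md2 both carried pass 2 instead, they would end up in one batch, which is the situation the original report complains about.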


Your third part was ignored, given the above.

2001-04-16 01:15:27

by Bernd Eckenfels

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

In article <[email protected]> you wrote:
>>(There is no config file to disable/alter this .. no work-around that I
>>know of ..)

> You can't be serious. Go sit down and think about what's going on.

Well, there are two potential solutions:

a) stop rebuild until fsck is fixed
b) wait with fsck until rebuild is fixed

Both of them are valid. The first one is valid in a scenario where you want to
decrease downtime at the cost of less safe operation, or where you have
multiple redundancy.

The second one is good if you prefer data consistency over short downtime. It
might actually speed up the bootup process; one has to measure this.

Greetings
Bernd

2001-04-16 02:30:44

by Jesse Pollard

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

On Sun, 15 Apr 2001, Bernd Eckenfels wrote:
>In article <[email protected]> you wrote:
>>>(There is no config file to disable/alter this .. no work-around that I
>>>know of ..)
>
>> You can't be serious. Go sit down and think about what's going on.
>
>Well, there are two potential solutions:
>
>a) stop rebuild until fsck is fixed

And let fsck read bad data because the raid doesn't yet recognize the correct
one....

There is nothing to fix in fsck. It should NOT know about the low level
block storage devices. If it does, then fsck for EACH filesystem will
have to know about ALL different raid hardware/software implementations.

>b) wait with fsck until rebuild is fixed

Depends on your definition of "fixed". The most I can see to fix is
reduce the amount of continued update in favor of updating those blocks
being read (by fsck or anything else). This really ought to be a runtime
configuration option. If it is set to 0, then no automatic repair would
be done.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-04-16 02:41:16

by Jonathan Lundell

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

At 9:23 PM -0500 2001-04-15, Jesse Pollard wrote:
> >b) wait with fsck until rebuild is fixed
>
>Depends on your definition of "fixed". The most I can see to fix is
>reduce the amount of continued update in favor of updating those blocks
>being read (by fsck or anything else). This really ought to be a runtime
>configuration option. If it is set to 0, then no automatic repair would
>be done.

The original post was referring to RAID 1; there's no repair necessary at the RAID level to give fsck the correct data. Seems to me the basic problem here is that the RAID re-sync is supposed to be throttling back to allow other activity to run, but that in the case of fsck the other activity is still slower by a large factor (compared to no RAID re-sync).

Is this a pathological case because of the way fsck does business, or does the RAID re-sync affect any disk-bound process that severely?
--
/Jonathan Lundell.

2001-04-16 03:30:26

by Jesse Pollard

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

On Sun, 15 Apr 2001, Jonathan Lundell wrote:
>At 9:23 PM -0500 2001-04-15, Jesse Pollard wrote:
>> >b) wait with fsck until rebuild is fixed
>>
>>Depends on your definition of "fixed". The most I can see to fix is
>>reduce the amount of continued update in favor of updating those blocks
>>being read (by fsck or anything else). This really ought to be a runtime
>>configuration option. If it is set to 0, then no automatic repair would
>>be done.
>
>The original post was referring to RAID 1; there's no repair necessary at
>the RAID level to give fsck the correct data. Seems to me the basic problem
>here is that the RAID re-sync is supposed to be throttling back to allow other
>activity to run, but that in the case of fsck the other activity is still
>slower by a large factor (compared to no RAID re-sync).

If I've got the numbering right:
  0 - concatenated stripes => no sync required
  1 - mirrored => resync required
      a: which drive has the correct info?
      b: having determined that, read the correct block,
         it must now be written to the mirror.
  all others => resync required (rebuild possible bad block)

>Is this a pathological case because of the way fsck does business, or does
>the RAID re-sync affect any disk-bound process that severely?

My experience has been with hardware raid, and even then there has been
a 1-5% decrease in I/O during resync (not accurately measured - fsck took
longer, and then only when the channel is maxed out -- otherwise the 1-5% is
not visible; filesystem was 3 IRIX efs, spread across two raid luns).

fsck is particularly bad, since nearly every read instigates a write to
the mirror drive. Fsck can then modify the block and write it back, causing
two more writes; for a total of 3 writes for a read (worst case).

It does mean that when fsck finishes, MOST of the re-sync will be finished.
All of the metadata will be synced, and only file data blocks will remain.

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-04-16 04:05:23

by Bernd Eckenfels

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

In article <[email protected]> you wrote:
> Is this a pathological case because of the way fsck does business, or does the RAID re-sync affect any disk-bound process that severely?

I guess the seeks are the problem. fsck will reposition quite heavily, and so
does the rebuild, most likely at different ends of the disk.

Greetings
Bernd

2001-04-16 04:04:12

by Bernd Eckenfels

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

In article <01041521302600.15046@tabby> you wrote:
>>a) stop rebuild until fsck is fixed

> And let fsck read bad data because the raid doesn't yet recognize the correct
> one....

A degraded raid will not deliver broken data. And even if it does, that is one
more reason not to check a degraded raid.

> There is nothing to fix in fsck. It should NOT know about the low level
> block storage devices. If it does, then fsck for EACH filesystem will
> have to know about ALL different raid hardware/software implementations.

fsck does not need to be changed; you can have a shell script loop and check
the raid state before calling the fsck.
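
The wait-then-fsck loop suggested above could look roughly like the following Python sketch. It assumes that a running rebuild shows up in /proc/mdstat as a line containing "resync" or "recovery"; the sample texts below are illustrative and not captured from a real machine, and the function names are hypothetical.

```python
import time


def resync_in_progress(mdstat_text):
    """Return True if any line of the mdstat text mentions a rebuild."""
    for line in mdstat_text.splitlines():
        if "resync" in line or "recovery" in line:
            return True
    return False


def wait_for_resync(read_mdstat, poll_seconds=30, sleep=time.sleep):
    """Poll read_mdstat() until no rebuild is reported any more."""
    while resync_in_progress(read_mdstat()):
        sleep(poll_seconds)


def read_proc_mdstat():
    """Read the live raid state (only meaningful on a Linux md system)."""
    with open("/proc/mdstat") as handle:
        return handle.read()


# Illustrative samples, assumed to resemble what /proc/mdstat prints.
SAMPLE_BUSY = """\
Personalities : [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      1048512 blocks [2/2] [UU]
      [==>..................]  resync = 12.6% (132096/1048512)
unused devices: <none>
"""

SAMPLE_CLEAN = """\
Personalities : [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      1048512 blocks [2/2] [UU]
unused devices: <none>
"""

print("busy sample rebuilding:", resync_in_progress(SAMPLE_BUSY))
print("clean sample rebuilding:", resync_in_progress(SAMPLE_CLEAN))
```

In a boot script one would call wait_for_resync(read_proc_mdstat) and only then start fsck. This trades a longer boot for an fsck that does not compete with the rebuild for the disk heads.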

>>b) wait with fsck until rebuild is fixed

> Depends on your definition of "fixed"

fixed as in rebuilt, that's what we were talking about, no?

> The most I can see to fix is
> reduce the amount of continued update in favor of updating those blocks
> being read (by fsck or anything else). This really ought to be a runtime
> configuration option. If it is set to 0, then no automatic repair would
> be done.

Yes, it would be a nice feature if rebuild could be made to do only the I/O
which is required by the kernel anyway. Since fsck will reach a lot of metadata,
this is a fairly good start for a slow rebuild.

Greetings
Bernd

2001-04-16 14:17:17

by Francois Romieu

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3 + some numbers

Linas Vepstas <[email protected]> wrote:
[...]
> I want to report a trio of raid-related problems. The third one is
> very serious, and effectively prevents 2.4.3 from being usable (by me).
>
[...]
> The problem: this dramatically slows fsck after an unclean shut-down.
> You can hear the drives machine-gunning. I haven't stop-watch timed it,
> but it's on the order of 5x slower to fsck a raid partition when there's
> reconstruction going on than when the raid thinks it's clean. This
> makes unclean reboots quite painful.

Here 2.4.3 takes 15-20 minutes to fsck a 45 GB RAID1 fs after an
unclean shutdown (init=/bin/sh + raid1 autodetect = *boom*). The whole
reconstruction takes an hour (but the system is in use).
The disks are IBM-DTLA-307045 on two different ports of a PIIX4.
The machine also includes two 9 GB disks on a 53c875 that perform equally well
(though a bit noisy during normal startup).

[2.4.3 experience]
> Whatever it is that changed in 2.4.3, it makes unclean reboots
> impossible ...

What does the swap say during the fsck?

--
Ueimor

2001-04-16 19:06:35

by Jakob Oestergaard

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

On Sun, Apr 15, 2001 at 09:51:52PM -0500, Jesse Pollard wrote:
...
>
> If I've got the numbering right;
> 0 - concatenated stripes => no sync required
> 1 - mirrored => resync required
> a: which drive has the correct info?

a: there are timestamps in the superblocks. One disk is simply chosen as
the "newest", and the other disks are updated from that one.

> b: having determined that, read the correct block,
> it must now be written to the mirror.
> all others => resync required (rebuild possible bad block)

If a resync is required, it is because we can derive the correct information
*without* having the array in sync (resync puts correct information in places
where we know it's not currently correct). fsck can never see "old" information,
no matter the state of the resync. If it could, RAID would be broken.

Levels 1, 4 and 5 need a resync - but they all give you the correct
information even *before* resync.

Levels 0 and linear cannot be synced (and will fail if you pull a disk, all because
they do not have redundant information). Talking about sync in these cases doesn't
make sense.
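
As a rough illustration of the "newest disk wins" rule described above, the sketch below picks the mirror with the latest superblock timestamp as the resync source. The Mirror class, its field names, and the device names are hypothetical; the real superblock layout and the md driver's actual selection logic are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str
    superblock_time: int  # hypothetical: last-update timestamp in the superblock


def plan_resync(mirrors):
    """Return (source, targets): the newest mirror and the ones to update from it."""
    source = max(mirrors, key=lambda m: m.superblock_time)
    targets = [m for m in mirrors if m is not source]
    return source, targets


# Made-up example: hdc1 was written to slightly later than hda1.
disks = [Mirror("hda1", 987358761), Mirror("hdc1", 987358790)]
source, targets = plan_resync(disks)
print("resync from", source.name, "to", [t.name for t in targets])
```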

>
> >Is this a pathological case because of the way fsck does business, or does
> >the RAID re-sync affect any disk-bound process that severely?
>
> My experience has been with hardware raid, and even then there has been
> a 1-5% decrease in I/O during resync (not accurately measured - fsck took
> longer, and then only when the channel is maxed out -- otherwise the 1-5% is
> not visible; filesystem was 3 IRIX efs, spread across two raid luns).

I think the problem is that RAID throttles based on bandwidth, not based on
requests. If fsck needs a lot of seeking, the RAID code won't notice that
the array is being used much, and thus won't throttle a lot.

Also, even if fsck does large sequential reads, the RAID code may throttle,
but it will then introduce small frequent seeks when it updates a few
blocks every now and then. Seeks have a huge impact on otherwise sequential
reads.
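
A back-of-the-envelope model shows why a few interleaved seeks hurt a sequential read. Every figure below (amount read, transfer rate, seek time, one resync interruption per megabyte) is an assumption picked for illustration, not a measurement from this thread.

```python
def read_time_seconds(total_mb, bandwidth_mb_s, seeks, seek_ms):
    """Transfer time plus the time spent moving the heads."""
    return total_mb / bandwidth_mb_s + seeks * seek_ms / 1000.0


# Assumed figures, for illustration only.
TOTAL_MB = 1000       # data read sequentially
BANDWIDTH_MB_S = 20.0  # sustained transfer rate
SEEK_MS = 10.0         # cost of one head movement

# Purely sequential: one initial seek, then streaming.
sequential = read_time_seconds(TOTAL_MB, BANDWIDTH_MB_S, seeks=1, seek_ms=SEEK_MS)

# Throttled resync that still interrupts once per MB read: the heads
# move away to the resync position and back again, so two seeks each.
interrupted = read_time_seconds(
    TOTAL_MB, BANDWIDTH_MB_S, seeks=2 * TOTAL_MB, seek_ms=SEEK_MS
)

print(f"sequential:  {sequential:.2f} s")
print(f"interrupted: {interrupted:.2f} s")
```

The resync bandwidth in this model is tiny, yet the reader still pays for every head movement, which is the point being made about throttling on bandwidth rather than on requests.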

>
> fsck is particularly bad, since nearly every read instigates a write to
> the mirror drive. Fsck can then modify the block and write it back, causing
> two more writes; for a total of 3 writes for a read (worst case).

Oh, do we resync blocks that are read ? I didn't know that actually...

This should give little overhead on RAID-1, since most reads would be done
on one disk and the "older disk" (the one being synced with information from
the newer one) would only be written to.

>
> It does mean that when fsck finishes, MOST of the re-sync will be finished.
> All of the metadata will be synced, and only file data blocks will remain.
>

If we resync RAID-1 blocks as they are read, I don't understand the claimed
performance impact of resync.

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2001-04-17 11:54:12

by Ookhoi

Subject: Re: fsck, raid reconstruction & bad bad 2.4.3

Hi Linas Vepstas,

(nice name ;-)

> First problem: In kernel-2.4.2 and earlier, if the machine is not cleanly
> shut down, then upon reboot, RAID reconstruction is automatically started.
> (For RAID-1, this more-or-less amounts to copying the entire contents
> of one disk partition on one disk to another). The reconstruction
> code seems to be clever: it will try to use the full bandwidth when
> the system is idle, and it will throttle back when busy. It will
> only throttle back so far: it tries to maintain at least a minimum amount
> of work going, in order to guarantee forward progress even on a busy system.
>
> The problem: this dramatically slows fsck after an unclean shut-down.
> You can hear the drives machine-gunning. I haven't stop-watch timed it,
> but it's on the order of 5x slower to fsck a raid partition when there's
> reconstruction going on than when the raid thinks it's clean. This
> makes unclean reboots quite painful.
>
> (There is no config file to disable/alter this .. no work-around that I
> know of ..)

One possible 'work-around' is to use a journaling filesystem (like
reiserfs) which eliminates the fsck after an unclean shutdown. It's very
nice to have a crashed system back online fast. The raid sync makes the
system a bit slow, but as you said, it syncs at full speed when idle,
and is nice when less idle. :-)

Ookhoi