Subject: Locking md device and system for several seconds

Hi!

I'm using kernel 2.6.14.2 with md (RAID1, statically compiled into
the kernel) as the boot device.

While md is resyncing (after the initial creation, or after a disk
has been marked as failed, removed, and re-added), there are locking
problems affecting the complete system/kernel.

Sometimes the system hangs (it looks like a file/disk-access lock)
while other ttys still work (until they also access the disk).

This "hang" is some seconds (most from 10s up to 1 minute, seldom
more) and surprisedly the system continues working.

If md is in a clean state (all partitions synced), this issue
doesn't seem to appear.

Configuration:
4 partitions (/boot 1GB, / 32GB, swap 16GB, /home 250GB) on
MaxLine III SATA 300GB disks. Each of them (including swap)
is a RAID1 device, in the listed order.
The system has an Opteron 270 on an nVidia Professional chipset.

There are NO log entries to be found anywhere, and no console
warnings/errors.

There are 4 systems here, all with the same behaviour.

Has anybody ever reported such an issue, or does anyone have an idea?

Miro Dietiker


2005-11-13 10:41:30

by NeilBrown

Subject: Re: Locking md device and system for several seconds

On Sunday November 13, [email protected] wrote:
> Hi!
>
> I'm using kernel 2.6.14.2 with md (RAID1, statically compiled into
> the kernel) as the boot device.
>
> While md is resyncing (after the initial creation, or after a disk
> has been marked as failed, removed, and re-added), there are locking
> problems affecting the complete system/kernel.

Can you check which IO scheduler the drives are using, try different
schedulers, and see if it makes a difference.

grep . /sys/block/*/queue/scheduler

will show you (the one in [brackets] is active).
Then just echo a new value out to each file.
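
For example (just a sketch; substitute each of your disks for sdX):

echo cfq > /sys/block/sdX/queue/scheduler

The change takes effect immediately; no reboot is needed.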

I've had one report that [anticipatory] causes this problem and [cfq]
removes it. Could you confirm that?

Thanks,

NeilBrown

Subject: AW: Locking md device and system for several seconds

:-)

>Can you check which IO scheduler the drives are using, try different
>schedulers, and see if it makes a difference.

[anticipatory] was selected.

ORIGINAL:
tiger:~# grep . /sys/block/*/queue/scheduler
/sys/block/fd0/queue/scheduler:noop [anticipatory] deadline cfq
/sys/block/hdd/queue/scheduler:noop [anticipatory] deadline cfq
/sys/block/sda/queue/scheduler:noop [anticipatory] deadline cfq
/sys/block/sdb/queue/scheduler:noop [anticipatory] deadline cfq
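
Switched with, e.g. (run as root; it also switches fd0, which is
harmless):

for f in /sys/block/*/queue/scheduler; do echo cfq > $f; done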

NEW:
tiger:~# grep . /sys/block/*/queue/scheduler
/sys/block/fd0/queue/scheduler:noop anticipatory deadline [cfq]
/sys/block/hdd/queue/scheduler:noop anticipatory deadline [cfq]
/sys/block/sda/queue/scheduler:noop anticipatory deadline [cfq]
/sys/block/sdb/queue/scheduler:noop anticipatory deadline [cfq]

The system seems to work, but I need some testing time to verify the
behaviour. (Any suggestions for a testing tool that generates disk
traffic and reports response times and throughput?)
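
(As a crude interim check, something like

time dd if=/dev/zero of=/home/ddtest bs=1M count=1024; sync

would give a rough write throughput while the array resyncs;
/home/ddtest is just a hypothetical scratch path. A proper tool would
still be nicer.)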

What is the right way/place during bootup to set this field
permanently to this value, and what exactly did I change with this
modification? (Are there performance implications?)
I'm using Debian...

I also need to check this on the other (identical) machines.

Thanks! Miro Dietiker


2005-11-13 12:00:09

by Philippe Pegon

Subject: Re: AW: Locking md device and system for several seconds

Miro Dietiker, MD Systems wrote:
> [...]
>
> What is the right way/place during bootup to set this field
> permanently to this value, and what exactly did I change with this
> modification? (Are there performance implications?)
> I'm using Debian...

You can use the kernel argument elevator=cfq in your LILO or GRUB
boot config file.
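
For example, in GRUB legacy (/boot/grub/menu.lst) a kernel line like
this (the kernel image and root device are just an example, adjust to
your setup):

kernel /boot/vmlinuz-2.6.14.2 root=/dev/md1 ro elevator=cfq

or with LILO, an append line in /etc/lilo.conf (then rerun lilo):

append="elevator=cfq"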

You can read this article about CFQ:

http://lwn.net/Articles/143474/

For information, CFQ seems to be the default scheduler in some
distributions' kernels. In short, anticipatory briefly idles the disk
after a read, hoping a nearby read follows, which can interact badly
with resync traffic; CFQ divides disk time fairly between processes.

--
Philippe Pegon