2001-11-26 17:03:10

by Alok K. Dhir

Subject: Possible md bug in 2.4.16-pre1


On kernel 2.4.16-pre1 software RAID (tested with levels 0 and 1 on the
same two drives), it is not possible to "raidstop /dev/md0" after
mounting and using it, even though the partition is unmounted. Attempts
are rejected with "/dev/md0: Device or resource busy". Even shutting
down to single user mode does not release the device for stopping. I
had to reboot to single user mode, then I was able to stop it,
unconfigure it, etc.
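
A minimal sketch of how one might rule out the obvious causes before calling raidstop. It only checks whether the device still appears in the mounts table; the report above says the device stays busy even when unmounted, so this is a first step and not a fix. The helper name `is_mounted` and its optional second argument are hypothetical, introduced for illustration; `fuser` is assumed to be installed (it ships with psmisc).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a device is listed in a mounts table.
# The table defaults to /proc/mounts; the second argument exists only
# so the function can be tried against a sample file.
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # The first field of each line in /proc/mounts is the device name.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Example use (needs root and a real array, so it is left commented out):
# if is_mounted /dev/md0; then umount /dev/md0; fi
# fuser -vm /dev/md0     # list processes that still use the device
# cat /proc/mdstat       # show the array state as the kernel sees it
# raidstop /dev/md0
```

If the device is neither mounted nor held by any process and raidstop still reports "Device or resource busy", that points toward the tool or the md driver.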

Testing the throughput of Linux's software raid in levels raid1 and
raid0 with various chunksizes was somewhat more tedious because of this
problem...

Here is my (current) raidtab:

Raiddev /dev/md0
raid-level 0
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/sda2
raid-disk 0
device /dev/sdb1
raid-disk 1

Thanks...

Al



2001-11-26 19:35:12

by Peter T. Breuer

Subject: Re: Possible md bug in 2.4.16-pre1

"Alok K. Dhir wrote:"
> On kernel 2.4.16-pre1 software RAID (tested with levels 0 and 1 on the
> same two drives), it is not possible to "raidstop /dev/md0" after
> mounting and using it, even though the partition is unmounted. Attempts

Raid has been in quite a shocking state for a long while and
often there seems nor rhyme nor reason to its behaviour. If you want
to stick your machine in an endless loop, just try initialising a
mirror raid device with only one of its two components currently
working.

> are rejected with "/dev/md0: Device or resource busy". Even shutting

ya, ya. Try raidhotsetfaulty for good luck and then try raidhotremove.
(curse, splutter. Will the authors ever write some docs that make
sense. And also document the interactions with lvm).

> down to single user mode does not release the device for stopping. I
> had to reboot to single user mode, then I was able to stop it,

You just said you couldn't?

> unconfigure it, etc.
>
> Testing the throughput of Linux's software raid in levels raid1 and
> raid0 with various chunksizes was somewhat more tedious because of this

You ain't kidding. It's a nightmare to test upon.

> Here is my (current) raidtab:
>
> Raiddev /dev/md0
> raid-level 0
> nr-raid-disks 2
> chunk-size 64k
> persistent-superblock 1
> nr-spare-disks 0
> device /dev/sda2
> raid-disk 0
> device /dev/sdb1
> raid-disk 1

That's a standard setup. It's not even confusing to me! Try it over lvm
logical partitions and other raid devices (snicker).

Peter

2001-11-26 19:52:11

by Alok K. Dhir

Subject: RE: Possible md bug in 2.4.16-pre1

> > down to single user mode does not release the device for stopping. I
> > had to reboot to single user mode, then I was able to stop it,
>
> You just said you couldn't?

Apologies for being unclear - _shutting down_ does *not* allow it to
work (i.e. "shutdown now"). I must first issue a "reboot", and then use
the "-s" flag at the boot prompt to get to single user mode, and then I
am able to stop the raid device...

2001-11-26 20:43:11

by Anton Altaparmakov

Subject: Re: Possible md bug in 2.4.16-pre1

At 19:29 26/11/01, Peter T. Breuer wrote:
>"Alok K. Dhir wrote:"
> > On kernel 2.4.16-pre1 software RAID (tested with levels 0 and 1 on the
> > same two drives), it is not possible to "raidstop /dev/md0" after
> > mounting and using it, even though the partition is unmounted. Attempts
>
>Raid has been in quite a shocking state for a long while and
>often there seems nor rhyme nor reason to its behaviour. If you want
>to stick your machine in an endless loop, just try initialising a
>mirror raid device with only one of its two components currently
>working.

Define "long while"... Here RAID-0 is working fine. Admittedly the file
server is still on kernel 2.4.10-pre14 (+ some patches) but I can't be
bothered to reboot it to install a new kernel (uptime is growing nicely...).

When you simulate a non-working component of RAID-0 by marking it as such in
/etc/raidtab, it works fine for me. I know because I used it when installing
the second disk: I created several RAID-0 arrays on the new disk of my file
server, copied the data across, marked the old disk partitions as
the other working half of the raid arrays, and let the md driver synchronize
them without any problems at all... I did a lot of raidstart/stops at that
time too without problems.

> > are rejected with "/dev/md0: Device or resource busy". Even shutting
>
>ya, ya. Try raidhotsetfaulty for good luck and then try raidhotremove.
>(curse, splutter. Will the authors ever write some docs that make
>sense. And also document the interactions with lvm).

Hm, I should try 2.4.16 some time to see if it breaks here...

Anton


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/

2001-11-27 01:32:47

by NeilBrown

Subject: Re: Possible md bug in 2.4.16-pre1


On Monday November 26, [email protected] wrote:
>
> On kernel 2.4.16-pre1 software RAID (tested with levels 0 and 1 on the
> same two drives), it is not possible to "raidstop /dev/md0" after
> mounting and using it, even though the partition is unmounted. Attempts
> are rejected with "/dev/md0: Device or resource busy". Even shutting
> down to single user mode does not release the device for stopping. I
> had to reboot to single user mode, then I was able to stop it,
> unconfigure it, etc.

I think this might be due to a buggy "raidstop". I seem to recall
someone having a similar problem some months ago. It turned out that
they were using a vendor-supplied raidstop that did the wrong thing.

Could you try compiling raidtools from
http://www.kernel.org/pub/linux/daemons/raid/alpha/raidtools-19990824-0.90.tar.bz2

and see if that works?

Alternatively, get mdctl from

http://www.cse.unsw.edu.au/~neilb/source/mdctl/

and use
mdctl --stop /dev/md0

If this still doesn't work, please send me an "strace" of raidstop
running and failing.
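
One way to capture such a trace, as a sketch: the wrapper name `trace_to_file` and the output file name are hypothetical, while `-f` and `-o` are standard strace options.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under strace and save the trace to a file.
# Usage: trace_to_file OUTFILE COMMAND [ARGS...]
trace_to_file() {
    out="$1"
    shift
    # -f follows child processes, -o writes the trace to the named file.
    strace -f -o "$out" "$@"
}

# Example (needs root and a configured array, so it is left commented out):
# trace_to_file raidstop.trace raidstop /dev/md0
# grep -n 'ioctl' raidstop.trace | tail    # the failing call is likely near the end
```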

NeilBrown

Subject: Re: Possible md bug in 2.4.16-pre1

"Peter T. Breuer" <[email protected]> writes:

>Raid has been in quite a shocking state for a long while and
>often there seems nor rhyme nor reason to its behaviour. If you want
>to stick your machine in an endless loop, just try initialising a
>mirror raid device with only one of its two components currently
>working.

Hm, what? That's my standard migration path from non-mirrored to
mirrored under 2.2.x. Are you saying that this is not possible under 2.4?

Regards
Henning

--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [email protected]

Am Schwabachgrund 22 Fon.: 09131 / 50654-0 [email protected]
D-91054 Buckenhof Fax.: 09131 / 50654-20