2002-11-13 00:16:18

by Brian Jackson

Subject: md on shared storage

Here's a question for all those out there that are smarter than me (so I
guess that's most of you, then :). I looked around (Google, kernel source,
etc.) trying to find the answer, but came up with nothing.

Does the MD driver work with shared storage? I would also be interested to
know if the new DM driver works with shared storage (though I must admit I
didn't really try to answer this one myself; I'm just hoping somebody will
know).

I ask because I seem to be having some strange problems with an md device on
shared storage (Qlogic FC controllers). The qlogic drivers spit out messages
for about 20-60 lines, then the machines lock up. So the drivers were my
first suspicion, but they were working okay before. So I went back and got
rid of the md device, and now everything is working again. Anybody got any
ideas?

My logic says that it should work fine with shared storage, but my recent
experience says my logic is wrong.

--Brian Jackson


2002-11-13 01:11:29

by Steven Dake

Subject: Re: md on shared storage

Brian,

The RAID driver does indeed work with shared storage, if you don't have
RAID autostart set as the partition type. If you do, each host will try
to rebuild the RAID array resulting in really bad magic.
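
For example, on the shared disks you want the partitions left as plain Linux
(type 0x83) rather than Linux raid autodetect (0xfd). Roughly like this, with
made-up device names (or just use the 't' command in fdisk):

  # check the partition type; "fd" means Linux raid autodetect
  sfdisk --print-id /dev/sda 1
  # switch it to plain Linux so no host autostarts the array at boot
  sfdisk --change-id /dev/sda 1 83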

I posted patches to solve this problem long ago to this list and
linux-raid, but Neil Brown (md maintainer) rejected them saying that
access to a raid volume should be controlled by user space, not by the
kernel. Everyone is entitled to their opinions I guess. :)

The patch worked by locking RAID volumes to either a FibreChannel host
WWN (qlogic only) or scsi host id. This ensured that if a raid volume
was started, it could ONLY be started on the host that created it. This
worked for the autostart path as well as the start path via IOCTL.

I also modified mdadm to handle takeover, so that a surviving node can take
over a failed node's RAID arrays.

I'm extending this type of support into LVM volume groups as we speak.
If you would like to see the patch when I'm done mail me and I'll send
it out. This only applies to 2.4.19.

Thanks
-steve

Brian Jackson wrote:

<snip>

2002-11-13 02:44:30

by Michael Clark

Subject: Re: md on shared storage

On 11/13/02 08:25, Brian Jackson wrote:
> Here's a question for all those out there that are smarter than me (so I
> guess that's most of you, then :). I looked around (Google, kernel source,
> etc.) trying to find the answer, but came up with nothing.
> Does the MD driver work with shared storage? I would also be interested
> to know if the new DM driver works with shared storage (though I must
> admit I didn't really try to answer this one myself; I'm just hoping
> somebody will know).

They should work, obviously with some caveats. Having two hosts both
trying to reconstruct the same md RAID1 may cause some trouble.

> I ask because I seem to be having some strange problems with an md
> device on shared storage (Qlogic FC controllers). The qlogic drivers spit
> out messages for about 20-60 lines, then the machines lock up. So the
> drivers were my first suspicion, but they were working okay before. So I
> went back and got rid of the md device, and now everything is working
> again. Anybody got any ideas?

Could be a stack-related problem with the qlogic driver. The additional
stack pressure of the md layer, perhaps? The 20-60 lines of logs would
probably give some ideas.

I have a couple of shared storage clusters that were using the qla2300
driver, ext3 and LVM1, and they would periodically oops. I removed LVM
and the systems are now rock solid.

~mc

2002-11-13 02:53:37

by Michael Clark

Subject: Re: md on shared storage

On 11/13/02 09:19, Steven Dake wrote:
> Brian,
>
> The RAID driver does indeed work with shared storage, if you don't have
> RAID autostart set as the partition type. If you do, each host will try
> to rebuild the RAID array resulting in really bad magic.
>
> I posted patches to solve this problem long ago to this list and
> linux-raid, but Neil Brown (md maintainer) rejected them saying that
> access to a raid volume should be controlled by user space, not by the
> kernel. Everyone is entitled to their opinions I guess. :)
>
> The patch worked by locking RAID volumes to either a FibreChannel host
> WWN (qlogic only) or scsi host id. This ensured that if a raid volume
> was started, it could ONLY be started on the host that created it. This
> worked for the autostart path as well as the start path via IOCTL.
>
> I also modified mdadm to handle takeover, so that a surviving node can
> take over a failed node's RAID arrays.
>
> I'm extending this type of support into LVM volume groups as we speak.
> If you would like to see the patch when I'm done mail me and I'll send
> it out. This only applies to 2.4.19.

I'm interested in finding what magic is required to get a stable
setup with qlogic drivers and LVM. I have tested many kernel combinations
(vendor kernels, stock, -aa) and a variety of different qlogic drivers,
including the one with the alleged stack hog fixes, and they all oops
when using LVM (it can take up to 10 days of production load). I removed
LVM 45 days ago and now have 45 days of uptime on these boxes.

I'm currently building a test setup to try and exercise this problem,
as all my other boxes with qlogic cards are in production and can't be
played with. I really miss having volume management, and a SAN setup
is really where you need it the most.

~mc

2002-11-13 03:06:03

by Brian Jackson

Subject: Re: md on shared storage


I am testing OpenGFS on this hardware (it is on loan from OSDL), so I could
probably do some testing for you if you have some specifics you want to try.
I am having trouble with the volume management portion of OpenGFS too (though
I don't necessarily think the problems are related).

--Brian Jackson


<snip>
>
> I'm interested in finding what magic is required to get a stable
> setup with qlogic drivers and LVM. I have tested many kernel combinations
> (vendor kernels, stock, -aa) and a variety of different qlogic drivers,
> including the one with the alleged stack hog fixes, and they all oops
> when using LVM (it can take up to 10 days of production load). I removed
> LVM 45 days ago and now have 45 days of uptime on these boxes.
>
> I'm currently building a test setup to try and exercise this problem,
> as all my other boxes with qlogic cards are in production and can't be
> played with. I really miss having volume management, and a SAN setup
> is really where you need it the most.
>
> ~mc
>

2002-11-13 11:39:39

by Lars Marowsky-Bree

Subject: Re: md on shared storage

On 2002-11-12T18:25:29,
Brian Jackson <[email protected]> said:

> Does the MD driver work with shared storage? I would also be interested to
> know if the new DM driver works with shared storage (though I must admit I
> didn't really try to answer this one myself; I'm just hoping somebody will
> know).

The short answer is "Not sanely", as far as I know.

RAID0 might be okay; however, RAID1/5 run into issues if two nodes update the
same data in parallel: they do not coordinate the writes, and thus might stomp
over each other.

In theory, given a RAID1 with disks {d1,d2}, node n1 might write in order
(d2,d1) while n2 writes as (d1,d2), resulting in inconsistent mirrors. The
race window obviously becomes bigger for RAID5, because more disks are
involved.
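
To make the RAID1 case concrete, one possible interleaving (X and Y are two
different writes to the same block):

  1. n1 writes X to d2
  2. n2 writes Y to d1
  3. n1 writes X to d1   (overwriting n2's Y)
  4. n2 writes Y to d2   (overwriting n1's X)

Afterwards d1 holds X and d2 holds Y, so the two mirror halves disagree.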

The "multiple nodes beginning to reconstruct the same md device" is also a
problem; but even if that was solved that only one node does the recovery, the
others would be blocked from doing any IO on that drive for the time being.

Another issue is that every node will want to update the md superblock
regularly.

LVM is fine, MD doesn't seem to be.

With the MD patches I posted weeks ago, at least MD multipathing should work
appropriately; even if the ugliness of multiple nodes scribbling over the
superblock remains, it shouldn't matter, because for multipathing the
autodetection is based only on the UUID.

In short, you can do "MD", if you don't use it as "shared"; have only one node
have a given md device active at any point in time. Thus, no autostart, but
manual activation. This rules out "GFS over md", basically.
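
In practice that means something along these lines on whichever node
currently owns the array (device names made up, raidtools or mdadm to taste):

  # on the single node that owns /dev/md0 right now
  raidstart /dev/md0
  # or, with mdadm, assembling from explicitly named disks
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1

On takeover, the new owner runs the same thing, after making sure the old
owner is really gone; there is still never more than one node with the array
active.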

If you want to fix that, it would be cool; it will just require a DLM,
membership and communication services in the kernel. ;-)


Sincerely,
Lars Marowsky-Bree <[email protected]>

--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will." "Chance favors the prepared (mind)."
-- Capt. Edward A. Murphy -- Louis Pasteur

2002-11-13 13:15:23

by Michael Clark

Subject: Re: md on shared storage

Basically my last setup was ext3 + LVM1 + qla2x00 (6.0.1b3), which is exactly
what is in 2.4.19pre10aa4. The load was general fileserver use with netatalk
on one machine, openldap on another machine and oracle on another. One of
them would oops every 5-8 days.

I haven't come up with a repeatable test yet to generate the oops.

I intend to try fsx + bonnie in parallel on a test setup with the same
kernels to see if I can reproduce the oops. Then I'll see if I can reproduce
it with the qla2300 6.01 driver. If it's still not okay, I'll try dm and
evms. Basically I want a stable ext3 + volume manager + qla2x00.
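
For the fsx + bonnie run, roughly this sort of thing against the LVM-backed
filesystem (paths and sizes are just placeholders):

  # run both loads at once against the volume under test
  fsx -N 1000000 /mnt/test/fsxfile &
  bonnie -d /mnt/test -s 2048 &
  wait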

~mc

On 11/13/02 11:15, Brian Jackson wrote:
<snip>

2002-11-13 15:28:53

by Eric Weigle

Subject: Re: md on shared storage

> ...
> I ask because I seem to be having some strange problems with an md device
> on shared storage (Qlogic FC controllers). The qlogic drivers spit out
> messages for about 20-60 lines, then the machines lock up. So the drivers
> ...
If those messages look like "no handle slots, this should not happen" and
you're running a 2.4.x kernel there's a known problem with the qlogicfc
driver locking up the machine under high load. It's been fixed in 2.5,
but I don't think it's been back-ported yet.

So, aside from the _other_ problems induced by shared storage, this might
be biting you too. :)

See:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0209.0/0467.html


Thanks,
-Eric

--
------------------------------------------------
Eric H. Weigle -- http://public.lanl.gov/ehw/
------------------------------------------------



2002-11-13 17:09:27

by Steven Dake

Subject: Re: md on shared storage

Lars,

Another method is to lock an md array to a specific host. This method
requires no DLM (since there is no capability for shared writes to the same
array).

Thanks
-steve

Lars Marowsky-Bree wrote:

<snip>

2002-11-13 17:17:17

by Joel Becker

Subject: Re: md on shared storage

On Wed, Nov 13, 2002 at 12:46:41PM +0100, Lars Marowsky-Bree wrote:
> In short, you can do "MD", if you don't use it as "shared"; have only one node
> have a given md device active at any point in time. Thus, no autostart, but
> manual activation. This rules out "GFS over md", basically.

RAID0 can work fine. You cannot have a persistent superblock and autostart,
because then the two nodes stomp on each other. To allow anything other than
this, you'd need cluster services and node locking.

How to do RAID0? Have each node mkraid the RAID0 (or the mdadm equivalent) at
boot. Because there is no persistent superblock, there is no contention on
the disk. OpenGFS or OCFS can now share the nice striped volume, because they
handle the locking for their data.

We have, in fact, run OCFS on shared RAID0 in this fashion.
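
For concreteness, roughly this sort of raidtab on every node (device names
and chunk size are made up), with no persistent superblock:

  raiddev /dev/md0
      raid-level              0
      nr-raid-disks           2
      persistent-superblock   0
      chunk-size              64
      device                  /dev/sda1
      raid-disk               0
      device                  /dev/sdb1
      raid-disk               1

and then mkraid /dev/md0 (or raid0run /dev/md0) from a boot script on each
node. With mdadm (option names from memory) the equivalent is to build the
array without a superblock:

  mdadm --build /dev/md0 --level=0 --raid-devices=2 --chunk=64 \
        /dev/sda1 /dev/sdb1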

Joel

--

"Maybe the time has drawn the faces I recall.
But things in this life change very slowly,
If they ever change at all."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2002-11-13 17:18:48

by Joel Becker

Subject: Re: md on shared storage

On Wed, Nov 13, 2002 at 10:17:08AM -0700, Steven Dake wrote:
> Another method is to lock an md array to a specific host. This method
> requires no DLM (since there is no capability for shared writes to the same
> array).

But the entire point is to share access. Otherwise it is pretty
uninteresting.
If you want a failover setup, there is no need for md locking
either. Simply have the backup node not start the md until the failover
happens.

Joel

--

"There are only two ways to live your life. One is as though nothing
is a miracle. The other is as though everything is a miracle."
- Albert Einstein

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2002-11-13 17:48:16

by Steven Dake

Subject: Re: md on shared storage

Waiting until failover to start the md doesn't work for RAID1 mds that you
boot from, since autostart is required for those.

Joel Becker wrote:

<snip>

2002-11-13 18:24:32

by Brian Jackson

Subject: Re: md on shared storage

2.4.19 appears to have that patch already applied except for changing the
QLOGICFC_REQ_QUEUE_LEN to 255 in qlogicfc.h. I made that change and
rebooted with the new kernel. And I still have the same problem. I did
however make sure none of the other hosts had the raid device started, and
it still does the same thing, which pushes my suspicion back onto the
drivers.

--Brian Jackson

<snip>
> If those messages look like "no handle slots, this should not happen" and
> you're running a 2.4.x kernel there's a known problem with the qlogicfc
> driver locking up the machine under high load. It's been fixed in 2.5,
> but I don't think it's been back-ported yet.
>
> So, aside from the _other_ problems induced by shared storage, this might
> be biting you too. :)
>
> See:
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0209.0/0467.html
>
>
> Thanks,
> -Eric
>
> --
> ------------------------------------------------
> Eric H. Weigle -- http://public.lanl.gov/ehw/
> ------------------------------------------------

2002-11-13 19:18:14

by Eric Weigle

Subject: Re: md on shared storage

> 2.4.19 appears to have that patch already applied except for changing the
> QLOGICFC_REQ_QUEUE_LEN to 255 in qlogicfc.h. I made that change and
> rebooted with the new kernel. And I still have the same problem. I did
> however make sure none of the other hosts had the raid device started, and
> it still does the same thing, which pushes my suspicion back onto the
> drivers.
Oh well... it was just an idea.

All I know for certain is that I've got a machine here with fiber channel
running to a 7-disk Linux software RAID-0 array (240GB, stolen from a defunct
NetApp) that would lock up and die all the time, even using 2.4.19. Using
2.5.44, which includes the patch I mentioned before, it's been up for 23
days rock solid. [side note: The layout of the disks in my raidtab reversed
annoyingly between 2.4 and 2.5... whose bright idea was that?].

Providing more information (hardware info, log messages, and so forth) might
allow the more capable people on the list to debug your problem.


Thanks,
-Eric

--
------------------------------------------------
Eric H. Weigle -- http://public.lanl.gov/ehw/
------------------------------------------------

