2003-02-13 19:39:27

by Jurjen Oskam

Subject: Accessing the same disk via multiple channels

Hi everybody,

here's something I've been wondering about. At work, we have
an EMC2 Symmetrix in a SAN environment, with (until now) only
AIX boxes attached to the SAN.

Each server is equipped with 2 FibreChannel cards. The SAN is
configured to present the same disk (which is in fact a virtual
Symmetrix device) over two channels. This means the host sees
two physical devices (as far as that host is concerned) which are
in fact only one device. In Linux terms: /dev/sda and /dev/sdc
are exactly the same disk, but the (standard) OS doesn't know this.

EMC2 provide a piece of software called PowerPath, which takes advantage of
this situation. It provides yet another device (let's say /dev/powersda), which
uses the (identical) native devices /dev/sda and /dev/sdc. If one of those
two would disappear, access to powersda would still be possible.

How does linux as it is now handle the situation of one physical device
presented via multiple paths (without extra software)?

--
Jurjen Oskam

PGP Key available at http://www.stupendous.org/


2003-02-13 22:36:33

by Bernd Eckenfels

Subject: Re: Accessing the same disk via multiple channels

In article <[email protected]> you wrote:
> Each server is equipped with 2 FibreChannel cards. The SAN is
> configured to present the same disk (which is in fact a virtual
> Symmetrix device) over two channels. This means the host sees
> two physical devices (as far as that host is concerned) which are
> in fact only one device. In Linux terms: /dev/sda and /dev/sdc
> are exactly the same disk, but the (standard) OS doesn't know this.
...
> How does linux as it is now handle the situation of one physical device
> presented via multiple paths (without extra software)?

You can use the multipath option to md which can do that.

Basically there are two options: failover and load balancing. The
problem with failover is detecting the actual failure reliably; the problem
with load balancing is that not all SAN configurations allow it.
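
The difference between the two policies can be sketched in a few lines of
shell. This is only an illustration and not how md implements either mode;
the function names and the "name:state" path format are made up for the
example.

```shell
#!/bin/sh
# Hypothetical path format: "name:state", where state is "up" or "down".

# Failover: always use the first path that is still up.
pick_failover() {
    for p in "$@"; do
        case $p in
            *:up) echo "${p%%:*}"; return 0 ;;
        esac
    done
    return 1    # no usable path left
}

# Load balancing (round-robin): request number N goes to the
# (N modulo count)-th path among those that are up.
pick_round_robin() {
    n=$1; shift
    up=""
    count=0
    for p in "$@"; do
        case $p in
            *:up) up="$up ${p%%:*}"; count=$((count + 1)) ;;
        esac
    done
    [ "$count" -gt 0 ] || return 1
    idx=$((n % count))
    i=0
    for name in $up; do
        if [ "$i" -eq "$idx" ]; then echo "$name"; return 0; fi
        i=$((i + 1))
    done
}
```

With both paths up, failover keeps sending everything down the first path,
while round-robin alternates between them; once a path is marked down, both
policies end up on the surviving one.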

http://www-124.ibm.com/storageio/multipath/md-multipath/index.php

this is at least in 2.4.20-xfs

Greetings
Bernd
--
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

2003-02-14 09:55:21

by Lars Marowsky-Bree

Subject: Re: Accessing the same disk via multiple channels

On 2003-02-13T23:45:51,
Bernd Eckenfels <[email protected]> said:

> You can use the multipath option to md which can do that.
>
> Basically there are two options: failover and load balancing. The
> problem with failover is detecting the actual failure reliably; the problem
> with load balancing is that not all SAN configurations allow it.
>
> http://www-124.ibm.com/storageio/multipath/md-multipath/index.php
>
> this is at least in 2.4.20-xfs

That one? Ouch, it is a bit dated according to the webpage ;-) I don't recall
that it was discussed on LKML, either.

SuSE (Jens Axboe and myself) have also done work on the md multipathing,
supporting failover and load balancing and in general giving the code a rinse;
as well as extensions to mdadm to make them work.

The patches currently live at http://lars.marowsky-bree.de/dl/md-mp/

(And are included in SuSE's kernel release, of course ;-)

Currently, for 2.5 / 2.6, I think I really like the SCSI midlayer stuff. In
the past, I didn't, because it constrains everything to SCSI. But then,
everything so far _has_ been SCSI, except for weird arch stuff like s390(x)
DASDs ;-)

Doing it in the SCSI layer has the advantage of not being constrained to block
devices, but also working with tapes. Oh well, we'll see ;-)


Sincerely,
Lars Marowsky-Brée <[email protected]>

--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will." "Chance favors the prepared (mind)."
-- Capt. Edward A. Murphy -- Louis Pasteur

2003-02-14 15:09:17

by Cameron, Steve

Subject: Re: Accessing the same disk via multiple channels


Lars Marowsky-Bree ([email protected]) wrote:

> SuSE (Jens Axboe and myself) have also done work on the md multipathing,
> supporting failover and load balancing and in general giving the code a rinse;
> as well as extensions to mdadm to make them work.
>
Yay! We noticed that if a controller fails in such a way that
no interrupts are generated, then the md driver doesn't notice anything is
wrong. Commands don't fail, but don't complete either. I played around with
a feature in the low-level driver to periodically send a no-op command
down to the controller and, if that command didn't come back pretty quickly,
fail all outstanding commands and disable the controller; that
seemed to work pretty well in a failover type situation.
(Better than putting a timeout on every command.) Also, md multipath
doesn't notice if the backup path has failed, so it can't warn the user that
redundancy is no longer in effect. (Though if you set things up so I/O
is going down both paths, that's not such a big deal, as md will notice.)
Probably you know all this already.
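
The heartbeat idea (one probe with a deadline, rather than a timeout on
every command) can be illustrated from user space. This is only a sketch and
not the driver code: probe_path is a made-up name, and it assumes the
coreutils timeout utility is available.

```shell
#!/bin/sh
# probe_path SECONDS COMMAND [ARGS...]
# Runs COMMAND as a heartbeat and prints "alive" if it succeeds in time,
# "hung" if the deadline expires first, or "error" if it fails outright.
# The return status is 0 only for "alive".
probe_path() {
    deadline=$1; shift
    status=0
    timeout "$deadline" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo alive ;;
        124) echo hung ;;    # status coreutils timeout reports on expiry
        *)   echo error ;;
    esac
    [ "$status" -eq 0 ]
}
```

A call might look like "probe_path 2 dd if=/dev/sdaX of=/dev/null bs=512
count=1". Note that a read stuck in uninterruptible sleep may not be
killable by a signal, so a real check against a dead controller could still
block; the sketch only shows the deadline logic.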

> The patches currently live at http://lars.marowsky-bree.de/dl/md-mp/
>
> (And are included in SuSE's kernel release, of course ;-)
>
> Currently, for 2.5 / 2.6, I think I really like the SCSI midlayer stuff. In
> the past, I didn't, because it constrains everything to SCSI. But then,
> everything so far _has_ been SCSI, except for weird arch stuff like s390(x)
> DASDs ;-)

Well, the cciss driver is not a SCSI driver (except for purposes of
tape drives & tape changers) and HP/Compaq has sold more than one
million of those controllers (does popularity mean they aren't
"weird"? :-), and we have multipath-capable storage boxes they
can connect to.

-- steve

2003-02-14 16:17:45

by Lars Marowsky-Bree

Subject: Re: Accessing the same disk via multiple channels

On 2003-02-14T09:20:12,
steve cameron <[email protected]> said:

> Yay! We noticed that if a controller fails in such a way that
> no interrupts are generated, then the md driver doesn't notice anything is
> wrong. Commands don't fail, but don't complete either.

Uhhhhh. They should time out though and then be counted as errors; hard to
catch this differently.

You are saying it doesn't work?

> (Better than putting a timeout on every command.) Also, md multipath
> doesn't notice if the backup path has failed, so it can't warn the user that
> redundancy is no longer in effect. (Though if you set things up so I/O
> is going down both paths, that's not such a big deal, as md will notice.)
> Probably you know all this already.

Yes. We intentionally don't do active _monitoring_ on non-active paths, same
as we don't do reprobing of failed paths to see whether they are alive again.

(The LVM m-p patch does periodically send down live requests to failed paths
to check this; I consider this intentional data corruption, but I'm
paranoid.)

This is something which can easily be implemented safely in user-space
though; the md approach still exposes the lower level devices and you can
check them periodically if you care, even if it's just with a

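    while sleep 1 ; do
        # Read a single block from the path instead of the whole device.
        if ! dd if=/dev/sdaX of=/dev/null bs=512 count=1 2>/dev/null ; then
            mdadm /dev/md0 --fail /dev/sdaX
            logger -p kern.alert "Path /dev/sdaX failed!"
            break    # path is marked failed; stop polling it
        fi
    done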

;-) Obviously, not something we need to run in kernel space.


> Well, the cciss driver is not a SCSI driver (except for purposes of
> tape drives & tape changers) and HP/Compaq has sold more than one
> million of those controllers (does popularity mean they aren't
> "weird"? :-), and we have multipath-capable storage boxes they
> can connect to.

Indeed. Yes, we'll need to figure out how to do this for 2.5/2.6; maybe
porting forward the md m-p patch to 2.5 is indeed the best choice. It should
be way easier, as md has been greatly cleaned up...

However, past discussions on LKML regarding "How to do m-p cleanly in 2.5"
have never reached a conclusion ;-) We'll see. The good thing about the SCSI
m-p is that it can also handle multipathed tape drives...


Sincerely,
Lars Marowsky-Brée <[email protected]>

--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will." "Chance favors the prepared (mind)."
-- Capt. Edward A. Murphy -- Louis Pasteur

2003-02-14 18:16:15

by Patrick Mansfield

Subject: Re: Accessing the same disk via multiple channels

On Fri, Feb 14, 2003 at 05:27:22PM +0100, Lars Marowsky-Bree wrote:

> Indeed. Yes, we'll need to figure out how to do this for 2.5/2.6; maybe
> porting forward the md m-p patch to 2.5 is indeed the best choice. It should
> be way easier, as md has been greatly cleaned up...

> However, past discussions on LKML regarding "How to do m-p cleanly in 2.5"
> have never reached a conclusion ;-) We'll see. The good thing about the SCSI
> m-p is that it can also handle multipathed tape drives...

I thought the general consensus was it is OK for now (as a first go) to
have scsi only multi-path, I have not heard anyone say don't do scsi
multi-path. And then later (maybe after we have more than one subsystem
supporting multi-path IO) we can add general multi-path support into the
layers above scsi.

In any case, md or volume manager based multi-path solutions are good
alternatives.

I have recently ported the scsi multi-path patch to 2.5.59, but haven't
posted patches.

The current multi-path patch still needs at least two major changes in
scsi: error recovery (scsi_error.c) that allows other paths to be used
without long delays, and a per-device queue_lock versus the current
per-host queue_lock.

Hopefully we can get underlying changes for those last two into 2.5 (and
maybe someday the multi-path patch), as they are improvements to scsi with
or without multi-path.

-- Patrick Mansfield

2003-02-14 21:52:09

by Tim Pepper

Subject: Re: Accessing the same disk via multiple channels

On Fri 14 Feb at 11:03:16 +0100 [email protected] done said:
>
> Doing it in the SCSI layer has the advantage of not being constrained to block
> devices, but also working with tapes. Oh well, we'll see ;-)

Tape needs special multipathing logic. Don't you think moving
multipathing to the mid-layer requires the mid-layer to know much more
about the upper layer and muddles up the scsi stack's layering? To keep
multipathing high and generic we need better error reporting than the
one bit that hits the md layer in 2.4...

t.
--
*********************************************************
* tpepper@vato dot org * Venimus, Vidimus, *
* http://www.vato.org/~tpepper * Dolavimus *
*********************************************************

2003-02-14 23:36:10

by Patrick Mansfield

Subject: Re: Accessing the same disk via multiple channels

On Fri, Feb 14, 2003 at 02:01:55PM -0800, Tim Pepper wrote:
> On Fri 14 Feb at 11:03:16 +0100 [email protected] done said:
> >
> > Doing it in the SCSI layer has the advantage of not being constrained to block
> > devices, but also working with tapes. Oh well, we'll see ;-)
>
> Tape needs special multipathing logic. Don't you think moving
> multipathing to the mid-layer requires the mid-layer to know much more
> about the upper layer and muddles up the scsi stack's layering? To keep
> multipathing high and generic we need better error reporting than the
> one bit that hits the md layer in 2.4...
>
> t.

If we want to be able to retry a tape read/write that actually might have
made it to the media it requires special handling (as compared to a random
access device) no matter where we put multi-path.

The scsi mid-layer is a bit muddled up already (but getting better).
Adding multi-path there does not make it worse. Much of the same code that
cleans up the scsi mid-layer also makes scsi multi-path easier (that is,
recent changes cleaning up the scsi mid-layer make it easier to implement
scsi multi-path).

Generic multi-path without the lower levels knowing anything can waste a
lot of resources. For example, for each extra path to a block device
(disk) we end up with an extra sd plus associated data structures, and an
extra scsi_device including multiple request queues.

The main thing scsi multi-path needs to know is that multiple nexuses
(i.e. host/channel/target/lun) correspond to the same unit. This has no
interactions with any upper layer drivers, and limits what the upper layer
drivers (and users) have to work with, and more closely matches the layout
of the actual hardware.
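
As a rough illustration of that mapping, the following groups nexuses by a
unit identifier. The input format (one "host:channel:target:lun identifier"
pair per line) and the function name are assumptions made for the example;
how the identifier is obtained from the device is left out, and this is not
taken from the scsi multi-path patch.

```shell
#!/bin/sh
# group_paths: read "nexus unit-id" lines on stdin and print one line per
# unit, listing every nexus that leads to it, in first-seen order.
group_paths() {
    awk '
        NF >= 2 {
            if (!($2 in paths)) { order[++n] = $2; paths[$2] = $1 }
            else                { paths[$2] = paths[$2] " " $1 }
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " paths[order[i]] }
    '
}
```

A unit that shows up with more than one nexus on its line is reachable over
multiple paths; everything above that mapping only needs to deal with the
unit.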

-- Patrick Mansfield

2003-02-17 09:44:34

by Lars Marowsky-Bree

Subject: Re: Accessing the same disk via multiple channels

On 2003-02-14T15:37:47,
Patrick Mansfield <[email protected]> said:

> Generic multi-path without the lower levels knowing anything can waste a
> lot of resources. For example, for each extra path to a block device
> (disk) we end up with an extra sd plus associated data structures, and an
> extra scsi_device including multiple request queues.

I think these somehow still need to be exposed so that userspace can do
per-path diagnostics; unless you also want to move this into the kernel space,
which I'm not sure about.

What we really need is better error handling and escalation so that a higher
layer actually has a chance at really well done recovery and retrying.


Sincerely,
Lars Marowsky-Brée <[email protected]>

--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will." "Chance favors the prepared (mind)."
-- Capt. Edward A. Murphy -- Louis Pasteur