2003-03-26 22:08:38

by Lincoln Dale

Subject: Re: [PATCH] ENBD for 2.5.64

At 10:09 AM 26/03/2003 -0600, Matt Mackall wrote:
> > >Indeed, there are iSCSI implementations that do multipath and
> > >failover.
> >
> > iSCSI is a transport.
> > logically, any "multipathing" and "failover" belongs in a layer above it --
> > typically as a block-layer function -- and not as a transport-layer
> > function.
> >
> > multipathing belongs elsewhere -- whether it be in MD, LVM, EVMS, DevMapper,
> > PowerPath, ...
>
>Funny then that I should be talking about Cisco's driver. :P

:-)

see my previous email to Jeff. iSCSI as a transport protocol does have a
muxing capability -- but its usefulness is somewhat limited (imho).

>iSCSI inherently has more interesting reconnect logic than other block
>devices, so it's fairly trivial to throw in recognition of identical
>devices discovered on two or more iSCSI targets..

what logic do you use to identify "identical devices"?
same data reported from SCSI Report_LUNs? or perhaps the same data
reported from a SCSI_Inquiry?

in reality, all multipathing software tends to use some blocks at the end
of the disk (just in the same way that most LVMs do also).

for example, consider the following output from a set of two SCSI_Inquiry
and Report_LUNs on two paths to storage:
Lun Description Table
WWPN              Lun  Capacity  Vendor    Product     Serial
----------------  ---  --------  --------  ----------  ---------
Path A:
21000004cf8c21fb  0    16GB      HP 18.2G  ST318452FC  3EV0BD8E
21000004cf8c21c5  0    16GB      HP 18.2G  ST318452FC  3EV0KHHP
50060e8000009591  0    50GB      HITACHI   DF500F      DF500-00B
50060e8000009591  1    50GB      HITACHI   DF500F      DF500-00B
50060e8000009591  2    50GB      HITACHI   DF500F      DF500-00B
50060e8000009591  3    50GB      HITACHI   DF500F      DF500-00B

Path B:
31000004cf8c21fb  0    16GB      HP 18.2G  ST318452FC  3EV0BD8E
31000004cf8c21c5  0    16GB      HP 18.2G  ST318452FC  3EV0KHHP
50060e8000009591  0    50GB      HITACHI   DF500F      DF500-00A
50060e8000009591  1    50GB      HITACHI   DF500F      DF500-00A
50060e8000009591  2    50GB      HITACHI   DF500F      DF500-00A
50060e8000009591  3    50GB      HITACHI   DF500F      DF500-00A


the "HP 18.2G" devices are 18G FC disks in a FC JBOD. each disk will
report an identical Serial # regardless of the interface/path used to get
to that device. no issues there right -- you can identify the disk as
being unique via its "Serial #" and can see the interface used to get to it
via its WWPN.

now, take a look at some disk from an intelligent disk array (in this case,
a HDS 9200).
it reports a _different_ serial number for the same disk, dependent on the
interface used. (DF500 is the model # of a HDS 9200, interfaces are
numbered 00A/00B/01A/01B).

does one now need to add logic into the kernel to provide some multipathing
for HDS disks?
does using linux mean that one has to change some settings on the HDS disk
array to get it to report different information via a SCSI_Inquiry? (it
can - but that's not the point - the point is that any multipathing software
out there just 'works' right now).

this is just one example. i could probably find another 50 of
slightly-different-behavior if you wanted me to!
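
to make the problem concrete, here is a minimal sketch of the naive approach: group every discovered entry by its (vendor, product, serial) triple and call each group one device. the rows are copied from the table above; the function name and the tuple layout are made up for illustration and do not come from any real driver.

```python
from collections import defaultdict

# (path, wwpn, lun, vendor, product, serial), copied from the table above
ROWS = [
    ("A", "21000004cf8c21fb", 0, "HP 18.2G", "ST318452FC", "3EV0BD8E"),
    ("A", "21000004cf8c21c5", 0, "HP 18.2G", "ST318452FC", "3EV0KHHP"),
    ("B", "31000004cf8c21fb", 0, "HP 18.2G", "ST318452FC", "3EV0BD8E"),
    ("B", "31000004cf8c21c5", 0, "HP 18.2G", "ST318452FC", "3EV0KHHP"),
]
ROWS += [("A", "50060e8000009591", lun, "HITACHI", "DF500F", "DF500-00B")
         for lun in range(4)]
ROWS += [("B", "50060e8000009591", lun, "HITACHI", "DF500F", "DF500-00A")
         for lun in range(4)]


def group_by_serial(rows):
    """Naive matcher (hypothetical): equal vendor/product/serial == same device."""
    groups = defaultdict(list)
    for path, wwpn, lun, vendor, product, serial in rows:
        groups[(vendor, product, serial)].append((path, wwpn, lun))
    return dict(groups)


groups = group_by_serial(ROWS)
for key, members in groups.items():
    paths = sorted({path for path, _, _ in members})
    print(key, "paths:", paths, "entries:", len(members))
```

with this data the two JBOD disks each end up in a group that spans paths A and B, which is what you want. the HDS entries do not: the 00A and 00B serials never match across paths, and as far as this table shows, all four LUNs behind one interface share a serial and get lumped together.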

> > >Both iSCSI and ENBD currently have issues with pending writes during
> > >network outages. The current I/O layer fails to report failed writes
> > >to fsync and friends.
> >
> > these are not "iSCSI" or "ENBD" issues. these are issues with VFS.
>
>Except that the issue simply doesn't show up for anyone else, which is
>why it hasn't been fixed yet. Patches are in the works, but they need
>more testing:
>
>http://www.selenic.com/linux/write-error-propagation/

oh, but it does show up for other people. it may be that the issue doesn't
show up at fsync() time, but rather at close() time, or perhaps neither of
those!

code looks interesting. i'll take a look.
hmm, must find out a way to intentionally introduce errors now and see what
happens!
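
as a rough sketch of what "shows up at fsync() or close() time" means from an application's point of view, the snippet below records which call (if any) returned an error. it does not reproduce the propagation bug, and as noted above an error may surface at neither call; `write_durably` and the probe file name are made up for illustration.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data, then fsync and close; return (step, errno) for the first
    call that failed, or None if every call reported success."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    step = "write"
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)   # may be a short write
            view = view[written:]
        step = "fsync"
        os.fsync(fd)                       # ask the kernel to flush to the device
    except OSError as exc:
        try:
            os.close(fd)
        except OSError:
            pass
        return (step, exc.errno)
    try:
        os.close(fd)                       # close() can report an error too
    except OSError as exc:
        return ("close", exc.errno)
    return None


with tempfile.TemporaryDirectory() as tmp:
    result = write_durably(os.path.join(tmp, "probe"), b"hello")
    print("first failing step:", result)
```

pointing the same function at a file on a device whose transport has been cut would be one way to see which of the three calls, if any, reports the failure.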


cheers,

lincoln.


2003-03-26 22:46:37

by Lars Marowsky-Bree

Subject: Re: [PATCH] ENBD for 2.5.64

On 2003-03-27T09:16:18,
Lincoln Dale <[email protected]> said:

> what logic do you use to identify "identical devices"?
> same data reported from SCSI Report_LUNs? or perhaps the same data
> reported from a SCSI_Inquiry?

That would work well.

We do parse device specific information in order to auto-configure the md
multipath at setup time. After that, magic is on disk...
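
A minimal sketch of the "magic is on disk" idea, assuming a made-up label layout (an 8-byte magic string followed by a 16-byte UUID in the last 512-byte sector). This is not the md superblock format; `MAGIC`, `write_label` and `read_label` are hypothetical names, and regular files stand in for block devices.

```python
import os
import tempfile
import uuid

SECTOR = 512
MAGIC = b"MPLABEL1"  # hypothetical magic string, not a real on-disk format


def write_label(path, dev_uuid):
    """Store MAGIC + UUID in the last sector of the device (here: a file)."""
    with open(path, "r+b") as f:
        f.seek(-SECTOR, os.SEEK_END)
        f.write((MAGIC + dev_uuid.bytes).ljust(SECTOR, b"\0"))


def read_label(path):
    """Return the UUID found in the last sector, or None if there is no label."""
    with open(path, "rb") as f:
        f.seek(-SECTOR, os.SEEK_END)
        block = f.read(SECTOR)
    if not block.startswith(MAGIC):
        return None
    return uuid.UUID(bytes=block[len(MAGIC):len(MAGIC) + 16])


with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "path_a")      # first name for the storage
    path_b = os.path.join(tmp, "path_b")      # second name, same storage
    other = os.path.join(tmp, "other_disk")   # unrelated, unlabeled storage
    for name in (path_a, other):
        with open(name, "wb") as f:
            f.write(b"\0" * (4 * SECTOR))
    os.symlink(path_a, path_b)

    label = uuid.uuid4()
    write_label(path_a, label)                # done once, at setup time
    same_device = read_label(path_a) == read_label(path_b) == label
    unlabeled = read_label(other)
    print("same device via both paths:", same_device)
    print("label on other disk:", unlabeled)
```

The point of the sketch is that the comparison depends only on what is stored on the medium, so it does not care what the array reports per interface.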

> does one now need to add logic into the kernel to provide some multipathing
> for HDS disks?

Topology discovery is user-space! It does not need to live in the kernel.


Sincerely,
Lars Marowsky-Brée <[email protected]>

--
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will."  "Chance favors the prepared (mind)."
  -- Capt. Edward A. Murphy             -- Louis Pasteur

2003-03-26 23:11:31

by Lincoln Dale

Subject: Re: [PATCH] ENBD for 2.5.64

Hi Lars,

At 11:56 PM 26/03/2003 +0100, Lars Marowsky-Bree wrote:
[..]
>We do parse device specific information in order to auto-configure the md
>multipath at setup time. After that, magic is on disk...
>
> > does one now need to add logic into the kernel to provide some multipathing
> > for HDS disks?
>
>Topology discovery is user-space! It does not need to live in the kernel.

i think we're agreeing on the same thing here!

yes, i believe topology discovery should only belong in userspace.
i believe it should be in userspace both (a) at setup and (b) at
kernel boot time.

likewise, i believe policy of deciding what mix of i/o's to put down
different paths also belongs in userspace.
this could take the form of a daemon that frequently looks up statistics
from the kernel (e.g. average latency per target), and uses that
information in conjunction with some 'policy' to tweak what paths are used.
but i definitely don't think that the kernel should make any wide-ranging
decisions about multiple paths, beyond something like "deviceA has
disappeared. i know that deviceB is an alternate path, so will swing all
outstanding i/o plugged into A to B".
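
a rough sketch of that split, under stated assumptions: the userspace side turns per-path average latency into weights, and the kernel side does nothing smarter than moving queued i/o from a dead path to its known alternate. inverse-latency weighting is just one possible policy picked for illustration, and `path_weights`, `fail_over` and the sample numbers are all made up.

```python
def path_weights(avg_latency_ms):
    """Userspace policy (one of many possible): weight each path by the
    inverse of its average latency, normalised so the weights sum to 1."""
    inverse = {path: 1.0 / lat for path, lat in avg_latency_ms.items() if lat > 0}
    total = sum(inverse.values())
    return {path: value / total for path, value in inverse.items()}


def fail_over(outstanding, failed, alternates):
    """Kernel-side rule from the text: the failed path has disappeared, so
    swing everything queued on it over to its known alternate."""
    target = alternates[failed]
    moved = outstanding.pop(failed, [])
    outstanding[target] = outstanding.get(target, []) + moved
    return outstanding


# statistics a daemon might have read from the kernel (values are invented)
weights = path_weights({"A": 2.0, "B": 6.0})
print("weights:", weights)

# path A disappears; its queued i/o moves to B
queues = fail_over({"A": ["w1", "w2"], "B": ["r1"]}, "A", {"A": "B", "B": "A"})
print("queues after failover:", queues)
```

the daemon would rerun `path_weights` whenever it refreshes its statistics, while the failover rule needs no policy input at all.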


cheers,

lincoln.