2009-09-15 14:45:15

by Philipp Reisner

Subject: [GIT PULL] DRBD for 2.6.32

Hi Linus,

Please pull
git://git.drbd.org/linux-2.6-drbd.git drbd

DRBD is a shared-nothing, replicated block device. It is designed to
serve as a building block for high availability clusters and
in this context, is a "drop-in" replacement for shared storage.

It has been discussed and reviewed on the list since March,
and Andrew has asked us to send a pull request for 2.6.32-rc1.

Stephen added us to linux-next in July, and our most recent
build failure against Jens' 2.6.32-rc1 updates was ironed out this
morning. So it should be in fairly good shape.

Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg | 588 +
Documentation/blockdev/drbd/DRBD-data-packets.svg | 459 +
Documentation/blockdev/drbd/README.txt | 16
Documentation/blockdev/drbd/conn-states-8.dot | 18
Documentation/blockdev/drbd/disk-states-8.dot | 16
Documentation/blockdev/drbd/drbd-connection-state-overview.dot | 85
Documentation/blockdev/drbd/node-states-8.dot | 14
MAINTAINERS | 13
drivers/block/Kconfig | 2
drivers/block/Makefile | 1
drivers/block/drbd/Kconfig | 82
drivers/block/drbd/Makefile | 8
drivers/block/drbd/drbd_actlog.c | 1484 +++
drivers/block/drbd/drbd_bitmap.c | 1327 ++
drivers/block/drbd/drbd_int.h | 2258 +++++
drivers/block/drbd/drbd_main.c | 3735 ++++++++
drivers/block/drbd/drbd_nl.c | 2365 +++++
drivers/block/drbd/drbd_proc.c | 266
drivers/block/drbd/drbd_receiver.c | 4456 ++++++++++
drivers/block/drbd/drbd_req.c | 1132 ++
drivers/block/drbd/drbd_req.h | 327
drivers/block/drbd/drbd_strings.c | 113
drivers/block/drbd/drbd_tracing.c | 753 +
drivers/block/drbd/drbd_tracing.h | 87
drivers/block/drbd/drbd_vli.h | 351
drivers/block/drbd/drbd_worker.c | 1529 +++
drivers/block/drbd/drbd_wrappers.h | 91
include/linux/drbd.h | 349
include/linux/drbd_limits.h | 137
include/linux/drbd_nl.h | 137
include/linux/drbd_tag_magic.h | 83
include/linux/lru_cache.h | 294
lib/Kconfig | 3
lib/Makefile | 2
lib/lru_cache.c | 560 +
35 files changed, 23141 insertions(+)

-phil


2009-09-15 23:19:34

by Christoph Hellwig

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Tue, Sep 15, 2009 at 04:45:13PM +0200, Philipp Reisner wrote:
> Hi Linus,
>
> Please pull
> git://git.drbd.org/linux-2.6-drbd.git drbd
>
> DRBD is a shared-nothing, replicated block device. It is designed to
> serve as a building block for high availability clusters and
> in this context, is a "drop-in" replacement for shared storage.
>
> It has been discussed and reviewed on the list since March,
> and Andrew has asked us to send a pull request for 2.6.32-rc1.

The last thing we need is another bloody raid-reimplementation, coupled
with a proprietary on-the-wire protocol. NACK as far as I am concerned.

2009-09-16 00:46:39

by KOSAKI Motohiro

Subject: Re: [GIT PULL] DRBD for 2.6.32

> Hi Linus,
>
> Please pull
> git://git.drbd.org/linux-2.6-drbd.git drbd
>
> DRBD is a shared-nothing, replicated block device. It is designed to
> serve as a building block for high availability clusters and
> in this context, is a "drop-in" replacement for shared storage.
>
> It has been discussed and reviewed on the list since March,
> and Andrew has asked us to send a pull request for 2.6.32-rc1.
>
> Stephen added us to linux-next in July, and our most recent
> build failure against Jens' 2.6.32-rc1 updates was ironed out this
> morning. So it should be in fairly good shape.
>

Sorry for pointing this out so late.
If anyone adds a new feature, its Kconfig default should be N.

But AFAIK, DRBD uses M. I hope you will change it. At least, typical desktop users don't use DRBD.

Thanks.

- kosaki

> Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg | 588 +
> Documentation/blockdev/drbd/DRBD-data-packets.svg | 459 +
> Documentation/blockdev/drbd/README.txt | 16
> Documentation/blockdev/drbd/conn-states-8.dot | 18
> Documentation/blockdev/drbd/disk-states-8.dot | 16
> Documentation/blockdev/drbd/drbd-connection-state-overview.dot | 85
> Documentation/blockdev/drbd/node-states-8.dot | 14
> MAINTAINERS | 13
> drivers/block/Kconfig | 2
> drivers/block/Makefile | 1
> drivers/block/drbd/Kconfig | 82
> drivers/block/drbd/Makefile | 8
> drivers/block/drbd/drbd_actlog.c | 1484 +++
> drivers/block/drbd/drbd_bitmap.c | 1327 ++
> drivers/block/drbd/drbd_int.h | 2258 +++++
> drivers/block/drbd/drbd_main.c | 3735 ++++++++
> drivers/block/drbd/drbd_nl.c | 2365 +++++
> drivers/block/drbd/drbd_proc.c | 266
> drivers/block/drbd/drbd_receiver.c | 4456 ++++++++++
> drivers/block/drbd/drbd_req.c | 1132 ++
> drivers/block/drbd/drbd_req.h | 327
> drivers/block/drbd/drbd_strings.c | 113
> drivers/block/drbd/drbd_tracing.c | 753 +
> drivers/block/drbd/drbd_tracing.h | 87
> drivers/block/drbd/drbd_vli.h | 351
> drivers/block/drbd/drbd_worker.c | 1529 +++
> drivers/block/drbd/drbd_wrappers.h | 91
> include/linux/drbd.h | 349
> include/linux/drbd_limits.h | 137
> include/linux/drbd_nl.h | 137
> include/linux/drbd_tag_magic.h | 83
> include/linux/lru_cache.h | 294
> lib/Kconfig | 3
> lib/Makefile | 2
> lib/lru_cache.c | 560 +
> 35 files changed, 23141 insertions(+)
>
> -phil


2009-09-16 08:33:17

by Philipp Reisner

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Wednesday 16 September 2009 01:19:31 Christoph Hellwig wrote:
> On Tue, Sep 15, 2009 at 04:45:13PM +0200, Philipp Reisner wrote:
> > Hi Linus,
> >
> > Please pull
> > git://git.drbd.org/linux-2.6-drbd.git drbd
> >
> > DRBD is a shared-nothing, replicated block device. It is designed to
> > serve as a building block for high availability clusters and
> > in this context, is a "drop-in" replacement for shared storage.
> >
> > It has been discussed and reviewed on the list since March,
> > and Andrew has asked us to send a pull request for 2.6.32-rc1.
>
> The last thing we need is another bloody raid-reimplementation, coupled
> with a proprietary on-the-wire protocol. NACK as far as I am concerned.

Hi Christoph,

Unfortunately we have not been CCing you on our first posts and discussion,
but only on our most recent one. So I will repeat the key points of the
discussion.

DRBD does not want to be a local RAID, it is heavily tied to its domain,
and offers significant advantages there. -- Things that can not be achieved
by combining MD+NBD or MD+iSCSI:

* When DRBD is used over low-bandwidth links and one has to do a resync,
DRBD can do a "checksum based resync", similar to the way rsync works.
A whole data block gets transmitted only if the checksums of that
block differ.

Again, this is something you can not do with an iSCSI transport.

* DRBD can do online verify of the mirror, again, using checksums to
reduce network traffic.

How do you want to achieve that using an iSCSI transport ?

* Dual primary mode with write conflict detection and resolution.

One needs to point out that this should never happen, as long
as the DLM used does not fail. But if it ever happens, you
want your mirroring solution to keep the two sides of your
mirror in sync.

This is something that can not be done in the MD+NBD or MD+iSCSI
model, because the block transport does not have a concept for that.
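
A minimal sketch of the checksum-based resync idea described above, in Python rather than kernel C, and not taken from the DRBD source. The block size, the SHA-1 digest, and the function names `block_digests` and `checksum_resync` are assumptions made for illustration; both replicas are modelled as in-memory byte strings.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not DRBD's actual resync unit


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one digest per fixed-size block of the device image."""
    return [
        hashlib.sha1(data[off:off + block_size]).digest()
        for off in range(0, len(data), block_size)
    ]


def checksum_resync(source, target, block_size=BLOCK_SIZE):
    """Make target (a bytearray) equal to source (bytes).

    Only blocks whose digests differ are copied in full.
    Returns the number of blocks that had to be transmitted.
    """
    if len(source) != len(target):
        raise ValueError("replicas must have the same size")
    # In a real setup the peer would compute these digests itself and
    # send only the digests over the network.
    remote = block_digests(bytes(target), block_size)
    sent = 0
    for index, digest in enumerate(block_digests(source, block_size)):
        if digest != remote[index]:
            off = index * block_size
            target[off:off + block_size] = source[off:off + block_size]
            sent += 1
    return sent
```

An online verify could reuse the same digest comparison and merely report the mismatching block numbers instead of copying data.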

So much for the conceptual reasons for DRBD; now for some other reasons:

* UUIDs that identify data generations, dirty bitmap, bitmap merging.

Think of a two node HA cluster. Node A is active ('primary' in DRBD
speak), has the filesystem mounted and the application running. Node B is
in standby mode ('secondary' in DRBD speak).

We lose network connectivity, the primary node continues to run, the
secondary no longer gets updates.

Then we have a complete power failure, both nodes are down. Then they
power up the data center again, but at first they get only the power
circuit of node B up and running again.

Should node B offer the service right now ?
( DRBD has configurable policies for that )

Later on they manage to get node A up and running again, now let's assume
node B was chosen to be the new primary node. What needs to be done ?

Modifications on B since it became primary need to be resynced to A.
Modifications on A since it lost contact to B need to be taken out.

DRBD does that.

How do you fit that into a RAID1+NBD model ? NBD is just a block
transport, it does not offer the ability to exchange dirty bitmaps or
data generation identifiers, nor does the RAID1 code have a concept of
that.
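
The scenario above can be sketched as follows. This is an illustration in Python under stated assumptions, not DRBD's implementation: dirty bitmaps are modelled as sets of block numbers, devices as dicts, and the names `blocks_to_resync` and `resync` are hypothetical. It assumes the data generation UUIDs have already settled that node B is the resync source; how that decision is made is not modelled here.

```python
def blocks_to_resync(dirty_on_a, dirty_on_b):
    """Merge both dirty bitmaps into the set of blocks to copy from B to A.

    - Blocks B changed since it became primary are newer on B.
    - Blocks A changed after it lost contact must be taken out again,
      i.e. overwritten with B's content.
    Either way B's copy wins, so the merged bitmap is the union.
    """
    return set(dirty_on_a) | set(dirty_on_b)


def resync(device_a, device_b, dirty_on_a, dirty_on_b):
    """Copy every block in the merged bitmap from B (source) to A (target)."""
    for block in sorted(blocks_to_resync(dirty_on_a, dirty_on_b)):
        device_a[block] = device_b[block]
```

Only the blocks named in either bitmap are transferred; blocks that neither node touched while disconnected are left alone.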

* There is a whole eco-system of integration work of DRBD with various
cluster managers (open source, and closed ones).
There is no open source cluster manager integration available for the
MD+NBD idea.

* DRBD has a massive user base. It is included in SLES, Debian and Ubuntu,
(and probably some other distributions as well).

Please also have a look at the list's archive, the main discussion was
started on 2009-05-15.

-Phil

2009-09-16 09:19:37

by Philipp Reisner

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Wednesday 16 September 2009 02:46:34 KOSAKI Motohiro wrote:
> > Hi Linus,
> >
> > Please pull
> > git://git.drbd.org/linux-2.6-drbd.git drbd
> >
> > DRBD is a shared-nothing, replicated block device. It is designed to
> > serve as a building block for high availability clusters and
> > in this context, is a "drop-in" replacement for shared storage.
> >
> > It has been discussed and reviewed on the list since March,
> > and Andrew has asked us to send a pull request for 2.6.32-rc1.
> >
> > Stephen added us to linux-next in July, and our most recent
> > build failure against Jens' 2.6.32-rc1 updates was ironed out this
> > morning. So it should be in fairly good shape.
>
> I'm sorry for my delayed pointing out.
> if anyone add new feature, its Kconfig default parameter should be N.
>
> but AFAIK, DRBD use M. I hope you change it. At least, typical desktop user
> don't use DRBD.
>
> Thanks.

Ok. I have changed the default to N.

-Phil

2009-09-17 08:12:53

by Lars Ellenberg

Subject: Re: [GIT PULL] DRBD for 2.6.32


I took the liberty to extend the CC list again a little bit.

On Tue, Sep 15, 2009 at 07:19:31PM -0400, Christoph Hellwig wrote:
> On Tue, Sep 15, 2009 at 04:45:13PM +0200, Philipp Reisner wrote:
> > Hi Linus,
> >
> > Please pull
> > git://git.drbd.org/linux-2.6-drbd.git drbd
> >
> > DRBD is a shared-nothing, replicated block device. It is designed to
> > serve as a building block for high availability clusters and
> > in this context, is a "drop-in" replacement for shared storage.
> >
> > It has been discussed and reviewed on the list since March,
> > and Andrew has asked us to send a pull request for 2.6.32-rc1.


This has been discussed before on LKML.

To contrast your NACK with a few previous posts
that I perceived as effectively ACKs:
e.g.

Andrew Morton:
http://lkml.org/lkml/2009/5/1/307

"Oh. Thanks. Well we should all get cracking on it then."

Lars Marowsky-Bree:
http://lkml.org/lkml/2009/5/5/224

"I would suggest at this time, we may want to refocus on the remaining
objections to merging drbd as a driver in the short-term."

In reply to that,

James Bottomley:
http://lkml.org/lkml/2009/5/5/226

"I'd agree with that. drbd essentially qualifies as a
driver under our new merge rules, so we should be
thinking about blockers to getting it into the tree
first (serious issues) and working out kinks
(like raid unification) after it gets in."

Neil Brown:
http://lkml.org/lkml/2009/5/5/332

"I cannot imagine that there would be any. Given its
history, its popularity, and its modularity, there can
be no question about merging it"

hch:
>
> The last thing we need is another bloody raid-reimplementation,

It is not RAID, it is replication, see also that blog post below.

> coupled with a proprietary on-the-wire protocol.

http://www.openformats.org/en1
proprietary:
"the mode of presentation of its data is opaque
and its specification is not publicly available"

Which does not apply to DRBD.

So let's settle for "homegrown".

Besides, what was the non-proprietary, generally accepted,
link layer agnostic block-level replication protocol again?

And in case you're referring to MD/NBD or MD/iSCSI or some such,
http://fghaas.wordpress.com/2009/09/16/alternatives-to-drbd/ may be a
worthy read. Certainly not deeply technical, but sufficient to
illustrate the most important points.

> NACK as far as I am concerned.

Too bad :(
What can we do to have that revised?


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

2009-09-17 09:14:31

by Lars Marowsky-Bree

Subject: Re: [GIT PULL] DRBD for 2.6.32

On 2009-09-15T19:19:31, Christoph Hellwig wrote:

Hi Christoph,

> > It has been discussed and reviewed on the list since March,
> > and Andrew has asked us to send a pull request for 2.6.32-rc1.
>
> The last thing we need is another bloody raid-reimplementation, coupled
> with a proprietary on-the-wire protocol. NACK as far as I am concerned.

You know that several RAID implementations are my primary pet peeve, and
I would just love to agree with you here. However, reality isn't that
black-xor-white.

In reality, a significant number of deployments using this
implementation exist already. There is no alternative for them yet, much
less one which would allow them an online migration. There might be one
day, if dm-replicator takes off, and the RAID engines between
md/dm/btrfs/drbd/dm-replicator etc get unified, but as it stands today,
this doesn't exist.

drbd is stable, the code has been significantly cleaned up during the
LKML dialogue so far. It is very well maintained and supported.

As a mid- to long-term goal, the unification should be pursued, and I
know that Lars Ellenberg _is_ talking with Heinz about dm-replicator and
that Neil/Heinz/Alasdair are also occasionally talking with each other.

Until this has happened though, the plurality of solutions exists.

drbd meets the technical/code quality requirements for merging; the
argument that we should only have one RAID implementation is valid, but
"should" is overruled by the normative power of facts.

Putting the burden of converging our RAID implementations on drbd is a bit
too much; this argument would have made sense when dm-raid* was merged,
but today, we're already carrying several.

Similarly, we support FCoE, AoE, iSCSI, nbd, and if someone proposed
iSCSI-over-USB, I'm sure we would merge even that abomination. (I hope
I didn't give anyone ideas!) We also have several file systems.


Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

2009-09-17 16:02:57

by James Bottomley

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Thu, 2009-09-17 at 10:12 +0200, Lars Ellenberg wrote:
> I took the liberty to extend the CC list again a little bit.
>
> On Tue, Sep 15, 2009 at 07:19:31PM -0400, Christoph Hellwig wrote:
> > On Tue, Sep 15, 2009 at 04:45:13PM +0200, Philipp Reisner wrote:
> > > Hi Linus,
> > >
> > > Please pull
> > > git://git.drbd.org/linux-2.6-drbd.git drbd
> > >
> > > DRBD is a shared-nothing, replicated block device. It is designed to
> > > serve as a building block for high availability clusters and
> > > in this context, is a "drop-in" replacement for shared storage.
> > >
> > > It has been discussed and reviewed on the list since March,
> > > and Andrew has asked us to send a pull request for 2.6.32-rc1.
>
>
> This has been discussed before on LKML.
>
> To contrast your NACK by a few previous posts
> I perceived effectively as ACKS:
> e.g.
>
> Andrew Morton:
> http://lkml.org/lkml/2009/5/1/307
>
> "Oh. Thanks. Well we should all get cracking on it then."
>
> Lars Marowsky-Bree:
> http://lkml.org/lkml/2009/5/5/224
>
> "I would suggest at this time, we may want to refocus on the remaining
> objections to merging drbd as a driver in the short-term."
>
> In reply to that,
>
> James Bottomley:
> http://lkml.org/lkml/2009/5/5/226
>
> "I'd agree with that. drbd essentially qualifies as a
> driver under our new merge rules, so we should be
> thinking about blockers to getting it into the tree
> first (serious issues) and working out kinks
> (like raid unification) after it gets in."
>
> Neil Brown:
> http://lkml.org/lkml/2009/5/5/332
>
> "I cannot imagine that there would be any. Given its
> history, its popularity, and its modularity, there can
> be no question about merging it"
>
> hch:
> >
> > The last thing we need is another bloody raid-reimplementation,
>
> It is not RAID, it is replication, see also that blog post below.
>
> > coupled with a proprietary on-the-wire protocol.
>
> http://www.openformats.org/en1
> proprietary:
> "the mode of presentation of its data is opaque
> and its specification is not publicly available"
>
> Which does not apply to DRBD.
>
> So let's settle for "homegrown".
>
> Besides, what was the non-proprietary, generally accepted,
> link layer agnostic block-level replication protocol again?
>
> And in case you're referring to MD/NBD or MD/iSCSI or some such,
> http://fghaas.wordpress.com/2009/09/16/alternatives-to-drbd/ may be a
> worthy read. Certainly not deeply technical, but sufficient to
> illustrate the most important points.
>
> > NACK as far as I am concerned.
>
> Too bad :(
> What can we do to have that revised?

So I think Christoph's NAK is rooted in the fact that we have a
proliferation of in-kernel RAID implementations and he's trying to
reunify them all again.

As part of the review, reusing the kernel RAID (and actually logging)
logic did come up and you added it to your todo list. Perhaps expanding
on the status of that would help, since what's being looked for is that
you're not adding more work to the RAID reunification effort and that
you do have a plan and preferably a time frame for coming into sync with
it.

James

2009-09-17 16:11:13

by Christoph Hellwig

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Thu, Sep 17, 2009 at 10:02:45AM -0600, James Bottomley wrote:
> So I think Christoph's NAK is rooted in the fact that we have a
> proliferation of in-kernel RAID implementations and he's trying to
> reunify them all again.
>
> As part of the review, reusing the kernel RAID (and actually logging)
> logic did come up and you added it to your todo list. Perhaps expanding
> on the status of that would help, since what's being looked for is that
> you're not adding more work to the RAID reunification effort and that
> you do have a plan and preferably a time frame for coming into sync with
> it.

Yes. DRBD has spent tons of time out of tree, and if they want to put
it in now I think requiring them to do their homework is a good idea.

Note that the in-kernel raid implementation is just a rather small part
of this; what's much more important is the user interface. A big part
of raid unification is that we can support one proper interface to deal
with raid vs volume management, and DRBD adds another totally
incompatible one to that. We'd be much better off adding the drbd
on-the-wire protocol (at least the most recent version) to DM instead of
adding another big chunk of framework.

2009-09-17 18:52:58

by Roland

Subject: Re: [GIT PULL] DRBD for 2.6.32

so, no drbd for 2.6.32 because of "raid unification issues"?

why did no one tell that earlier ?

how long will that last?

ages?

is anybody taking into consideration that decisions like these are scaring away good kernel developers ?

i was putting a lot of hope in DRBD being merged in .32, but given this i think i need to drop Linux and rather spend the bucks on winblowze & datacore for the storage-server undertaking... (as i know that works and will be supported in the foreseeable future)

or does someone know another good and intelligent storage replication solution i can trust ?

maybe pratima ? (http://www.linuxjournal.com/article/7265 )
or any of those several closed source linux products with those weird binary blobs which cause support and update headaches?
just kidding......

sorry, but i am NOT amused.

:-P





2009-09-18 03:31:10

by NeilBrown

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Thursday September 17, Christoph Hellwig wrote:
> On Thu, Sep 17, 2009 at 10:02:45AM -0600, James Bottomley wrote:
> > So I think Christoph's NAK is rooted in the fact that we have a
> > proliferation of in-kernel RAID implementations and he's trying to
> > reunify them all again.
> >
> > As part of the review, reusing the kernel RAID (and actually logging)
> > logic did come up and you added it to your todo list. Perhaps expanding
> > on the status of that would help, since what's being looked for is that
> > you're not adding more work to the RAID reunification effort and that
> > you do have a plan and preferably a time frame for coming into sync with
> > it.
>
> Yes. DRBD has spent tons of time out of tree, and if they want to put
> it in now I think requiring them to do their homework is a good idea.

What homework?

If there was a sensible unifying framework in the kernel that they
could plug in to, then requiring them to do that might make sense. But
there isn't. You/I/We haven't created a solution (i.e. there is no
equivalent of the VFS for virtual block devices) and saying that
because we haven't they cannot merge DRBD hardly seems fair.

Indeed, merging DRBD must be seen as a *good* thing as we then have
more examples of differing requirements against which a proposed
solution can be measured and tested.

I thought the current attitude was "merge then fix". That is what the
drivers/staging tree seems to be all about. Maybe you could argue
that DRBD should go in to 'staging' first (though I don't think that
is appropriate or required myself), but keeping it out just seems
wrong.

>
> Note that the in-kernel raid implementation is just a rather small part
> of this; what's much more important is the user interface. A big part
> of raid unification is that we can support one proper interface to deal
> with raid vs volume management, and DRBD adds another totally
> incompatible one to that. We'd be much better off adding the drbd
> on-the-wire protocol (at least the most recent version) to DM instead of
> adding another big chunk of framework.

I agree that the interface is very important. But the 'dm' interface
and the 'md' interface (both imperfect) are not going away any time
soon and there is no reason to expect that the DRBD interface has to
be sacrificed simply because they didn't manage to get it in-kernel
before now.

Let me try to paint a partial picture for you to show how my thoughts
have been going. I'm looking at this from the perspective of the
driver model, particularly exposed through sysfs.

A 'block device' like 'sda' has a parent in sysfs, which represents
(e.g.) the SCSI device which provides the storage that is exposed
through 'sda'. e.g.
.../target0:0:0/0:0:0:0/block/sda
^target ^lun ^padding ^block-device
Block devices 'md0' or 'mapper/whatever' don't have a real parent and
so live in /sys/devices/virtual/block which is really just a
place-holder because there is no real parent. There should be.
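
To illustrate the distinction, here is a small Python sketch, assuming a Linux system where /sys/block/<name> is a symlink into /sys/devices. The helper names `resolve_block_device` and `is_virtual_block_device` are hypothetical, and the path check is a simple string test rather than anything the kernel provides.

```python
import os


def resolve_block_device(name):
    """Resolve /sys/block/<name> to its real location in sysfs (Linux only)."""
    return os.path.realpath(os.path.join("/sys/block", name))


def is_virtual_block_device(sysfs_path):
    """True if the resolved path sits under the 'virtual' placeholder,
    i.e. the block device has no real parent device in sysfs."""
    return "/devices/virtual/block/" in sysfs_path + "/"
```

With these helpers, a path such as /sys/devices/virtual/block/md0 is reported as virtual, while a path ending in .../target0:0:0/0:0:0:0/block/sda is not.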

So I would propose a 'bus' device which contains virtual block devices
- 'vbd's. There is probably just one instance of this bus.

A 'vbd' is somewhat like a SCSI target (or maybe 'lun').
The preferred way to create a vbd is to write a device name to a
'scan' file in the 'bus' device. (similar to ....scsi_host/host0/scan).
Legacy interfaces (md,dm,drbd,loop,...) would be able to do the same
thing using an internal interface.

This would make the named vbd appear in the bus and it would have some
attribute files which could be filled in to describe the device.
Writing one of these attributes would activate the device and make a
'block device' come into existence. The block device would be a child
of the vbd, just like sda is a child of a SCSI target.

When a vbd is being managed by a legacy interface (md, dm, drbd...) it
would probably have a second child device which represents that
interface.

So to be a bit concrete:

/sys/devices/virtual/vdbus would be the bus
/sys/devices/virtual/vdbus/md0 would be the vbd for an md device
/sys/devices/virtual/vdbus/md0/block/md0 would be the block device
/sys/devices/virtual/vdbus/md0/md/md0 would be an 'md' device
representing the (legacy) md interface.

For compatibility (maybe only temporarily),
/sys/devices/virtual/vdbus/md0/block/md0/md -> /sys/devices/virtual/vdbus/md0/md/md0

so the current /sys/block/mdX/md/ directory still works.
That directory would largely consist of symlinks up to the parent,
though possibly with different names.
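
The proposed layout can be written down as a small Python sketch. Nothing here exists in the kernel: the `vdbus` root and the directory names come from the proposal above, while `proposed_layout` and its dictionary keys are made up for illustration.

```python
VDBUS_ROOT = "/sys/devices/virtual/vdbus"  # bus path as named in the proposal


def proposed_layout(name, legacy):
    """Return the sysfs paths the proposal would create for one device.

    name   -- device name, e.g. "md0"
    legacy -- legacy interface managing it, e.g. "md"
    """
    vbd = f"{VDBUS_ROOT}/{name}"
    block = f"{vbd}/block/{name}"
    legacy_dev = f"{vbd}/{legacy}/{name}"
    return {
        "bus": VDBUS_ROOT,
        "vbd": vbd,
        "block": block,
        "legacy": legacy_dev,
        # (link location, link target) for the compatibility symlink
        "compat_symlink": (f"{block}/{legacy}", legacy_dev),
    }
```

Calling `proposed_layout("md0", "md")` reproduces the four example paths and the compatibility link listed above.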


The next bit is the messy bit, for which I haven't come up with an adequate
solution yet:
What is the relationship between the component devices and the vbd
device?

This is clearly a dependency, and sysfs has a clear model for
representing dependencies: The child is dependent on the parent.
However with a vbd, the child is dependent on multiple parents and those
dependencies change.
As reported in http://lwn.net/Articles/347573/, other things have
multiple dependencies too, so we should probably try to make sure a
solution is created that fits both needs.
Personally, I would much rather all the dependencies were links, and
the directory hierarchy was
/sys/subsystem/$SUBSYSTEM/devices/$DEVICE
(where 'subsystem' subsumes both 'class' and 'bus'). But it is
probably 7 years too late for that.

The other thing I would really like to be able to manage is for a
'class/block' device to be able to be moved from one parent to
another. This would make it possible to change a block device to a
RAID1 containing the same data while it was mounted. It isn't too
hard to implement that internally, but making it fit with the sysfs
model is hard. It requires changeable dependencies again.


So yeah, let's have a discussion and find a good universal interface
which can subsume all the others and provide even more functionality,
but I don't think we can justify using the fact that we haven't
devised such an interface yet as reason to exclude DRBD.

NeilBrown

2009-09-18 20:08:01

by Jens Axboe

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Fri, Sep 18 2009, Neil Brown wrote:
> On Thursday September 17, Christoph Hellwig wrote:
> > On Thu, Sep 17, 2009 at 10:02:45AM -0600, James Bottomley wrote:
> > > So I think Christoph's NAK is rooted in the fact that we have a
> > > proliferation of in-kernel RAID implementations and he's trying to
> > > reunify them all again.
> > >
> > > As part of the review, reusing the kernel RAID (and actually logging)
> > > logic did come up and you added it to your todo list. Perhaps expanding
> > > on the status of that would help, since what's being looked for is that
> > > you're not adding more work to the RAID reunification effort and that
> > > you do have a plan and preferably a time frame for coming into sync with
> > > it.
> >
> > Yes. DRBD has spent tons of time out of tree, and if they want to put
> > it in now I think requiring them to do their homework is a good idea.
>
> What homework?
>
> If there was a sensible unifying framework in the kernel that they
> could plug in to, then requiring them to do that might make sense. But
> there isn't. You/I/We haven't created a solution (i.e. there is no
> equivalent of the VFS for virtual block devices) and saying that
> because we haven't they cannot merge DRBD hardly seems fair.
>
> Indeed, merging DRBD must be seen as a *good* thing as we then have
> more examples of differing requirements against which a proposed
> solution can be measured and tested.
>
> I thought the current attitude was "merge then fix". That is what the
> drivers/staging tree seems to be all about. Maybe you could argue
> that DRBD should go in to 'staging' first (though I don't think that
> is appropriate or required myself), but keeping it out just seems
> wrong.

FWIW, I agree with Neil here. If drbd is merge clean, let's go ahead and
merge it. While it would be nice to offload the raid unification onto
drbd, it's not exactly fair.

--
Jens Axboe

2009-09-19 05:15:35

by FUJITA Tomonori

Subject: Re: [GIT PULL] DRBD for 2.6.32

On Fri, 18 Sep 2009 22:08:03 +0200
Jens Axboe wrote:

> On Fri, Sep 18 2009, Neil Brown wrote:
> > On Thursday September 17, Christoph Hellwig wrote:
> > > On Thu, Sep 17, 2009 at 10:02:45AM -0600, James Bottomley wrote:
> > > > So I think Christoph's NAK is rooted in the fact that we have a
> > > > proliferation of in-kernel RAID implementations and he's trying to
> > > > reunify them all again.
> > > >
> > > > As part of the review, reusing the kernel RAID (and actually logging)
> > > > logic did come up and you added it to your todo list. Perhaps expanding
> > > > on the status of that would help, since what's being looked for is that
> > > > you're not adding more work to the RAID reunification effort and that
> > > > you do have a plan and preferably a time frame for coming into sync with
> > > > it.
> > >
> > > Yes. DRBD has spent tons of time out of tree, and if they want to put
> > > it in now I think requiring them to do their homework is a good idea.
> >
> > What homework?
> >
> > If there was a sensible unifying framework in the kernel that they
> > could plug in to, then requiring them to do that might make sense. But
> > there isn't. You/I/We haven't created a solution (i.e. there is no
> > equivalent of the VFS for virtual block devices) and saying that
> > because we haven't they cannot merge DRBD hardly seems fair.
> >
> > Indeed, merging DRBD must be seen as a *good* thing as we then have
> > more examples of differing requirements against which a proposed
> > solution can be measured and tested.
> >
> > I thought the current attitude was "merge then fix". That is what the
> > drivers/staging tree seems to be all about. Maybe you could argue
> > that DRBD should go in to 'staging' first (though I don't think that
> > is appropriate or required myself), but keeping it out just seems
> > wrong.
>
> FWIW, I agree with Neil here. If drbd is merge clean, lets go ahead and
> merge it. While it would be nice to offload the raid unification onto
> drbd, it's not exactly fair.

I guess that Christoph is worried about adding another user interface
for device management of this kind; once we merge this, we can't fix it (for
the raid unification).

BTW, doesn't DM already have something like drbd? I thought there was a
talk about that new target at LinuxCon.

2009-09-19 22:03:00

by Lars Marowsky-Bree

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On 2009-09-19T14:14:30, FUJITA Tomonori <[email protected]> wrote:

> I guess that Christoph is worry about adding another user interface
> for kinda device management; once we merge this, we can't fix it (for
> the raid unification).

Why can't it be fixed?

Either

a) there's going to be a transition period during which the "old"
interface is supported but deprecated and scheduled to be removed (all
driving the same new unified back-end),

or b) there's going to be a new kernel which requires new user-space
tools sharp.

In either case, dm/md are affected by this, so a third interface doesn't
really make much difference. The refactoring needs to happen in the
back-end anyway, and that actually becomes easier when all concurrent
implementations are present and can be reworked at the same time.

> BTW, DM already has something like drbd? I thought that there is a
> talk about that new target at LinuxCon.

dm-replicator is nowhere near as usable as DRBD, and not upstream yet
either. (Further, it's another independent implementation, pursued
instead of unifying any of the existing ones or helping to merge drbd -
don't get me started on my thoughts of that.)


Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

2009-09-19 23:56:13

by Dan Williams

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Sat, Sep 19, 2009 at 3:02 PM, Lars Marowsky-Bree <[email protected]> wrote:
> On 2009-09-19T14:14:30, FUJITA Tomonori <[email protected]> wrote:
>
>> I guess that Christoph is worry about adding another user interface
>> for kinda device management; once we merge this, we can't fix it (for
>> the raid unification).
>
> Why can't it be fixed?
>
> Either
>
> a) there's going to be a transition period during which the "old"
> interface is supported but depreciated and scheduled to be removed (all
> driving the new unified same back-end),
>
> or b) there's going to be a new kernel which requires new user-space
> tools sharp.
>
> In either case, dm/md are affected by this, so a third interface doesn't
> really make much difference. The refactoring needs to happen in the
> back-end anyway, and that actually becomes easier when all concurrent
> implementations are present and can be reworked at the same time.

It's actually four "raid" implementations in the kernel if you count
the multiple-disk functionality of btrfs. The precedent is already
set for merging new multiple-disk management interfaces.

Neil has come the closest to actually trying to start (i.e. code) the
unification effort [1] and that was for the relatively straightforward
case of mapping the dm-raid5 backend to md-raid5... no uptake to date.
There are no strictly equivalent drbd-backends in the kernel
presently, so leaving this out of tree is a net-loss for mainline.

--
Dan

[1]: http://marc.info/?l=dm-devel&m=124567352518676&w=2

2009-09-21 13:40:53

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Sun, 20 Sep 2009 00:02:32 +0200
Lars Marowsky-Bree <[email protected]> wrote:

> On 2009-09-19T14:14:30, FUJITA Tomonori <[email protected]> wrote:
>
> > I guess that Christoph is worry about adding another user interface
> > for kinda device management; once we merge this, we can't fix it (for
> > the raid unification).
>
> Why can't it be fixed?
>
> Either
>
> a) there's going to be a transition period during which the "old"
> interface is supported but depreciated and scheduled to be removed (all
> driving the new unified same back-end),

We should avoid removing the existing interface. Once we merge drbd, I
don't think that it's a good idea to remove the drbd user interface.


> or b) there's going to be a new kernel which requires new user-space
> tools sharp.
>
> In either case, dm/md are affected by this, so a third interface doesn't
> really make much difference. The refactoring needs to happen in the
> back-end anyway, and that actually becomes easier when all concurrent
> implementations are present and can be reworked at the same time.

I don't think so. It's much easier to implement something that
supports fewer user interfaces.


> > BTW, DM already has something like drbd? I thought that there is a
> > talk about that new target at LinuxCon.
>
> dm-replicator is nowhere near as usable as DRBD, and not upstream yet

I don't think usability at this point is important. The design
matters. dm-replicator is built on the existing framework.

And my question is, if drbd and dm-replicator will provide similar
features, then why do we need both in mainline?


> either. (Further, it's another independent implementation, pursued
> instead of unifying any of the existing ones or helping to merge drbd -
> don't get me started on my thoughts of that.)

Again, dm-replicator is built on the existing framework instead of
adding another 'multiple (virtual) devices' framework into mainline.

2009-09-21 14:43:10

by Lars Ellenberg

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, Sep 21, 2009 at 10:39:42PM +0900, FUJITA Tomonori wrote:
> > Either
> >
> > a) there's going to be a transition period during which the "old"
> > interface is supported but depreciated and scheduled to be removed (all
> > driving the new unified same back-end),
>
> We should avoid removing the existing interface. Once we merge drbd, I
> don't think that it's a good idea to remove the drbd user interface.


the drbd user interface is presented via
low level drbdsetup
and high level drbdadm (parses configuration files,
and calls out to drbdsetup).

changing (simplifying) the in-kernel configuration can be done any time,
as long as we can write a compat layer in the user land tools,
i.e. write drbdsetup so it will accept the same command line,
and pick, based on "something" (sysfs file, genetlink group,
environment variable, whatever), either the "new" kernel interface
or the "old" one.

I don't see any issue here.
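
The compat layer described above could look roughly like the following. This is only a minimal sketch, not drbdsetup code: the environment variable, the sysfs path, and the function names are all hypothetical stand-ins for the "something" mentioned in the message.

```python
import os

# Hypothetical names throughout: neither the environment variable nor the
# sysfs path below is taken from DRBD; they only stand in for "something".
OVERRIDE_VAR = "DRBD_KERNEL_API"
NEW_API_MARKER = "/sys/module/drbd/parameters/unified_api"


def pick_interface(env=None, marker=NEW_API_MARKER):
    """Return "new" or "old" depending on what the running kernel offers."""
    env = os.environ if env is None else env
    forced = env.get(OVERRIDE_VAR)
    if forced in ("new", "old"):
        # An explicit override wins over probing.
        return forced
    # Otherwise probe: the marker file exists only on a "new" kernel.
    return "new" if os.path.exists(marker) else "old"


def run(argv, backends, env=None, marker=NEW_API_MARKER):
    """Dispatch one unchanged command line to the selected backend."""
    return backends[pick_interface(env, marker)](argv)
```

The point of the design is that the command line seen by users and scripts stays the same; only the backend function that talks to the kernel is swapped.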

> I don't think so. It's much easier to implement something that
> supports fewer user interfaces.

We can choose whatever user-kernel interface you like,
and change it with every dot release --
we'd just need to add additional compat code into
the drbdsetup userland binary.

> > > BTW, DM already has something like drbd? I thought that there is a
> > > talk about that new target at LinuxCon.
> >
> > dm-replicator is nowhere near as usable as DRBD, and not upstream yet
>
> I don't think usability at this point is important. The design
> matters. dm-replicator is built on the existing framework.
>
> And my question is, if drbd and dm-replicator will provide similar
> features, then why do we need both in mainline?

dm-replicator is not there yet, and as such has zero user base.

To actually use it in the HA clustering world, quite a lot of
userland glue would have to be written, which is not there yet either.

In contrast, DRBD has been used in production, in many thousands of
installations worldwide, for many years.

By design, dm-replicator is more comparable to dm-raid1, with the
knowledge that several mirror legs may break independently
(resulting in one "dirty log" per mirror leg), and come back
independently, as well as the option of adding an on-disk ring-buffer to
any mirror leg.

It is by design NOT able to do dual-active mode.

If any of you happens to be at LinuxCon,
please discuss with Heinz (Mauelshagen, dm-replicator)
and Phil (Reisner, DRBD), who both are present.

Heinz' talk about replicator is scheduled today, 10:30 am,
that would be a good opportunity, I guess.

> > either. (Further, it's another independent implementation, pursued
> > instead of unifying any of the existing ones or helping to merge drbd -
> > don't get me started on my thoughts of that.)
>
> Again, dm-replicator is built on the existing framework instead of
> adding another 'multiple (virtual) devices' framework into mainline.

Well, not exactly.

It adds quite a bit of additional framework (to the device mapper
subsystem), before it then starts to use that additional framework
via the generic device mapper hooks.

On that same line DRBD could argue that it uses the existing generic
block layer framework, just adding a bit of functionality ;)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

2009-09-21 14:52:56

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, 21 Sep 2009 16:43:08 +0200
Lars Ellenberg <[email protected]> wrote:
> We can choose whatever user-kernel interface you like,
> and change it with every dot release --
> we'd just need to add additional compat code into
> the drbdsetup userland binary.

uh no.

the kernel<->userspace ABI is stable.
we don't go about randomly changing it
(extending it is fine obviously)


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-21 14:55:17

by Lars Ellenberg

[permalink] [raw]
Subject: Re: [Drbd-dev] [GIT PULL] DRBD for 2.6.32

On Mon, Sep 21, 2009 at 04:43:08PM +0200, Lars Ellenberg wrote:
> > > > BTW, DM already has something like drbd? I thought that there is a
> > > > talk about that new target at LinuxCon.

...

> If any of you happens to be at LinuxCon,
> please discuss with Heinz (Maulshagen, dm-replicator)
> and Phil (Reisner, DRBD), who both are present.
>
> Heinz' talk about replicator is scheduled today, 10:30 am,

And, Phil does a session about DRBD tomorrow,
Tue 22. Sep, 2:45 pm

So, two good opportunities,
for anyone who happens to be at or near LinuxCon, Portland.

Lars

2009-09-21 16:53:20

by Lars Ellenberg

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, Sep 21, 2009 at 04:52:52PM +0200, Arjan van de Ven wrote:
> On Mon, 21 Sep 2009 16:43:08 +0200
> Lars Ellenberg <[email protected]> wrote:
> > We can choose whatever user-kernel interface you like,
> > and change it with every dot release --
> > we'd just need to add additional compat code into
> > the drbdsetup userland binary.
>
> uh no.
>
> the kernel<->userspace ABI is stable.
> we don't go about randomly changing it
> (extending it is fine obviously)

That's not what I meant, of course that is and needs to be stable.
Sorry, I exaggerated to make a point.

Point was:
mdadm configures md.
dmsetup configures dm.
drbdsetup configures drbd.

If and when "something" is done to "unify" things on the implementation
level, it is likely to also unify the "kernel<->userspace" configuration
interface.

If it happens, once that happens, that _will_ be an ABI break.

One way to go about it would be to just do that excellently designed and
generic and extensible and whatnot new kernel<->userspace thing, and add
the necessary compat cruft to the above mentioned configuration tools.

Doing the drbdsetup part of it would be our part,
which we would gladly accept.

Not speaking that not-yet-designed, all-new unified config interface
is not a valid argument against DRBD inclusion.

Lars

2009-09-21 22:28:26

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, 21 Sep 2009 18:53:21 +0200
Lars Ellenberg <[email protected]> wrote:

> On Mon, Sep 21, 2009 at 04:52:52PM +0200, Arjan van de Ven wrote:
> > On Mon, 21 Sep 2009 16:43:08 +0200
> > Lars Ellenberg <[email protected]> wrote:
> > > We can choose whatever user-kernel interface you like,
> > > and change it with every dot release --
> > > we'd just need to add additional compat code into
> > > the drbdsetup userland binary.
> >
> > uh no.
> >
> > the kernel<->userspace ABI is stable.
> > we don't go about randomly changing it
> > (extending it is fine obviously)
>
> That's not what I meant, of course that is and needs to be stable.
> Sorry, I exagerated to make a point.
>
> Point was:
> mdadm configured md.
> dmsetup configured dm.
> drbdsetup configure drbd.
>
> If and when "something" is done to "unify" things on the implementation
> level, it is likely to also unify the "kernel<->userspace" configuration
> interface.
>
> If it happens, once that happens, that _will_ be an ABI break.

You misunderstand the raid unification.

We will not unify the kernel<->userspace configuration interface
because we can't break the kernel<->userspace ABI.

We plan to unify the multiple device frameworks, but the unified
framework must support all the existing ABIs.

So adding another 'drbd' ABI hurts us.

2009-09-22 01:13:14

by Kyle Moffett

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, Sep 21, 2009 at 18:27, FUJITA Tomonori
<[email protected]> wrote:
> On Mon, 21 Sep 2009 18:53:21 +0200 Lars Ellenberg <[email protected]> wrote:
>> That's not what I meant, of course that is and needs to be stable.
>> Sorry, I exagerated to make a point.
>>
>> Point was:
>> mdadm configured md.
>> dmsetup configured dm.
>> drbdsetup configure drbd.
>>
>> If and when "something" is done to "unify" things on the implementation
>> level, it is likely to also unify the "kernel<->userspace" configuration
>> interface.
>>
>> If it happens, once that happens, that _will_ be an ABI break.
>
> You misunderstand the raid unification.
>
> We will not unify the kernel<->userspace configuration interface
> because we can't break the kernel<->userspace ABI.
>
> We plan to unify the multiple device frameworks, but the unified
> framework must support the all existing ABIs.
>
> So adding another 'drbd' ABI hurts us.

One major issue for me personally (and I don't think it's been mentioned enough):

There is a *VAST* existing user-base for DRBD. Basically every vendor
builds the modules for their kernels, ships the userspace tools, etc.
*Regardless* of when or how it gets merged, the existing user-base
will need kernel support for the existing tools.

You have to realize that this project is NOT a new one, it's been
around quite a decent number of years (since kernel 2.2-ish). Yes,
the ABI is unique and has its warts, but there are a lot of things
that depend on it.

Think of it (in concept) like merging mainline support for an
architecture that has been forward-ported as patches since 2.2. If
the architecture was a simple embedded-only one (like a few recent
ones have been), then you might just say "hell with it, everybody
needs to rebuild libc and the world". That doesn't seem to be the
case with an enterprise-supported distributed block device.

IMHO, we should treat the kernel<=>userspace ABI as fixed... it's an
existing wart that will need to be supported for a while. The
benefits of getting the stable and long-out-of-tree drbd modules into
the mainline kernel will far outweigh the pain of having to maintain
the existing ABI.

To put it another way: Would you really keep a stable SCSI raid
driver for existing hardware out of mainline by claiming they need to
write a new raid-management abstraction first? If not, then why the
pushback on DRBD?

Cheers,
Kyle Moffett

2009-09-22 14:38:58

by Heinz Mauelshagen

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, 2009-09-21 at 16:43 +0200, Lars Ellenberg wrote:
> On Mon, Sep 21, 2009 at 10:39:42PM +0900, FUJITA Tomonori wrote:
> > > Either
> > >
> > > a) there's going to be a transition period during which the "old"
> > > interface is supported but depreciated and scheduled to be removed (all
> > > driving the new unified same back-end),
> >
> > We should avoid removing the existing interface. Once we merge drbd, I
> > don't think that it's a good idea to remove the drbd user interface.
>
>
> the drbd user interface is presented via
> low level drbdsetup
> and high level drbdadm (parses configuration files,
> and calls out to drbdsetup).
>
> changing (simplifying) the in-kernel configuration can be done any time,
> as long as we can write a compat layer in the user land tools,
> i.e. write drbdsetup so it will accept the same command line,
> and try, based on "something" (sysfs file, genetlink group,
> environment variable, whatever) the "new" kernel interface,
> or the "old" one.
>
> I don't see any issue here.
>
> > I don't think so. It's much easier to implement something that
> > supports fewer user interfaces.
>
> We can choose whatever user-kernel interface you like,
> and change it with every dot release --
> we'd just need to add additional compat code into
> the drbdsetup userland binary.
>
> > > > BTW, DM already has something like drbd? I thought that there is a
> > > > talk about that new target at LinuxCon.
> > >
> > > dm-replicator is nowhere near as usable as DRBD, and not upstream yet
> >
> > I don't think usability at this point is important. The design
> > matters. dm-replicator is built on the existing framework.
> >
> > And my question is, if drbd and dm-replicator will provide similar
> > features, then why do we need both in mainline?
>
> dm-replicator is not there yet, and as such has zero user base.

dm-replicator is work in progress and we're aiming to ship it with
RHEL6.

>
> To actually use it in the HA clustering world, quite a lot
> userland glue would have to be written, which is not there yet either.

We had quite a bit of target table syntax to settle, but
lvm2 support is coming along now, leveraging the existing LVM2 UI
(e.g. lvconvert) to manage remote replication of a set of
logical volumes to one or more remote sites.

>
> In contrast, DRBD is used in production, in many thousands of
> installations worldwide since many years.
>
> By design, dm-replicator is more comparable to dm-raid1, with the
> knowledge that several mirror legs may break independently
> (resulting in one "dirty log" per mirror leg), and come back
> independendly, as well as the option of adding an on-disk ring-buffer to
> any mirror leg.

The on-disk ring-buffer is not an option, it's mandatory and being used
to ensure write ordering fidelity for all devices being replicated in
groups to one or more remote sites. dm-replicator ensures write ordering
for a group of devices rather than single devices while replicating.

The per remote device dirty logs are being used for initial
synchronization of remote devices *and* to allow fallback to dirty
logging in case the replication log (which ensures write ordering
fidelity to allow for remote recovery after a failover) runs full. That
fallback mode allows us to avoid starvation of application io when the
log gets full.
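
The interplay between the ordered replication log and the per-device dirty log, as described above, can be illustrated with a toy model. This is a minimal sketch and not dm-replicator code: the class, its fields, and the choice to fold pending ring entries into the dirty set on overflow are assumptions made for illustration.

```python
from collections import deque


class ReplicationLog:
    """Toy model: ordered ring buffer with fallback to dirty-region logging."""

    def __init__(self, capacity, region_size):
        self.capacity = capacity        # max entries held in the ring buffer
        self.region_size = region_size  # sectors covered by one dirty region
        self.ring = deque()             # (sector, data) in write order
        self.dirty = set()              # region numbers needing resync
        self.ordered = True             # False once ordering has been given up

    def write(self, sector, data):
        if self.ordered and len(self.ring) < self.capacity:
            # Normal mode: keep the write itself, in order.
            self.ring.append((sector, data))
            return
        if self.ordered:
            # Log ran full: remember only which regions the pending
            # writes touched (assumed behaviour), then drop the entries.
            for pending_sector, _ in self.ring:
                self.dirty.add(pending_sector // self.region_size)
            self.ring.clear()
            self.ordered = False
        # Fallback mode: track the region, not the write or its order.
        self.dirty.add(sector // self.region_size)
```

In this model application writes never block on a full log; the cost is that write ordering for the remote site is lost until the dirty regions have been resynchronized.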

>
> It is by design NOT able to do dual-active mode.

This is a false statement.

dm-replicator abstracts the logging of the data and the transport out
into separate plugin-type modules. It just happens to be that the
initial version is active-passive because of our requirements which aim
at long distance replication, hence don't require active-active
initially. A different log module can support active-active but this is
not our goal initially.

>
> If any of you happens to be at LinuxCon,
> please discuss with Heinz (Maulshagen, dm-replicator)
> and Phil (Reisner, DRBD), who both are present.
>
> Heinz' talk about replicator is scheduled today, 10:30 am,
> that would be a good opportunity, I guess.

My talk's past now but I'm still at the conference till Wednesday so
please feel free to contact me.

Heinz

>
> > > either. (Further, it's another independent implementation, pursued
> > > instead of unifying any of the existing ones or helping to merge drbd -
> > > don't get me started on my thoughts of that.)
> >
> > Again, dm-replicator is built on the existing framework instead of
> > adding another 'multiple (virtual) devices' framework into mainline.
>
> Well, not exactly.
>
> It adds quite a bit of additional framework (to the device mapper
> subsystem), before it then starts to use that additional framework
> via the generic device mapper hooks.
>
> On that same line DRBD could argue that it uses the existing generic
> block layer framework, just adding a bit functionality ;)
>

2009-09-22 06:21:10

by Lars Marowsky-Bree

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On 2009-09-22T07:27:21, FUJITA Tomonori <[email protected]> wrote:

> > If it happens, once that happens, that _will_ be an ABI break.
>
> You misunderstand the raid unification.
>
> We will not unify the kernel<->userspace configuration interface
> because we can't break the kernel<->userspace ABI.

I disagree here. Who says we can't over time, and with due notice?

For sure, the new ABI needs to co-exist with the old ones for a while,
until it is proven and fully complete, but then, why can't the old one
be marked as deprecated and phased out over 1-2 years' time?

Users won't notice. Modern distros will switch, and in cases of legacy
distros ("enterprise"), the vendors will backport appropriately.

This happens. There's precedent with the network filtering rules etc.

> We plan to unify the multiple device frameworks, but the unified
> framework must support the all existing ABIs.
>
> So adding another 'drbd' ABI hurts us.

Even that doesn't really apply, I think. If the new framework is
powerful enough and a super-set of everything that came before, the shim
layer will be somewhat annoying, but harmless code.


Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

2009-09-23 11:29:44

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, 21 Sep 2009 20:51:32 -0400
Kyle Moffett <[email protected]> wrote:

> On Mon, Sep 21, 2009 at 18:27, FUJITA Tomonori
> <[email protected]> wrote:
> > On Mon, 21 Sep 2009 18:53:21 +0200 Lars Ellenberg <[email protected]> wrote:
> >> That's not what I meant, of course that is and needs to be stable.
> >> Sorry, I exagerated to make a point.
> >>
> >> Point was:
> >> mdadm configured md.
> >> dmsetup configured dm.
> >> drbdsetup configure drbd.
> >>
> >> If and when "something" is done to "unify" things on the implementation
> >> level, it is likely to also unify the "kernel<->userspace" configuration
> >> interface.
> >>
> >> If it happens, once that happens, that _will_ be an ABI break.
> >
> > You misunderstand the raid unification.
> >
> > We will not unify the kernel<->userspace configuration interface
> > because we can't break the kernel<->userspace ABI.
> >
> > We plan to unify the multiple device frameworks, but the unified
> > framework must support the all existing ABIs.
> >
> > So adding another 'drbd' ABI hurts us.
>
> One major issue for me personally (and I don't think its been mentioned enough):
>
> There is a *VAST* existing user-base for DRBD. Basically every vendor
> builds the modules for their kernels, ships the userspace tools, etc.
> *Regardless* of when or how it gets merged, the existing user-base
> will need kernel support for the existing tools.

I don't think that the user base can be a reason for mainline
inclusion.

IMHO, vendors should use their resources to push an out-of-tree thing
into mainline instead of taking care of it with their own
kernels. Finally, device-mapper people are trying to push a similar
feature. I think that history has taught us that people who have used
out-of-tree stuff eventually move to the mainline alternative.


> To put it another way: Would you really keep a stable SCSI raid
> driver for existing hardware out of mainline by claiming they need to
> write a new raid-management abstraction first? If not, then why the
> pushback on DRBD?

Yeah, we should have done that. It's too late though.

Anyway, I don't think that your example is fair; we need a driver
for scsi hardware but we have an alternative to drbd.

2009-09-23 11:37:37

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Tue, 22 Sep 2009 08:20:34 +0200
Lars Marowsky-Bree <[email protected]> wrote:

> On 2009-09-22T07:27:21, FUJITA Tomonori <[email protected]> wrote:
>
> > > If it happens, once that happens, that _will_ be an ABI break.
> >
> > You misunderstand the raid unification.
> >
> > We will not unify the kernel<->userspace configuration interface
> > because we can't break the kernel<->userspace ABI.
>
> I disagree here. Who says we can't over time, and with due notice?
>
> For sure, the new ABI needs to co-exist with the old ones for a while,
> until it is proven and fully complete, but then, why can't the old one
> be marked as depreciated and phased out over 1-2 years time?

Let me know if you find a Linux storage developer who says, "Yeah, we
can remove the md ABI over 1-2 years' time after the raid unification".

Seems that you have a very different idea from other kernel developers
about the stable ABI.


> > We plan to unify the multiple device frameworks, but the unified
> > framework must support the all existing ABIs.
> >
> > So adding another 'drbd' ABI hurts us.
>
> Even that doesn't really apply, I think. If the new framework is
> powerful enough and a super-set of everything that came before, the shim
> layer will be somewhat annoying, but harmless code.

Improving the existing framework is a proper approach.

2009-09-23 11:58:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Mon, Sep 21, 2009 at 08:51:32PM -0400, Kyle Moffett wrote:
> You have to realize that this project is NOT a new one, it's been
> around quite a decent number of years (since kernel 2.2-ish). Yes,
> the ABI is unique and has its warts, but there are a lot of things
> that depend on it.

So? That's never been an argument. Quite the contrary: "we ignored upstream
for years and fucked up out of tree, but please merge anyway" is almost
a counter-argument.

2009-09-23 14:01:29

by Kyle Moffett

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Wed, Sep 23, 2009 at 07:57, Christoph Hellwig <[email protected]> wrote:
> On Mon, Sep 21, 2009 at 08:51:32PM -0400, Kyle Moffett wrote:
>> You have to realize that this project is NOT a new one, it's been
>> around quite a decent number of years (since kernel 2.2-ish).  Yes,
>> the ABI is unique and has its warts, but there are a lot of things
>> that depend on it.
>
> So?  That's never been an argument.  Quite contrary, we ignored upstream
> for years and fucked up out of tree but please merge anyway is almost
> a counter-argument.

That's not what happened with DRBD at *all*. It was a large project
that ignored upstream for a while yes... but recently they decided to
do things right and submitted all of their patches for review and
comments. After a good number of review cycles during which they were
model citizens for making big necessary changes, nobody could find
anything technically wrong with the code.

Now people are asking the out-of-tree project to continue to maintain
their otherwise-perfectly-merge-ready patchset while also implementing
a bunch of MD/DM/RAID-integration code. Meanwhile several of the
DM/MD RAID guys who *already* have their code upstream have not been
having much luck defining a usable userspace API for the proposed
integrated configuration model.

At the very least, the code is at the point where Greg KH could easily
merge it into staging:
* The code is under GPLv2
* The goal of the developers is to get it merged in the near future
* It builds properly on x86
* It's for a new feature (not an existing one)
* There's a reliable point-of-contact for the code

The only thing missing is a list of exactly what still needs to be
fixed. I see a lot of handwaving about "We want a new API", but
nobody defining what the requirements for that are. If nobody can
figure that out yet, then I see no reason it shouldn't be mainline
mergeable; both Neil Brown and Jens Axboe seem to think this is ready
to merge as well.

Cheers,
Kyle Moffett

2009-09-23 19:10:23

by Roland

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

Dear kernel developers,

looking at this from a user/admin/system-engineer perspective, to be honest: i think this discussion sucks !

sorry for being a little bit offensive, but the DRBD people did their homework and they did it well, and now you scare them away with this deferred "ohhh, not another raid implementation" discussion.

you also scare away DRBD users, at least you scare ME away, as i lose trust in Linux and lose trust in DRBD to be a "solution with a future".

quoting fujita:
>Anyway, I don't think that your example is fair; we need a driver
>for scsi hardware but we have an alternative to drbd.

no, i don`t think there is a real alternative to DRBD. At least not now. I'm quite sure we won`t have something as feature rich, as stable and as usable within the next year.

dm-replicator? come on!

to give an example, of how long development of kernel features may last: for how long is BTRFS being discussed and being introduced now?
announcement is more than 2 years ago, but it's in mainline.
i can create snapshots for ages - but why can`t i still delete them?
how can something go mainline which is a "work-in-progress", missing important features and still may get on-disk-layout changes?
is there a different rule for integrating filesystems?
Colognian cliquishness, or what? ;-)

to my sorrow, i now drop my linux based storage server undertaking and give Solaris/ZFS + ZFS replication a try.
that will probably also fit. and works. and is supported.

use a distro which includes DRBD ?
uhhhm, no......you guys don`t like it...so i don`t like it.
at least using tainted kernel without upstream support is something to avoid....

my 10 cents
roland




>On Wed, Sep 23, 2009 at 07:57, Christoph Hellwig <[email protected]> wrote:
>> On Mon, Sep 21, 2009 at 08:51:32PM -0400, Kyle Moffett wrote:
>>> You have to realize that this project is NOT a new one, it's been
>>> around quite a decent number of years (since kernel 2.2-ish). Yes,
>>> the ABI is unique and has its warts, but there are a lot of things
>>> that depend on it.
>>
>> So? That's never been an argument. Quite contrary, we ignored upstream
>> for years and fucked up out of tree but please merge anyway is almost
>> a counter-argument.
>
>That's not what happened with DRBD at *all*. It was a large project
>that ignored upstream for a while yes... but recently they decided to
>do things right and submitted all of their patches for review and
>comments. After a good number of review cycles during which they were
>model citizens for making big necessary changes, nobody could find
>anything technically wrong with the code.
>
>Now people are asking the out-of-tree project to continue to maintain
>their otherwise-perfectly-merge-ready patchset while also implementing
>a bunch of MD/DM/RAID-integration code. Meanwhile several of the
>DM/MD RAID guys who *already* have their code upstream have not been
>having much luck defining a usable userspace API for the proposed
>integrated configuration model.
>
>At the very least, the code is at the point where Greg KH could easily
>merge it into staging:
> * The code is under GPLv2
> * The goal of the developers is to get it merged in the near future
> * It builds properly on x86
> * It's for a new feature (not an existing one)
> * There's a reliable point-of-contact for the code
>
>The only thing missing is a list of exactly what still needs to be
>fixed. I see a lot of handwaving about "We want a new API", but
>nobody defining what the requirements for that are. If nobody can
>figure that out yet, then I see no reason it shouldn't be mainline
>mergeable; both Neil Brown and Jens Axboe seem to think this is ready
>to merge as well.
>
>Cheers,
>Kyle Moffett


2009-09-23 23:05:24

by NeilBrown

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Wednesday September 23, [email protected] wrote:
> On Tue, 22 Sep 2009 08:20:34 +0200
> Lars Marowsky-Bree <[email protected]> wrote:
>
> > On 2009-09-22T07:27:21, FUJITA Tomonori <[email protected]> wrote:
> >
> > > > If it happens, once that happens, that _will_ be an ABI break.
> > >
> > > You misunderstand the raid unification.
> > >
> > > We will not unify the kernel<->userspace configuration interface
> > > because we can't break the kernel<->userspace ABI.
> >
> > I disagree here. Who says we can't over time, and with due notice?
> >
> > For sure, the new ABI needs to co-exist with the old ones for a while,
> > until it is proven and fully complete, but then, why can't the old one
> > > be marked as deprecated and phased out over 1-2 years time?
>
> > Let me know if you find a Linux storage developer who says, "Yeah, we
> can remove the md ABI over 1-2 years time after the raid unification".

I would have said 3-5 years, that being about the time frame for
enterprise releases, and it would be best if every enterprise vendor
got to have a release that supported both the old and the new
interface. But I don't have a problem with migrating to a better ABI
if we actually had a better ABI.
>
> Seems that you have a very different idea from other kernel developers
> about the stable ABI.

CONFIG_SYSFS_DEPRECATED_V2 seems to suggest that other kernel
developers understand that we sometimes make mistakes and need to
deprecate them.


However I think this is all very premature as there isn't even a coherent
proposal for what unification might look like, let alone broad
agreement or implementation. I would *much* rather we spent our
energies debating that than debating whether or not DRBD should get
merged.... Maybe we should only accept votes on "Should DRBD get
merged" from people who provide constructive input to the question "what
would a unified virtual block device model look like".

>
> Improving the existing framework is a proper approach.

Yes. So let's do it.

NeilBrown

2009-09-23 23:23:42

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Wed, 23 Sep 2009 10:01:09 -0400
Kyle Moffett <[email protected]> wrote:

> The only thing missing is a list of exactly what still needs to be
> fixed. I see a lot of handwaving about "We want a new API", but
> nobody defining what the requirements for that are. If nobody can

Seems that you missed Christoph's requirements:

http://marc.info/?l=drbd-dev&m=125326222414549&w=2

2009-09-23 23:38:38

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Thu, 24 Sep 2009 09:06:15 +1000
Neil Brown <[email protected]> wrote:

> On Wednesday September 23, [email protected] wrote:
> > On Tue, 22 Sep 2009 08:20:34 +0200
> > Lars Marowsky-Bree <[email protected]> wrote:
> >
> > > On 2009-09-22T07:27:21, FUJITA Tomonori <[email protected]> wrote:
> > >
> > > > > If it happens, once that happens, that _will_ be an ABI break.
> > > >
> > > > You misunderstand the raid unification.
> > > >
> > > > We will not unify the kernel<->userspace configuration interface
> > > > because we can't break the kernel<->userspace ABI.
> > >
> > > I disagree here. Who says we can't over time, and with due notice?
> > >
> > > For sure, the new ABI needs to co-exist with the old ones for a while,
> > > until it is proven and fully complete, but then, why can't the old one
> > > > be marked as deprecated and phased out over 1-2 years time?
> >
> > > Let me know if you find a Linux storage developer who says, "Yeah, we
> > can remove the md ABI over 1-2 years time after the raid unification".
>
> I would have said 3-5 years, that being about the time frame for
> enterprise releases, and it would be best if every enterprise vendor

Enterprise vendors don't pick up the latest kernel. So I think that we
need more.


> got to have a release that supported both the old and the new
> interface. But I don't have a problem with migrating to a better ABI
> if we actually had a better ABI.
> >
> > Seems that you have a very different idea from other kernel developers
> > about the stable ABI.
>
> CONFIG_SYSFS_DEPRECATED_V2 seems to suggest that other kernel
> developers understand that we sometimes make mistakes and need to
> deprecate them.

Yeah, however, we can try not to make mistakes.


> However I think this is all very premature as there isn't even a coherent
> proposal for what unification might look like, let alone broad
> agreement or implementation. I would *much* rather we spent our
> energies debating that than debating whether or not DRBD should get
> merged.... Maybe we should only accept votes on "Should DRBD get
> merged" from people who provide constructive input to the question "what
> would a unified virtual block device model look like".
>
> >
> > Improving the existing framework is a proper approach.
>
> Yes. So let's do it.

So we should implement something like drbd on the top of dm framework,
one of the existing 'virtual device' frameworks. That would improve dm
and we could get better ideas about "what would a unified virtual
block device model look like".

2009-09-25 05:26:46

by NeilBrown

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On Thursday September 24, [email protected] wrote:
> On Thu, 24 Sep 2009 09:06:15 +1000
> Neil Brown <[email protected]> wrote:
>
> > On Wednesday September 23, [email protected] wrote:
> > > On Tue, 22 Sep 2009 08:20:34 +0200
> > > Lars Marowsky-Bree <[email protected]> wrote:
> > >
> > > > On 2009-09-22T07:27:21, FUJITA Tomonori <[email protected]> wrote:
> > > >
> > > > > > If it happens, once that happens, that _will_ be an ABI break.
> > > > >
> > > > > You misunderstand the raid unification.
> > > > >
> > > > > We will not unify the kernel<->userspace configuration interface
> > > > > because we can't break the kernel<->userspace ABI.
> > > >
> > > > I disagree here. Who says we can't over time, and with due notice?
> > > >
> > > > For sure, the new ABI needs to co-exist with the old ones for a while,
> > > > until it is proven and fully complete, but then, why can't the old one
> > > > > be marked as deprecated and phased out over 1-2 years time?
> > >
> > > > Let me know if you find a Linux storage developer who says, "Yeah, we
> > > can remove the md ABI over 1-2 years time after the raid unification".
> >
> > I would have said 3-5 years, that being about the time frame for
> > enterprise releases, and it would be best if every enterprise vendor
>
> Enterprise vendors don't pick up the latest kernel. So I think that we
> need more.

I don't really follow your logic, but that isn't important. I think
that we need to be open to deprecating old ABIs, particularly when the
ABI is largely used by just one or two programs or libraries. This is
the case for md/dm/drbd and similar devices.

>
>
> > got to have a release that supported both the old and the new
> > interface. But I don't have a problem with migrating to a better ABI
> > if we actually had a better ABI.
> > >
> > > Seems that you have a very different idea from other kernel developers
> > > about the stable ABI.
> >
> > CONFIG_SYSFS_DEPRECATED_V2 seems to suggest that other kernel
> > developers understand that we sometimes make mistakes and need to
> > deprecate them.
>
> Yeah, however, we can try not to make mistakes.

In this case the mistakes are already made. The main mistake was
that there was no credible model to follow for managing a virtual
block device. There is still no generally good model to follow so
that mistake hasn't been fixed.

Merging perfectly working and widely used code, so that it will be
easier for the community to maintain, to learn from, and to help
improve is not a mistake.
Had the implementers deliberately ignored the established practice in
the kernel for doing things you might have a case. But there was no
established practice to follow or to ignore.

>
>
> > However I think this is all very premature as there isn't even a coherent
> > proposal for what unification might look like, let alone broad
> > agreement or implementation. I would *much* rather we spent our
> > energies debating that than debating whether or not DRBD should get
> > merged.... Maybe we should only accept votes on "Should DRBD get
> > merged" from people who provide constructive input to the question "what
> > would a unified virtual block device model look like".
> >
> > >
> > > Improving the existing framework is a proper approach.
> >
> > Yes. So let's do it.
>
> So we should implement something like drbd on the top of dm framework,
> one of the existing 'virtual device' frameworks. That would improve dm
> and we could get better ideas about "what would a unified virtual
> block device model look like".

dm is not an appropriate framework. It is a walled garden that
follows its own rules rather than being consistent with the rest of
the kernel.
The clearest example is that individual drivers in dm don't present
block devices, they present dm-targets. dm-targets do not fit in to
the device model and are not visible in sysfs.

NeilBrown

2009-09-25 10:01:43

by Lars Marowsky-Bree

[permalink] [raw]
Subject: Re: [GIT PULL] DRBD for 2.6.32

On 2009-09-25T15:27:40, Neil Brown <[email protected]> wrote:

> > Enterprise vendors don't pick up the latest kernel. So I think that we
> > need more.

Enterprise kernel providers tend to accept the burden of supporting
their enterprise releases. While I appreciate the thought from the
community, I think the enterprise kernels already including drbd would
be extremely happy to see it officially included.

> I don't really follow your logic, but that isn't important. I think
> that we need to be open to deprecating old ABIs, particularly when the
> ABI is largely used by just one or two programs or libraries. This is
> the case for md/dm/drbd and similar devices.

It is even one step beyond this here. The additional ABI effort is
raised as an objection to merging drbd, and the drbd developers and user
community are offering to deprecate it within a reasonable timeframe of
a better ABI existing (since this will be hidden in the user-space
tools), if this means that it can be merged earlier.

This is quite different from an ABI which is expected to be stable and
remain forever (even if it was just an implicit user assumption); the
expectations are set accordingly from day 0, and thus should not be a
hurdle to acceptance.


Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde