Subject: Re: [PATCH 13/13] DRBD: final

> +#
> +config BLK_DEV_DRBD
> + tristate "DRBD Distributed Replicated Block Device support"
> + select INET
> + select PROC_FS
> + select CONNECTOR
> + select CRYPTO
> + select CRYPTO_HMAC

Have you double checked that these symbols are supposed to be 'selected'?
If they:
- have dependencies
- have a prompt
then they most likely are not.

> @@ -0,0 +1,7 @@
> +#CFLAGS_drbd_sizeof_sanity_check.o = -Wpadded # -Werror
Commented out?

> +
> +drbd-objs := drbd_buildtag.o drbd_bitmap.o drbd_proc.o \
> + drbd_worker.o drbd_receiver.o drbd_req.o drbd_actlog.o \
> + lru_cache.o drbd_main.o drbd_strings.o drbd_nl.o

Please use:
drbd-y := drbd_buildtag.o drbd_bitmap.o drbd_proc.o
...

And my personal taste favours:
drbd-y := ...
drbd-y += ...

over all the escaping.

Sam

2009-04-01 10:13:21

by Philipp Reisner

Subject: Re: [PATCH 13/13] DRBD: final

On Monday 30 March 2009 21:05:30 Sam Ravnborg wrote:
> > +#
> > +config BLK_DEV_DRBD
> > + tristate "DRBD Distributed Replicated Block Device support"
> > + select INET
> > + select PROC_FS
> > + select CONNECTOR
> > + select CRYPTO
> > + select CRYPTO_HMAC
>
> Have you double checked that these symbols are supposed to be 'selected'?
> If they:
> - have dependencies
> - have a prompt
> then they most likely are not.
>

Right! Reading kconfig-language.txt makes one wiser ;)
I have changed them into dependencies.

> > @@ -0,0 +1,7 @@
> > +#CFLAGS_drbd_sizeof_sanity_check.o = -Wpadded # -Werror
>
> Commented out?
>

Removed.

> > +
> > +drbd-objs := drbd_buildtag.o drbd_bitmap.o drbd_proc.o \
> > + drbd_worker.o drbd_receiver.o drbd_req.o drbd_actlog.o \
> > + lru_cache.o drbd_main.o drbd_strings.o drbd_nl.o
>
> Please use:
> drbd-y := drbd_buildtag.o drbd_bitmap.o drbd_proc.o
> ...
>
> And my personal taste favours:
> drbd-y := ...
> drbd-y += ...
>

Ok and ok, following your taste.

Thanks for those helpful hints!

-Phil
--
: Dipl-Ing Philipp Reisner
: LINBIT | Your Way to High Availability
: Tel: +43-1-8178292-50, Fax: +43-1-8178292-82
: http://www.linbit.com

DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.

2009-04-07 10:52:38

by Lars Marowsky-Bree

Subject: Re: [PATCH 00/12] DRBD: a block device for HA clusters

On 2009-03-30T18:47:08, Philipp Reisner <[email protected]> wrote:

> Hi,
>
> This is a repost of DRBD, to keep you updated about the ongoing
> cleanups.

Hi Philipp,

thanks for the submission!

On reading the code, I think it is in pretty good shape to be merged for
linux-next or Andrew's tree, at the very least.

(Ultimately, of course it'd be very nice if we could reduce the number
of raid engines in the kernel, but that should not necessarily delay the
merge here. Like Greg likes to say, the kernel community also merges
tons of hardware drivers in much worse states.)

Maybe you could also provide a git repository of a kernel tree with your
patches where testers could pull from?

Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

2009-04-07 12:23:27

by Nikanth K

Subject: Re: [PATCH 00/12] DRBD: a block device for HA clusters

Hi Philipp,

On Mon, Mar 30, 2009 at 10:17 PM, Philipp Reisner
<[email protected]> wrote:
> Hi,
>
> This is a repost of DRBD, to keep you updated about the ongoing
> cleanups.
>
> Description
>
> DRBD is a shared-nothing, synchronously replicated block device. It
> is designed to serve as a building block for high availability
> clusters and in this context, is a "drop-in" replacement for shared
> storage. Simplistically, you could see it as a network RAID 1.
>
> Each minor device has a role, which can be 'primary' or 'secondary'.
> On the node with the primary device the application is supposed to
> run and to access the device (/dev/drbdX). Every write is sent to
> the local 'lower level block device' and, across the network, to the
> node with the device in 'secondary' state. The secondary device
> simply writes the data to its lower level block device.
>
> DRBD can also be used in dual-Primary mode (device writable on both
> nodes), which means it can exhibit shared disk semantics in a
> shared-nothing cluster. Needless to say, a cluster file system is
> necessary on top of dual-Primary DRBD to maintain cache
> coherency.
>
> This is one of the areas where DRBD differs notably from RAID1 (say
> md) stacked on top of NBD or iSCSI. DRBD solves the issue of
> concurrent writes to the same on disk location. That is an error of
> the layer above us -- it usually indicates a broken lock manager in
> a cluster file system --, but DRBD has to ensure that both sides
> agree on which write came last, and therefore overwrites the other
> write.
>

So this difference to RAID1+NBD is required only if the DLM of the
clustered fs is buggy?

> More background on this can be found in this paper:
> http://www.drbd.org/fileadmin/drbd/publications/drbd8.pdf
>
> Beyond that, DRBD addresses various issues of cluster partitioning,
> which the MD/NBD stack, to the best of our knowledge, does not
> solve. The above-mentioned paper goes into some detail about that as
> well.
>

It would be nice if you could list those limitations of NBD/RAID here.

Thanks
Nikanth

2009-04-07 15:56:37

by Philipp Reisner

Subject: Re: [PATCH 00/12] DRBD: a block device for HA clusters

On Tuesday 07 April 2009 14:23:14 Nikanth K wrote:
> Hi Philipp,
>
> On Mon, Mar 30, 2009 at 10:17 PM, Philipp Reisner
>
> <[email protected]> wrote:
> > Hi,
> >
> > This is a repost of DRBD, to keep you updated about the ongoing
> > cleanups.
> >
> > Description
> >
> > DRBD is a shared-nothing, synchronously replicated block device. It
> > is designed to serve as a building block for high availability
> > clusters and in this context, is a "drop-in" replacement for shared
> > storage. Simplistically, you could see it as a network RAID 1.
> >
> > Each minor device has a role, which can be 'primary' or 'secondary'.
> > On the node with the primary device the application is supposed to
> > run and to access the device (/dev/drbdX). Every write is sent to
> > the local 'lower level block device' and, across the network, to the
> > node with the device in 'secondary' state. The secondary device
> > simply writes the data to its lower level block device.
> >
> > DRBD can also be used in dual-Primary mode (device writable on both
> > nodes), which means it can exhibit shared disk semantics in a
> > shared-nothing cluster. Needless to say, a cluster file system is
> > necessary on top of dual-Primary DRBD to maintain cache
> > coherency.
> >
> > This is one of the areas where DRBD differs notably from RAID1 (say
> > md) stacked on top of NBD or iSCSI. DRBD solves the issue of
> > concurrent writes to the same on disk location. That is an error of
> > the layer above us -- it usually indicates a broken lock manager in
> > a cluster file system --, but DRBD has to ensure that both sides
> > agree on which write came last, and therefore overwrites the other
> > write.
>
> So this difference to RAID1+NBD is required only if the DLM of the
> clustered fs is buggy?
>

No, DRBD is much more than RAID1+NBD. I had the impression that by writing
"RAID1+NBD" I could quickly communicate the big picture of what DRBD is.

> > More background on this can be found in this paper:
> > http://www.drbd.org/fileadmin/drbd/publications/drbd8.pdf
> >
> > Beyond that, DRBD addresses various issues of cluster partitioning,
> > which the MD/NBD stack, to the best of our knowledge, does not
> > solve. The above-mentioned paper goes into some detail about that as
> > well.
>
> It would be nice if you could list those limitations of NBD/RAID here.
>

Ok. I will give you two simple examples:

1)
Think of a two-node HA cluster. Node A is active ('primary' in DRBD speak),
has the filesystem mounted and the application running. Node B is
in standby mode ('secondary' in DRBD speak).

We lose network connectivity; the primary node continues to run, but the
secondary no longer gets updates.

Then we have a complete power failure; both nodes are down. Then they
power up the data center again, but at first they get only the power circuit
of node B up and running again.

Should node B offer the service right now?
(DRBD has configurable policies for that.)

Later on they manage to get node A up and running again. Now let's assume
node B was chosen to be the new primary node. What needs to be done?

Modifications on B since it became primary need to be resynced to A.
Modifications on A since it lost contact with B need to be discarded.

DRBD does that.

How do you fit that into a RAID1+NBD model? NBD is just a block transport;
it does not offer the ability to exchange dirty bitmaps or data generation
identifiers, nor does the RAID1 code have a concept of that.
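
To make the resync step in example 1 concrete, here is a minimal Python
sketch. It is not DRBD code: the function resync(), the list-of-blocks
"devices" and the set-based dirty bitmaps are all hypothetical stand-ins,
and the sketch assumes the sync direction (B to A) has already been decided,
which is the part the data generation identifiers are for. The idea shown is
only that the blocks to copy are the union of both nodes' dirty bitmaps:
B's changes get propagated and A's changes get overwritten in one pass.

```python
def resync(source_blocks, target_blocks, source_dirty, target_dirty):
    """Copy every block that may differ from the sync source to the target.

    Hypothetical helper for illustration only, not DRBD's API.
    A block may differ if either side wrote to it while disconnected,
    so the out-of-sync set is the union of both dirty bitmaps.
    """
    out_of_sync = source_dirty | target_dirty
    for block_no in out_of_sync:
        target_blocks[block_no] = source_blocks[block_no]
    return out_of_sync


# Both nodes start with identical data and clean bitmaps.
node_a = ["a0", "a1", "a2", "a3"]
node_b = list(node_a)
dirty_a, dirty_b = set(), set()

# The link is lost; primary A keeps writing and marks the block dirty.
node_a[1] = "written on A"
dirty_a.add(1)

# After the power failure B is promoted and takes its own writes.
node_b[3] = "written on B"
dirty_b.add(3)

# A comes back; B was chosen as the new primary, so B is the sync source.
copied = resync(node_b, node_a, dirty_b, dirty_a)
print(sorted(copied))    # [1, 3]
print(node_a == node_b)  # True
```

Only two of the four blocks are transferred: block 3 carries B's new data to
A, and block 1 replaces A's stale write with B's version.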

2)
When DRBD runs over low-bandwidth links and has to run a resync, it
offers the option to do a "checksum based resync". Similar to rsync, it
at first exchanges only a checksum, and transmits the whole data block only
if the checksums differ.

That again is something that does not fit into the concepts of NBD or RAID1.
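
The checksum based resync from example 2 can be sketched in a few lines of
Python. This is an illustration under assumptions, not DRBD's wire protocol:
checksum_resync() and digest() are hypothetical names, SHA-1 is an arbitrary
hash choice, and both "devices" live in one process, whereas in reality each
node would compute the checksum of its own block and only the checksum would
cross the link.

```python
import hashlib


def digest(block: bytes) -> bytes:
    """Checksum of one block; SHA-1 is an arbitrary choice for this sketch."""
    return hashlib.sha1(block).digest()


def checksum_resync(source, target):
    """Bring target in line with source, sending full blocks only on mismatch.

    Hypothetical helper for illustration only. Returns the number of
    full data blocks that had to be transmitted.
    """
    blocks_sent = 0
    for block_no, src_block in enumerate(source):
        # Only the (small) checksums are compared first.
        if digest(src_block) != digest(target[block_no]):
            # Checksums differ: transmit the whole data block.
            target[block_no] = src_block
            blocks_sent += 1
    return blocks_sent


source = [b"aaaa", b"bbbb", b"cccc"]
target = [b"aaaa", b"XXXX", b"cccc"]

sent = checksum_resync(source, target)
print(sent)              # 1
print(target == source)  # True
```

Here three checksums are compared but only one full block is sent, which is
where the saving on a slow link comes from.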

I will write down more examples if you think you need more justification
for yet another implementation of RAID in the kernel. DRBD does more, but DRBD
is not suitable for RAID1 on a local box.

PS: Lars Marowsky-Bree requested a Git tree of the DRBD-for-mainline kernel
patch. I will set that up by Friday, and maintain the code there for
the merging process.

Best,
Philipp
--
: Dipl-Ing Philipp Reisner
: LINBIT | Your Way to High Availability
: Tel: +43-1-8178292-50, Fax: +43-1-8178292-82
: http://www.linbit.com

DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.