2011-06-23 11:02:36

by Nico Schottelius

Subject: Mis-Design of Btrfs?

Good morning devs,

I'm wondering whether building RAID and volume management into btrfs is
actually a sane idea or not.
We already have md/device-mapper support for RAID, whereas btrfs lacks
raid5 support and re-implements things that already exist.

I'm aware that it is very useful for a filesystem to know which devices
it is running on. But I'm wondering whether it wouldn't be
smarter to generalise the information exposure through the VFS layer
instead of replicating functionality:

Physical: USB-HD SSD USB-Flash         | Exposes information to
Raid:     Raid1, Raid5, Raid10, etc.   | higher levels
Crypto:   Luks                         |
LVM:      Groups/Volumes               |
FS:       xfs/jfs/reiser/ext3          v

Thus a filesystem like ext3 could be aware that it is running
on a USB HD and enable -o sync by default, rewrite blocks
when running on crypto, or optimise for an SSD, ...
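
To make the idea concrete, here is a rough sketch of how a filesystem (or a
mount helper) might turn such exposed hints into defaults. Everything in it
is hypothetical: the DeviceHints type, its fields and the
suggest_mount_options() function are invented for illustration and are not
an existing kernel or VFS interface. The "sync" and "discard" names are real
mount options, but choosing them automatically like this is only an
assumption about what a sensible policy could look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceHints:
    """Hypothetical summary of what the lower layers could report upwards."""
    removable: bool = False   # e.g. a USB HD or USB flash stick
    rotational: bool = True   # False for an SSD


def suggest_mount_options(hints):
    """Return a list of mount options suggested by the device hints.

    This is an illustrative policy only, not what any filesystem does today.
    """
    options = []
    if hints.removable:
        # Removable media may be unplugged at any time, so prefer
        # synchronous writes.
        options.append("sync")
    if not hints.rotational:
        # Non-rotational media can benefit from being told about freed blocks.
        options.append("discard")
    return options


if __name__ == "__main__":
    print(suggest_mount_options(DeviceHints(removable=True, rotational=True)))
```

The point is only that the policy lives in the filesystem while the facts
come from below; the layers themselves stay separate.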

Cheers,

Nico

--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0


2011-06-27 06:47:43

by NeilBrown

Subject: Re: Mis-Design of Btrfs?

On Thu, 23 Jun 2011 12:53:37 +0200 Nico Schottelius
<[email protected]> wrote:

> Good morning devs,
>
> I'm wondering whether building RAID and volume management into btrfs is
> actually a sane idea or not.
> We already have md/device-mapper support for RAID, whereas btrfs lacks
> raid5 support and re-implements things that already exist.
>
> I'm aware that it is very useful for a filesystem to know which devices
> it is running on. But I'm wondering whether it wouldn't be
> smarter to generalise the information exposure through the VFS layer
> instead of replicating functionality:
>
> Physical: USB-HD SSD USB-Flash         | Exposes information to
> Raid:     Raid1, Raid5, Raid10, etc.   | higher levels
> Crypto:   Luks                         |
> LVM:      Groups/Volumes               |
> FS:       xfs/jfs/reiser/ext3          v
>
> Thus a filesystem like ext3 could be aware that it is running
> on a USB HD and enable -o sync by default, rewrite blocks
> when running on crypto, or optimise for an SSD, ...

I would certainly agree that exposing information to higher levels is a good
idea. To some extent we do. But it isn't always as easy as it might sound.
Choosing exactly what information to expose is the challenge. If you lack
sufficient foresight you might expose something which turns out to be
very specific to just one device, so all those upper levels which make use of
the information find they are really special-casing one specific device,
which isn't a good idea.
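
As a rough illustration of the information that is already exposed today,
the block layer publishes a few per-device attributes under /sys/block.
The sketch below reads some of them. The attribute paths (removable,
queue/rotational, queue/minimum_io_size, queue/optimal_io_size) are real
sysfs files on Linux; the read_block_hints() helper and its sysfs_root
parameter are made up for this example, and "sda" is just an assumed
device name.

```python
from pathlib import Path


def read_block_hints(device, sysfs_root="/sys/block"):
    """Read a few block-layer hints for `device` (e.g. "sda") from sysfs.

    Attributes that are missing or unreadable are reported as None.
    """
    base = Path(sysfs_root) / device

    def read_int(relative):
        try:
            return int((base / relative).read_text().strip())
        except (OSError, ValueError):
            return None

    return {
        "removable": read_int("removable"),
        "rotational": read_int("queue/rotational"),
        "minimum_io_size": read_int("queue/minimum_io_size"),
        "optimal_io_size": read_int("queue/optimal_io_size"),
    }


if __name__ == "__main__":
    # "sda" is an assumption; substitute a device that exists on your system.
    print(read_block_hints("sda"))
```

Even this small set shows the difficulty: each attribute had to be general
enough to mean something for every kind of device underneath.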


However it doesn't follow that RAID5 should not be implemented in BTRFS.
The levels that you have drawn are just one perspective. While that has
value, it may not be universal.
I could easily argue that the LVM layer is a mistake and that filesystems
should provide that functionality directly.
I could almost argue the same for crypto.
RAID1 can make a lot of sense to be tightly integrated with the FS.
RAID5 ... I'm less convinced, but then I have a vested interest there so that
isn't an objective assessment.

Part of "the way Linux works" is that s/he who writes the code gets to make
the design decisions. The BTRFS developers might create something truly
awesome, or might end up having to support a RAID feature that they
subsequently think is a bad idea. But it really is their decision to make.

NeilBrown

2011-06-29 09:30:14

by Ric Wheeler

Subject: Re: Mis-Design of Btrfs?

On 06/27/2011 07:46 AM, NeilBrown wrote:
> On Thu, 23 Jun 2011 12:53:37 +0200 Nico Schottelius
> <[email protected]> wrote:
>
>> Good morning devs,
>>
>> I'm wondering whether building RAID and volume management into btrfs is
>> actually a sane idea or not.
>> We already have md/device-mapper support for RAID, whereas btrfs lacks
>> raid5 support and re-implements things that already exist.
>>
>> I'm aware that it is very useful for a filesystem to know which devices
>> it is running on. But I'm wondering whether it wouldn't be
>> smarter to generalise the information exposure through the VFS layer
>> instead of replicating functionality:
>>
>> Physical: USB-HD SSD USB-Flash         | Exposes information to
>> Raid:     Raid1, Raid5, Raid10, etc.   | higher levels
>> Crypto:   Luks                         |
>> LVM:      Groups/Volumes               |
>> FS:       xfs/jfs/reiser/ext3          v
>>
>> Thus a filesystem like ext3 could be aware that it is running
>> on a USB HD and enable -o sync by default, rewrite blocks
>> when running on crypto, or optimise for an SSD, ...
> I would certainly agree that exposing information to higher levels is a good
> idea. To some extent we do. But it isn't always as easy as it might sound.
> Choosing exactly what information to expose is the challenge. If you lack
> sufficient foresight you might expose something which turns out to be
> very specific to just one device, so all those upper levels which make use of
> the information find they are really special-casing one specific device,
> which isn't a good idea.
>
>
> However it doesn't follow that RAID5 should not be implemented in BTRFS.
> The levels that you have drawn are just one perspective. While that has
> value, it may not be universal.
> I could easily argue that the LVM layer is a mistake and that filesystems
> should provide that functionality directly.
> I could almost argue the same for crypto.
> RAID1 can make a lot of sense to be tightly integrated with the FS.
> RAID5 ... I'm less convinced, but then I have a vested interest there so that
> isn't an objective assessment.
>
> Part of "the way Linux works" is that s/he who writes the code gets to make
> the design decisions. The BTRFS developers might create something truly
> awesome, or might end up having to support a RAID feature that they
> subsequently think is a bad idea. But it really is their decision to make.
>
> NeilBrown
>

One more thing to add here is that I think that we still have a chance to
increase the sharing between btrfs and the MD stack if we can get those changes
made. No one likes to duplicate code, but we will need a richer interface
between the block and file system layer to help close that gap.

Ric