An option to select the RAID Stripe size is made
available in the BTRFS Filesystem, via an option
in the BTRFS Config setup, with minimal change
to the existing code base.
Signed-off-by: Sanidhya Solanki <[email protected]>
---
fs/btrfs/Kconfig | 42 ++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/Makefile | 2 ++
fs/btrfs/scrub.c | 4 ++++
fs/btrfs/super.c | 4 ++++
fs/btrfs/volumes.c | 2 +-
fs/btrfs/volumes.h | 4 +++-
6 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 80e9c18..1454d21 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -28,6 +28,48 @@ config BTRFS_FS
If unsure, say N.
+choice
+ prompt "Choose Stripe Size"
+ default RS_1024
+ help
+ Allows you to select the size of the stripe, which is the smallest sized
+ data block to be replicated.
+ Selecting a larger size than your physical block size may lead to data
+ fragmentation in some cases.
+
+ config RS_512
+ bool "512 bytes"
+
+ config RS_1024
+ bool "1024 bytes"
+
+ config RS_2048
+ bool "2048 bytes"
+
+ config RS_4096
+ bool "4096 bytes"
+
+ config RS_8192
+ bool "8192 bytes"
+
+ config RS_16384
+ bool "16384 bytes"
+
+ config RS_32768
+ bool "32768 bytes"
+
+endchoice
+
+config BTRFS_RAID_STRIPE
+ int
+ default 512 if RS_512
+ default 1024 if RS_1024
+ default 2048 if RS_2048
+ default 4096 if RS_4096
+ default 8192 if RS_8192
+ default 16384 if RS_16384
+ default 32768 if RS_32768
+
config BTRFS_FS_POSIX_ACL
bool "Btrfs POSIX Access Control Lists"
depends on BTRFS_FS
diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 6d1d0b9..1c4e384 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -17,3 +17,5 @@ btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
tests/extent-buffer-tests.o tests/btrfs-tests.o \
tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o
+
+btrfs-$(CONFIG_BTRFS_RAID_STRIPE) += scrub.o super.o volumes.o
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b091d94..4d0f802 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -63,6 +63,10 @@ struct scrub_ctx;
*/
#define SCRUB_MAX_PAGES_PER_BLOCK 16 /* 64k per node/leaf/sector */
+#define STRIPE_LENGTH CONFIG_BTRFS_RAID_STRIPE
+
+#define BTRFS_STRIPE_LEN (64 * STRIPE_LENGTH)
+
struct scrub_recover {
atomic_t refs;
struct btrfs_bio *bbio;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 24154e4..3d91f8d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -64,6 +64,10 @@
#define CREATE_TRACE_POINTS
#include <trace/events/btrfs.h>
+#define STRIPE_LENGTH CONFIG_BTRFS_RAID_STRIPE
+
+#define BTRFS_STRIPE_LEN (64 * STRIPE_LENGTH)
+
static const struct super_operations btrfs_super_ops;
static struct file_system_type btrfs_fs_type;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4564522..e1b2e5c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4461,7 +4461,7 @@ static int btrfs_cmp_device_info(const void *a, const void *b)
static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)
{
/* TODO allow them to set a preferred stripe size */
- return 64 * 1024;
+ return BTRFS_STRIPE_LEN;
}
static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d5c84f6..9115a80 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -26,7 +26,9 @@
extern struct mutex uuid_mutex;
-#define BTRFS_STRIPE_LEN (64 * 1024)
+#define STRIPE_LENGTH CONFIG_BTRFS_RAID_STRIPE
+
+#define BTRFS_STRIPE_LEN (64 * STRIPE_LENGTH)
struct buffer_head;
struct btrfs_pending_bios {
--
2.5.0
On Tue, 29 Dec 2015 14:39:07 +0100
David Sterba <[email protected]> wrote:
> The stripe size depends on how the filesystem was made, at the moment
> the stripesize parameter is missing from mkfs. The kernel module
> should support all sizes at runtime, so it's not a compile-time
> option.
No good? I will try and re-implement it as a runtime option.
Thanks
On Mon, Dec 28, 2015 at 07:24:11AM -0500, Sanidhya Solanki wrote:
> An option to select the RAID Stripe size is made
> available in the BTRFS Filesystem, via an option
> in the BTRFS Config setup, with minimal change
> to the existing code base.
The stripe size depends on how the filesystem was made, at the moment
the stripesize parameter is missing from mkfs. The kernel module should
support all sizes at runtime, so it's not a compile-time option.
On Tue, Dec 29, 2015 at 06:15:12AM -0500, Sanidhya Solanki wrote:
> On Tue, 29 Dec 2015 14:39:07 +0100
> David Sterba <[email protected]> wrote:
>
> > The stripe size depends on how the filesystem was made, at the moment
> > the stripesize parameter is missing from mkfs. The kernel module
> > should support all sizes at runtime, so it's not a compile-time
> > option.
>
> No good? I will try and re-implement it as a runtime option.
So you want to make the stripe size configurable? The stripesize is
sotred in the superblock but as the hardcoded value is 64k through the
BTRFS_STRIPE_LEN define, the superblock value is not honored in the
code. I don't know about all implications from changing the define to
sb->stripesize, also we want to define the allowed range etc. It would
be better to add more description to the patch.
On Tue, 29 Dec 2015 18:06:11 +0100
David Sterba <[email protected]> wrote:
> I don't know about all implications from changing the define to
> sb->stripesize, also we want to define the allowed range etc. It would
> be better to add more description to the patch.
So, is the patch atleast somewhat usable and you just need me to expand
the description, or should I write the whole patch from scratch,
modifying superblock size as a runtime configurable option?
Thanks
On Tue, 29 Dec 2015 18:06:11 +0100
David Sterba <[email protected]> wrote:
> So you want to make the stripe size configurable?...
As I see it there are 3 ways to do it:
-Make it a compile time option that only configures it for a single
system with any devices that are added to the RAID.
-Make it a runtime option that can change based on how the
administrator configures it.
-A non-user facing option that is configurable by someone like a
distribution maintainer for all systems using the Binary Distribution.
As I see it, DS would like something like the third option, but CAM
(ostensibly a SysAdmin) wants the second option.
On the other hand, I implemented the first option.
The first and third option can co-exit, the second is an orthogonal
target that needs to be setup separately.
Or we can make all options co-exist, but make it more complicated.
Please let me know which implementation is preferable, and, if you just
want me to expand the description (as DS' mail asked for) or redo the
entire setup.
Thanks
On Wed, 30 Dec 2015 19:59:16 +0800
Qu Wenruo <[email protected]> wrote:
> Not really sure about the difference between 2 and 3.
I should have made it clear before, I was asking the exact use case in
mind when listing the choices. Option 2 would be for SysAdmins running
production software and configuring it as they desire.
Option 3 is what we have in the Kernel now, before my patch, where the
option exists, but it is fixed by the code. You can change it, but you
need to be someone fairly involved in the upstream work (like a
distribution Maintainer). This is what my patch implements (well, this
and option 3).
Option 1 leaves it as a compile time option.
> When you mention runtime option, did you mean ioctl/mount/balance
> convert option?
Yes, that is correct.
> And what's the third one? Default mkfs time option?
> If you can make it mkfs time option, it won't be really hard to make
> it configurable.
This would be ideal for all use-cases, but make the implementation
much larger than it would be for the other options. Hence, I asked
what the exact use case was for the end-user being targeted.
> I didn't consider David means something that.
> As far as I read, he means balance convert option along with mkfs
> option.
Hence, why I asked.
> At least from what I have learned in recent btrfs development,
He> He> either
> we provide a good enough interfaces (normally, balance convert ioctl
> with mkfs time option) to configure some on-disk fields.
Just confirming before starting the implementation.
> So fixed kernel value is not a really good idea, and should at least
> be replace by mkfs time option.
Will do after confirmation.
Thanks
On Wed, 30 Dec 2015 22:10:44 +0800
Qu Wenruo <[email protected]> wrote:
> Understood now.
Good.
> I totally understand that implement ... to polish your
> skill.
That has got to be the most hilarious way I believe I have seen someone
delegate a task. But it was effective.
Only one problem. I do not run BTRFS on my systems nor do I have a
RAID setup, due to possessing a limited number of free drives. So, while
I may be able to code for it, I will not be able to test it. I will need
the community's help to do the testing.
I will get started tomorrow.
To-do (so far):
- Implement RAID Stripe length as a compile and runtime option.
- Implement a way to do an in-place Stripe Length change.
- Debugging & testing for the above additions.
Thanks.
On 12/30/2015 02:39 PM, Sanidhya Solanki wrote:
> On Tue, 29 Dec 2015 18:06:11 +0100
> David Sterba <[email protected]> wrote:
>
>> So you want to make the stripe size configurable?...
>
> As I see it there are 3 ways to do it:
> -Make it a compile time option that only configures it for a single
> system with any devices that are added to the RAID.
> -Make it a runtime option that can change based on how the
> administrator configures it.
> -A non-user facing option that is configurable by someone like a
> distribution maintainer for all systems using the Binary Distribution.
Not really sure about the difference between 2 and 3.
When you mention runtime option, did you mean ioctl/mount/balance
convert option?
And what's the third one? Default mkfs time option?
If you can make it mkfs time option, it won't be really hard to make it
configurable.
>
> As I see it, DS would like something like the third option, but CAM
> (ostensibly a SysAdmin) wants the second option.
I didn't consider David means something that.
As far as I read, he means balance convert option along with mkfs option.
>
> On the other hand, I implemented the first option.
At least from what I have learned in recent btrfs development, either we
provide a good enough interfaces (normally, balance convert ioctl with
mkfs time option) to configure some on-disk fields.
Or we just leave it to fixed value(normally 0, just like for encryption
of EXTENT_DATA, and that's the case for current stripe_size).
So fixed kernel value is not a really good idea, and should at least be
replace by mkfs time option.
>
> The first and third option can co-exit, the second is an orthogonal
> target that needs to be setup separately.
>
> Or we can make all options co-exist, but make it more complicated.
No need.
Just refer to how btrfs kernel handle chunk profile.
It can be specified at mkfs time (by -d and -m options), and can also be
converted later by balance ioctl. (by btrfs balance convert filter).
The only tricky thing I am a little considered about is, how do we keep
the default chunk stripe size for a fs.
Thanks,
Qu
>
> Please let me know which implementation is preferable, and, if you just
> want me to expand the description (as DS' mail asked for) or redo the
> entire setup.
>
> Thanks
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On 12/30/2015 05:54 PM, Sanidhya Solanki wrote:
> On Wed, 30 Dec 2015 19:59:16 +0800
> Qu Wenruo <[email protected]> wrote:
>> Not really sure about the difference between 2 and 3.
>
> I should have made it clear before, I was asking the exact use case in
> mind when listing the choices. Option 2 would be for SysAdmins running
> production software and configuring it as they desire.
> Option 3 is what we have in the Kernel now, before my patch, where the
> option exists, but it is fixed by the code. You can change it, but you
> need to be someone fairly involved in the upstream work (like a
> distribution Maintainer). This is what my patch implements (well, this
> and option 3).
> Option 1 leaves it as a compile time option.
>
>> When you mention runtime option, did you mean ioctl/mount/balance
>> convert option?
>
> Yes, that is correct.
>
>> And what's the third one? Default mkfs time option?
>> If you can make it mkfs time option, it won't be really hard to make
>> it configurable.
>
> This would be ideal for all use-cases, but make the implementation
> much larger than it would be for the other options. Hence, I asked
> what the exact use case was for the end-user being targeted.
>
>> I didn't consider David means something that.
>> As far as I read, he means balance convert option along with mkfs
>> option.
>
> Hence, why I asked.
>
>> At least from what I have learned in recent btrfs development,
> He> He> either
>> we provide a good enough interfaces (normally, balance convert ioctl
>> with mkfs time option) to configure some on-disk fields.
>
> Just confirming before starting the implementation.
>> So fixed kernel value is not a really good idea, and should at least
>> be replace by mkfs time option.
>
> Will do after confirmation.
Understood now.
Now I am on the same side of David.
Which means a runtime interface to change them. (along with mkfs option)
If provide some configurable features, then it should be able to be
tuned at both right time and mkfs time.
Or, just don't touch it until there is really enough user demand.
(In stripe_len case, it's also a possible choice, as configurable stripe
length doesn't really affect much except RAID5/6)
I totally understand that implement will cost you a lot of more time,
not only kernel part but also user-tool part.
But this also means more patches.
No matter what the motivation for you to contribute to btrfs, more
patches (except the more time spent) are always good.
More patches, more reputation built in community, and more patches also
means better split code structures for easier review.
And also you will need to do more debugging/tests, to polish your skill.
Thanks,
Qu
>
> Thanks
>
On Wed, Dec 30, 2015 at 06:15:23AM -0500, Sanidhya Solanki wrote:
> Only one problem. I do not run BTRFS on my systems nor do I have a
> RAID setup, due to possessing a limited number of free drives. So, while
> I may be able to code for it, I will not be able to test it. I will need
> the community's help to do the testing.
Multiple devices can be simulated by loop devices or one physical device
partitioned. I'd expect at least some testing on your side, the
community will help with testing, but that's nothing specific to this
patch. This happens all the time.
> I will get started tomorrow.
>
> To-do (so far):
> - Implement RAID Stripe length as a compile and runtime option.
I was trying to explain that it's not a compile time option.
> - Implement a way to do an in-place Stripe Length change.
How are you going to implement that? I've suggested the balance filter
style of conversion, which is not in-place so I'm curious what do you
mean by in-place.
On Wed, Dec 30, 2015 at 10:10:44PM +0800, Qu Wenruo wrote:
> Now I am on the same side of David.
> Which means a runtime interface to change them. (along with mkfs option)
>
> If provide some configurable features, then it should be able to be
> tuned at both right time and mkfs time.
> Or, just don't touch it until there is really enough user demand.
> (In stripe_len case, it's also a possible choice, as configurable stripe
> length doesn't really affect much except RAID5/6)
I think that we need configurable stripe size regardless. The
performance drop is measurable if the stripe size used by filesystem
does not match the hardware.
> I totally understand that implement will cost you a lot of more time,
> not only kernel part but also user-tool part.
>
> But this also means more patches.
> No matter what the motivation for you to contribute to btrfs, more
> patches (except the more time spent) are always good.
>
> More patches, more reputation built in community, and more patches also
> means better split code structures for easier review.
Let me note that a good reputation is also built from patch reviews
(hint hint).
On Wed, 30 Dec 2015 16:58:05 +0100
David Sterba <[email protected]> wrote:
> On Wed, Dec 30, 2015 at 06:15:23AM -0500, Sanidhya Solanki wrote:
> > - Implement a way to do an in-place Stripe Length change.
>
> How are you going to implement that? I've suggested the balance filter
> style of conversion, which is not in-place so I'm curious what do you
> mean by in-place.
As CAM suggested, it would basically be a CoW, with a checksum
comparison at the end to make sure no data has been corrupted.
In-place: Without taking the drives or filesystem offline or unmounting
them. Doing the conversion while the rest of the RAID is in use.
Risky, slow, but possible, given enough time for large data sets.
Thanks.
On Wed, 30 Dec 2015 17:17:22 +0100
David Sterba <[email protected]> wrote:
> Let me note that a good reputation is also built from patch reviews
> (hint hint).
Unfortunately, not too many patches coming in for BTRFS presently.
Mailing list activity is down to 25-35 mails per day. Mostly feature
and bug requests.
I will try to pitch in with patch reviews where possible.
Thanks.
David Sterba wrote on 2015/12/30 17:17 +0100:
> On Wed, Dec 30, 2015 at 10:10:44PM +0800, Qu Wenruo wrote:
>> Now I am on the same side of David.
>> Which means a runtime interface to change them. (along with mkfs option)
>>
>> If provide some configurable features, then it should be able to be
>> tuned at both right time and mkfs time.
>> Or, just don't touch it until there is really enough user demand.
>> (In stripe_len case, it's also a possible choice, as configurable stripe
>> length doesn't really affect much except RAID5/6)
>
> I think that we need configurable stripe size regardless. The
> performance drop is measurable if the stripe size used by filesystem
> does not match the hardware.
Right, I just missed the benchmark from Christoph and forgot the case of
RAID 5/6.
>
>> I totally understand that implement will cost you a lot of more time,
>> not only kernel part but also user-tool part.
>>
>> But this also means more patches.
>> No matter what the motivation for you to contribute to btrfs, more
>> patches (except the more time spent) are always good.
>>
>> More patches, more reputation built in community, and more patches also
>> means better split code structures for easier review.
>
> Let me note that a good reputation is also built from patch reviews
> (hint hint).
I must admit I'm a bad reviewer.
As when I review something, I always has an eager to rewrite part or all
the patch to follow my idea, even it's just a choice between different
design.
Thanks,
Qu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>