From: Zheng Liu <[email protected]>
When cluster-size is enabled without bigalloc feature, mke2fs will ignore this
argument silently. But user might think bigalloc feature has been enabled
unless they use 'stats' command to check it in debugfs. So now we ask user
to set bigalloc feature explicity when cluster-size is enabled. This can
make sure that users understand what they are doing because bigalloc might
impact the performance for some workloads.
Signed-off-by: Zheng Liu <[email protected]>
---
misc/mke2fs.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index a288147..bf4d7a2 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1915,8 +1915,14 @@ profile_error:
blocksize*16);
fs_param.s_log_cluster_size =
int_log2(cluster_size >> EXT2_MIN_CLUSTER_LOG_SIZE);
- } else
+ } else if (cluster_size) {
+ com_err(program_name, EINVAL,
+ _("while setting clustersize; You need to enable "
+ "bigalloc feature explicity"));
+ exit(1);
+ } else {
fs_param.s_log_cluster_size = fs_param.s_log_block_size;
+ }
if (inode_ratio == 0) {
inode_ratio = get_int_from_profile(fs_types, "inode_ratio",
--
1.7.12.rc2.18.g61b472e
From: Zheng Liu <[email protected]>
There are two bugs need to be fixed, which are about cluster-size. Now the
range of cluster-size is from 1024 to 512M bytes. Although with '-C 1024',
the cluster-size will be 4096 after making a filesystem because in
ext2fs_initialize() set_field() needs to check 'param->s_log_cluster_size' and
s_log_cluster_size is 0 as cluster-size is 1024. Then s_log_cluster_size will
be assigned to s_log_block_size+4. So we never set cluster-size to 1024.
Another bug is that when cluster-size is 512M EXT2FS_C2B will return 0. So
s_blocks_per_group will be assigned to zero and we will meet a 'division by
zero' error.
Here we reduce the range of cluster-size and check s_blocks_per_group=0 to avoid
'division by zero' error.
Signed-off-by: Zheng Liu <[email protected]>
---
lib/ext2fs/initialize.c | 4 ++++
misc/mke2fs.c | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index dca0d9a..0eccd59 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -226,6 +226,10 @@ errcode_t ext2fs_initialize(const char *name, int flags,
super->s_clusters_per_group = EXT2_MAX_CLUSTERS_PER_GROUP(super);
super->s_blocks_per_group = EXT2FS_C2B(fs,
super->s_clusters_per_group);
+ if (super->s_blocks_per_group == 0) {
+ retval = EXT2_ET_TOOSMALL;
+ goto cleanup;
+ }
} else {
set_field(s_blocks_per_group, fs->blocksize * 8);
if (super->s_blocks_per_group > EXT2_MAX_BLOCKS_PER_GROUP(super))
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index bf4d7a2..f4140a1 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1384,8 +1384,8 @@ profile_error:
break;
case 'C':
cluster_size = strtoul(optarg, &tmp, 0);
- if (cluster_size < EXT2_MIN_CLUSTER_SIZE ||
- cluster_size > EXT2_MAX_CLUSTER_SIZE || *tmp) {
+ if (cluster_size <= EXT2_MIN_CLUSTER_SIZE ||
+ cluster_size >= EXT2_MAX_CLUSTER_SIZE || *tmp) {
com_err(program_name, 0,
_("invalid cluster size - %s"),
optarg);
--
1.7.12.rc2.18.g61b472e
From: Zheng Liu <[email protected]>
Bigalloc feature has been used for a long time, but the documentation in mke2fs
is still missing. So add it.
Signed-off-by: Zheng Liu <[email protected]>
---
misc/mke2fs.8.in | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index d4fbe00..ca3083d 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -187,6 +187,11 @@ Check the device for bad blocks before creating the file system. If
this option is specified twice, then a slower read-write
test is used instead of a fast read-only test.
.TP
+.B \-C " cluster-size"
+Specify the size of cluster in bytes. Valid cluster-size values are from
+2048 to 256M bytes per cluster. If omiited, cluster-size is 64KB by
+default.
+.TP
.B \-D
Use direct I/O when writing to the disk. This avoids mke2fs dirtying a
lot of buffer cache memory, which may impact other applications running
@@ -516,6 +521,12 @@ prefix the feature name with a caret ('^') character. The
pseudo-filesystem feature "none" will clear all filesystem features.
.RS 1.2i
.TP
+.B bigalloc
+Allow to allocate block-size beyond the 4096 bytes. That can decrease the time
+spent on doing block allocation and brings smaller fragmentation, especially
+for large files. The size can be specified using the
+.B \-C option.
+.TP
.B dir_index
Use hashed b-trees to speed up lookups in large directories.
.TP
--
1.7.12.rc2.18.g61b472e
On 2013-01-13, at 2:08 AM, Zheng Liu wrote:
> Here we reduce the range of cluster-size and check s_blocks_per_group=0 to avoid 'division by zero' error.
>
> Signed-off-by: Zheng Liu <[email protected]>
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index bf4d7a2..f4140a1 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1384,8 +1384,8 @@ profile_error:
> break;
> case 'C':
> cluster_size = strtoul(optarg, &tmp, 0);
> - if (cluster_size < EXT2_MIN_CLUSTER_SIZE ||
> - cluster_size > EXT2_MAX_CLUSTER_SIZE || *tmp) {
> + if (cluster_size <= EXT2_MIN_CLUSTER_SIZE ||
> + cluster_size >= EXT2_MAX_CLUSTER_SIZE || *tmp) {
> com_err(program_name, 0,
> _("invalid cluster size - %s"),
> optarg);
Wouldn't it make more sense to change EXT2_MIN_CLUSTER_SIZE and EXT2_MAX_CLUSTER_SIZE? Otherwise, those constants don't really
contain the min/max cluster size, and it is confusing to use them
in other code.
Cheers, Andreas
On Sun, Jan 13, 2013 at 05:08:13PM +0800, Zheng Liu wrote:
> From: Zheng Liu <[email protected]>
>
> When cluster-size is enabled without bigalloc feature, mke2fs will ignore this
> argument silently. But user might think bigalloc feature has been enabled
> unless they use 'stats' command to check it in debugfs. So now we ask user
> to set bigalloc feature explicity when cluster-size is enabled. This can
> make sure that users understand what they are doing because bigalloc might
> impact the performance for some workloads.
>
> Signed-off-by: Zheng Liu <[email protected]>
Thanks, applied.
- Ted
On Sun, Jan 13, 2013 at 05:08:14PM +0800, Zheng Liu wrote:
> From: Zheng Liu <[email protected]>
>
> There are two bugs need to be fixed, which are about cluster-size.
> Now the range of cluster-size is from 1024 to 512M bytes. Although
> with '-C 1024', the cluster-size will be 4096 after making a
> filesystem because in ext2fs_initialize() set_field() needs to check
> 'param->s_log_cluster_size' and s_log_cluster_size is 0 as
> cluster-size is 1024. Then s_log_cluster_size will be assigned to
> s_log_block_size+4. So we never set cluster-size to 1024.
>
> Another bug is that when cluster-size is 512M EXT2FS_C2B will return
> 0. So s_blocks_per_group will be assigned to zero and we will meet
> a 'division by zero' error.
There are a couple of things going on here. The first is that it
makes no senes when the cluster size is less than or equal to the
block size. (Actually, nothing bad should happen in the case when the
cluster size == block size, but if the user specified the bigalloc
feature, that's something which they almost certainly don't want.)
So the more general check is we should be complaining if the cluster
size is <= the block size. That is, the combination of -b 4096 and -C
2048 makes no sense, either.
Also, there's technically nothing wrong with a cluster size of 512MB.
The problem is in how we calculate the default number of clusters per
group --- if it translates to a number of blocks per group which
overals 2**32, that's when we run into problems.
Which leads to another bug in the current mke2fs command. The range
checking for the -g (which allows you to specify the number of blocks
per group is bogus in the case when the bigalloc feature is enabled).
I think the best way of fixing this is to document that the -g option
specifies the number of clusters per block if the bigalloc feature is
enabled.
- Ted
On 2013-01-14, at 2:03 PM, Theodore Ts'o wrote:
> On Sun, Jan 13, 2013 at 05:08:14PM +0800, Zheng Liu wrote:
>> From: Zheng Liu <[email protected]>
>>
>> There are two bugs need to be fixed, which are about cluster-size.
>> Now the range of cluster-size is from 1024 to 512M bytes. Although
>> with '-C 1024', the cluster-size will be 4096 after making a
>> filesystem because in ext2fs_initialize() set_field() needs to check
>> 'param->s_log_cluster_size' and s_log_cluster_size is 0 as
>> cluster-size is 1024. Then s_log_cluster_size will be assigned to
>> s_log_block_size+4. So we never set cluster-size to 1024.
>>
>> Another bug is that when cluster-size is 512M EXT2FS_C2B will return
>> 0. So s_blocks_per_group will be assigned to zero and we will meet
>> a 'division by zero' error.
>
> There are a couple of things going on here. The first is that it
> makes no senes when the cluster size is less than or equal to the
> block size. (Actually, nothing bad should happen in the case when
> cluster size == block size, but if the user specified the bigalloc
> feature, that's something which they almost certainly don't want.)
>
> So the more general check is we should be complaining if the cluster
> size is <= the block size. That is, the combination of -b 4096 and
> -C 2048 makes no sense, either.
>
> Also, there's technically nothing wrong with a cluster size of 512MB.
> The problem is in how we calculate the default number of clusters per
> group --- if it translates to a number of blocks per group which
> overals 2**32, that's when we run into problems.
>
> Which leads to another bug in the current mke2fs command. The range
> checking for the -g (which allows you to specify the number of blocks
> per group is bogus in the case when the bigalloc feature is enabled).
> I think the best way of fixing this is to document that the -g option
> specifies the number of clusters per block if the bigalloc feature is
> enabled.
Presumably you mean "the -g option specifies the number of clusters
_per_group_ if the bigalloc feature is enabled"?
Cheers, Andreas
On Mon, Jan 14, 2013 at 02:07:43PM -0700, Andreas Dilger wrote:
> > I think the best way of fixing this is to document that the -g option
> > specifies the number of clusters per block if the bigalloc feature is
> > enabled.
>
> Presumably you mean "the -g option specifies the number of clusters
> _per_group_ if the bigalloc feature is enabled"?
Yes, I meant to say that I want to change the definition of the -g
option such that if the bigalloc feature is enabled, it specifies the
number of clusters per block group.
- Ted
The -b and -C options now use parse_num_blocks2() insteda of strtol,
so that users can specify -C 256M instead of the much less convenient
-C 268435456.
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
misc/mke2fs.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 5cb49b3..c16ab33 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1324,10 +1324,10 @@ profile_error:
"b:cg:i:jl:m:no:qr:s:t:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
switch (c) {
case 'b':
- blocksize = strtol(optarg, &tmp, 0);
+ blocksize = parse_num_blocks2(optarg, -1);
b = (blocksize > 0) ? blocksize : -blocksize;
if (b < EXT2_MIN_BLOCK_SIZE ||
- b > EXT2_MAX_BLOCK_SIZE || *tmp) {
+ b > EXT2_MAX_BLOCK_SIZE) {
com_err(program_name, 0,
_("invalid block size - %s"), optarg);
exit(1);
@@ -1345,9 +1345,9 @@ profile_error:
cflag++;
break;
case 'C':
- cluster_size = strtoul(optarg, &tmp, 0);
+ cluster_size = parse_num_blocks2(optarg, -1);
if (cluster_size <= EXT2_MIN_CLUSTER_SIZE ||
- cluster_size > EXT2_MAX_CLUSTER_SIZE || *tmp) {
+ cluster_size > EXT2_MAX_CLUSTER_SIZE) {
com_err(program_name, 0,
_("invalid cluster size - %s"),
optarg);
--
1.7.12.rc0.22.gcdd159b
If bigalloc is enabled, then -g will specify the clusters per block
group. (If bigalloc is not enabled, then a cluster == a block, so the
meaning of -g is not changed.)
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
misc/mke2fs.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 75d0e48..5cb49b3 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1962,6 +1962,15 @@ profile_error:
}
}
+ /*
+ * If the bigalloc feature is enabled, then the -g option will
+ * specify the number of clusters per group.
+ */
+ if (fs_param.s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_BIGALLOC) {
+ fs_param.s_clusters_per_group = fs_param.s_blocks_per_group;
+ fs_param.s_blocks_per_group = 0;
+ }
+
if (inode_size == 0)
inode_size = get_int_from_profile(fs_types, "inode_size", 0);
if (!flex_bg_size && (fs_param.s_feature_incompat &
--
1.7.12.rc0.22.gcdd159b
If the user attemps to create a 512MB cluster, we need to adjust the
defaults to avoid a 32-bit overflow of s_blocks_per_group. Also check
to make sure that the caller of ext2fs_initialize() has not given a
value of s_clusters_per_group that would result in an overflow of
s_blocks_per_group.
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
lib/ext2fs/initialize.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index b0c15d2..5afdc27 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -207,6 +207,8 @@ errcode_t ext2fs_initialize(const char *name, int flags,
super->s_log_block_size;
if (bigalloc_flag) {
+ unsigned long long bpg;
+
if (param->s_blocks_per_group &&
param->s_clusters_per_group &&
((param->s_clusters_per_group * EXT2FS_CLUSTER_RATIO(fs)) !=
@@ -220,12 +222,19 @@ errcode_t ext2fs_initialize(const char *name, int flags,
super->s_clusters_per_group =
param->s_blocks_per_group /
EXT2FS_CLUSTER_RATIO(fs);
- else
+ else if (super->s_log_cluster_size + 15 < 32)
super->s_clusters_per_group = fs->blocksize * 8;
+ else
+ super->s_clusters_per_group = (fs->blocksize - 1) * 8;
if (super->s_clusters_per_group > EXT2_MAX_CLUSTERS_PER_GROUP(super))
super->s_clusters_per_group = EXT2_MAX_CLUSTERS_PER_GROUP(super);
- super->s_blocks_per_group = EXT2FS_C2B(fs,
- super->s_clusters_per_group);
+ bpg = EXT2FS_C2B(fs,
+ (unsigned long long) super->s_clusters_per_group);
+ if (bpg >= (((unsigned long long) 1) << 32)) {
+ retval = EXT2_ET_INVALID_ARGUMENT;
+ goto cleanup;
+ }
+ super->s_blocks_per_group = bpg;
} else {
set_field(s_blocks_per_group, fs->blocksize * 8);
if (super->s_blocks_per_group > EXT2_MAX_BLOCKS_PER_GROUP(super))
--
1.7.12.rc0.22.gcdd159b
In addition, do not allow a cluster size of 1024, since that will be
interpreted by ext2fs_initialize() as requesting the default cluster
size.
Signed-off-by: "Theodore Ts'o" <[email protected]>
Reported-by: Zheng Liu <[email protected]>
---
misc/mke2fs.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 4748f4a..75d0e48 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1346,7 +1346,7 @@ profile_error:
break;
case 'C':
cluster_size = strtoul(optarg, &tmp, 0);
- if (cluster_size < EXT2_MIN_CLUSTER_SIZE ||
+ if (cluster_size <= EXT2_MIN_CLUSTER_SIZE ||
cluster_size > EXT2_MAX_CLUSTER_SIZE || *tmp) {
com_err(program_name, 0,
_("invalid cluster size - %s"),
@@ -1847,6 +1847,13 @@ profile_error:
blocksize*16);
fs_param.s_log_cluster_size =
int_log2(cluster_size >> EXT2_MIN_CLUSTER_LOG_SIZE);
+ if (fs_param.s_log_cluster_size &&
+ fs_param.s_log_cluster_size < fs_param.s_log_block_size) {
+ com_err(program_name, 0,
+ _("The cluster size must not be "
+ "smaller than the block size.\n"));
+ exit(1);
+ }
} else if (cluster_size) {
com_err(program_name, 0,
_("specifying a cluster size requires the "
--
1.7.12.rc0.22.gcdd159b
Previously the behavior of parse_num_block2 was undefined if
log_block_size was less than zero. It will now return a number in
units of bytes.
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
lib/e2p/parse_num.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/lib/e2p/parse_num.c b/lib/e2p/parse_num.c
index d9ad3e7..cb0dc5b 100644
--- a/lib/e2p/parse_num.c
+++ b/lib/e2p/parse_num.c
@@ -35,10 +35,16 @@ unsigned long long parse_num_blocks2(const char *arg, int log_block_size)
num <<= 10;
/* fallthrough */
case 'K': case 'k':
- num >>= log_block_size;
+ if (log_block_size < 0)
+ num <<= 10;
+ else
+ num >>= log_block_size;
break;
case 's':
- num >>= (1+log_block_size);
+ if (log_block_size < 0)
+ num << 1;
+ else
+ num >>= (1+log_block_size);
break;
case '\0':
break;
@@ -62,11 +68,21 @@ main(int argc, char **argv)
unsigned long num;
int log_block_size = 0;
- if (argc != 2) {
- fprintf(stderr, "Usage: %s arg\n", argv[0]);
+ if (argc != 2 && argc != 3) {
+ fprintf(stderr, "Usage: %s arg [log_block_size]\n", argv[0]);
exit(1);
}
+ if (argc == 3) {
+ char *p;
+
+ log_block_size = strtol(argv[2], &p, 0);
+ if (*p) {
+ fprintf(stderr, "Bad log_block_size: %s\n", argv[2]);
+ exit(1);
+ }
+ }
+
num = parse_num_blocks(argv[1], log_block_size);
printf("Parsed number: %lu\n", num);
--
1.7.12.rc0.22.gcdd159b
On Mon, Jan 14, 2013 at 07:37:08PM -0500, Theodore Ts'o wrote:
> + if (fs_param.s_log_cluster_size &&
> + fs_param.s_log_cluster_size < fs_param.s_log_block_size) {
> + com_err(program_name, 0,
> + _("The cluster size must not be "
> + "smaller than the block size.\n"));
This would read better: "the cluster size may not be smaller than the
block size"
- Ted
On Sun, Jan 13, 2013 at 05:08:15PM +0800, Zheng Liu wrote:
> From: Zheng Liu <[email protected]>
>
> Bigalloc feature has been used for a long time, but the documentation in mke2fs
> is still missing. So add it.
>
> Signed-off-by: Zheng Liu <[email protected]>
Applied (with some changes to improve the english/wording).
- Ted
On Mon, Jan 14, 2013 at 07:41:58PM -0500, Theodore Ts'o wrote:
> On Mon, Jan 14, 2013 at 07:37:08PM -0500, Theodore Ts'o wrote:
> > + if (fs_param.s_log_cluster_size &&
> > + fs_param.s_log_cluster_size < fs_param.s_log_block_size) {
> > + com_err(program_name, 0,
> > + _("The cluster size must not be "
> > + "smaller than the block size.\n"));
>
> This would read better: "the cluster size may not be smaller than the
> block size"
Reviewed-by: Zheng Liu <[email protected]>
Regards,
- Zheng
On Mon, Jan 14, 2013 at 07:37:09PM -0500, Theodore Ts'o wrote:
> If bigalloc is enabled, then -g will specify the clusters per block
> group. (If bigalloc is not enabled, then a cluster == a block, so the
> meaning of -g is not changed.)
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Zheng Liu <[email protected]>
Regards,
- Zheng
> ---
> misc/mke2fs.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 75d0e48..5cb49b3 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1962,6 +1962,15 @@ profile_error:
> }
> }
>
> + /*
> + * If the bigalloc feature is enabled, then the -g option will
> + * specify the number of clusters per group.
> + */
> + if (fs_param.s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_BIGALLOC) {
> + fs_param.s_clusters_per_group = fs_param.s_blocks_per_group;
> + fs_param.s_blocks_per_group = 0;
> + }
> +
> if (inode_size == 0)
> inode_size = get_int_from_profile(fs_types, "inode_size", 0);
> if (!flex_bg_size && (fs_param.s_feature_incompat &
> --
> 1.7.12.rc0.22.gcdd159b
>
On Mon, Jan 14, 2013 at 07:37:10PM -0500, Theodore Ts'o wrote:
> Previously the behavior of parse_num_block2 was undefined if
> log_block_size was less than zero. It will now return a number in
> units of bytes.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Zheng Liu <[email protected]>
Regards,
- Zheng
> ---
> lib/e2p/parse_num.c | 24 ++++++++++++++++++++----
> 1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/lib/e2p/parse_num.c b/lib/e2p/parse_num.c
> index d9ad3e7..cb0dc5b 100644
> --- a/lib/e2p/parse_num.c
> +++ b/lib/e2p/parse_num.c
> @@ -35,10 +35,16 @@ unsigned long long parse_num_blocks2(const char *arg, int log_block_size)
> num <<= 10;
> /* fallthrough */
> case 'K': case 'k':
> - num >>= log_block_size;
> + if (log_block_size < 0)
> + num <<= 10;
> + else
> + num >>= log_block_size;
> break;
> case 's':
> - num >>= (1+log_block_size);
> + if (log_block_size < 0)
> + num << 1;
> + else
> + num >>= (1+log_block_size);
> break;
> case '\0':
> break;
> @@ -62,11 +68,21 @@ main(int argc, char **argv)
> unsigned long num;
> int log_block_size = 0;
>
> - if (argc != 2) {
> - fprintf(stderr, "Usage: %s arg\n", argv[0]);
> + if (argc != 2 && argc != 3) {
> + fprintf(stderr, "Usage: %s arg [log_block_size]\n", argv[0]);
> exit(1);
> }
>
> + if (argc == 3) {
> + char *p;
> +
> + log_block_size = strtol(argv[2], &p, 0);
> + if (*p) {
> + fprintf(stderr, "Bad log_block_size: %s\n", argv[2]);
> + exit(1);
> + }
> + }
> +
> num = parse_num_blocks(argv[1], log_block_size);
>
> printf("Parsed number: %lu\n", num);
> --
> 1.7.12.rc0.22.gcdd159b
>
On Mon, Jan 14, 2013 at 07:37:11PM -0500, Theodore Ts'o wrote:
> The -b and -C options now use parse_num_blocks2() insteda of strtol,
> so that users can specify -C 256M instead of the much less convenient
> -C 268435456.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Zheng Liu <[email protected]>
Regards,
- Zheng
> ---
> misc/mke2fs.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 5cb49b3..c16ab33 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1324,10 +1324,10 @@ profile_error:
> "b:cg:i:jl:m:no:qr:s:t:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
> switch (c) {
> case 'b':
> - blocksize = strtol(optarg, &tmp, 0);
> + blocksize = parse_num_blocks2(optarg, -1);
> b = (blocksize > 0) ? blocksize : -blocksize;
> if (b < EXT2_MIN_BLOCK_SIZE ||
> - b > EXT2_MAX_BLOCK_SIZE || *tmp) {
> + b > EXT2_MAX_BLOCK_SIZE) {
> com_err(program_name, 0,
> _("invalid block size - %s"), optarg);
> exit(1);
> @@ -1345,9 +1345,9 @@ profile_error:
> cflag++;
> break;
> case 'C':
> - cluster_size = strtoul(optarg, &tmp, 0);
> + cluster_size = parse_num_blocks2(optarg, -1);
> if (cluster_size <= EXT2_MIN_CLUSTER_SIZE ||
> - cluster_size > EXT2_MAX_CLUSTER_SIZE || *tmp) {
> + cluster_size > EXT2_MAX_CLUSTER_SIZE) {
> com_err(program_name, 0,
> _("invalid cluster size - %s"),
> optarg);
> --
> 1.7.12.rc0.22.gcdd159b
>
On 1/14/13 6:37 PM, Theodore Ts'o wrote:
> If bigalloc is enabled, then -g will specify the clusters per block
> group. (If bigalloc is not enabled, then a cluster == a block, so the
> meaning of -g is not changed.)
This should be clearly documented in the man page as well, I think.
-Eric
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> misc/mke2fs.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 75d0e48..5cb49b3 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1962,6 +1962,15 @@ profile_error:
> }
> }
>
> + /*
> + * If the bigalloc feature is enabled, then the -g option will
> + * specify the number of clusters per group.
> + */
> + if (fs_param.s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_BIGALLOC) {
> + fs_param.s_clusters_per_group = fs_param.s_blocks_per_group;
> + fs_param.s_blocks_per_group = 0;
> + }
> +
> if (inode_size == 0)
> inode_size = get_int_from_profile(fs_types, "inode_size", 0);
> if (!flex_bg_size && (fs_param.s_feature_incompat &
>
On 1/14/13 6:37 PM, Theodore Ts'o wrote:
> The -b and -C options now use parse_num_blocks2() insteda of strtol,
> so that users can specify -C 256M instead of the much less convenient
> -C 268435456.
Hm and -C isn't in the manpage at all.
Seems like the mke2fs manpage needs an update for bigalloc?
-Eric
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> misc/mke2fs.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 5cb49b3..c16ab33 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -1324,10 +1324,10 @@ profile_error:
> "b:cg:i:jl:m:no:qr:s:t:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
> switch (c) {
> case 'b':
> - blocksize = strtol(optarg, &tmp, 0);
> + blocksize = parse_num_blocks2(optarg, -1);
> b = (blocksize > 0) ? blocksize : -blocksize;
> if (b < EXT2_MIN_BLOCK_SIZE ||
> - b > EXT2_MAX_BLOCK_SIZE || *tmp) {
> + b > EXT2_MAX_BLOCK_SIZE) {
> com_err(program_name, 0,
> _("invalid block size - %s"), optarg);
> exit(1);
> @@ -1345,9 +1345,9 @@ profile_error:
> cflag++;
> break;
> case 'C':
> - cluster_size = strtoul(optarg, &tmp, 0);
> + cluster_size = parse_num_blocks2(optarg, -1);
> if (cluster_size <= EXT2_MIN_CLUSTER_SIZE ||
> - cluster_size > EXT2_MAX_CLUSTER_SIZE || *tmp) {
> + cluster_size > EXT2_MAX_CLUSTER_SIZE) {
> com_err(program_name, 0,
> _("invalid cluster size - %s"),
> optarg);
>
On 1/15/13 9:11 AM, Eric Sandeen wrote:
> On 1/14/13 6:37 PM, Theodore Ts'o wrote:
>> The -b and -C options now use parse_num_blocks2() insteda of strtol,
>> so that users can specify -C 256M instead of the much less convenient
>> -C 268435456.
>
> Hm and -C isn't in the manpage at all.
>
> Seems like the mke2fs manpage needs an update for bigalloc?
I'm sorry, I see that patch has already been sent. I'll catch
up with my email, now. :(
-Eric
> -Eric
>
>> Signed-off-by: "Theodore Ts'o" <[email protected]>
>> ---
>> misc/mke2fs.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
>> index 5cb49b3..c16ab33 100644
>> --- a/misc/mke2fs.c
>> +++ b/misc/mke2fs.c
>> @@ -1324,10 +1324,10 @@ profile_error:
>> "b:cg:i:jl:m:no:qr:s:t:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
>> switch (c) {
>> case 'b':
>> - blocksize = strtol(optarg, &tmp, 0);
>> + blocksize = parse_num_blocks2(optarg, -1);
>> b = (blocksize > 0) ? blocksize : -blocksize;
>> if (b < EXT2_MIN_BLOCK_SIZE ||
>> - b > EXT2_MAX_BLOCK_SIZE || *tmp) {
>> + b > EXT2_MAX_BLOCK_SIZE) {
>> com_err(program_name, 0,
>> _("invalid block size - %s"), optarg);
>> exit(1);
>> @@ -1345,9 +1345,9 @@ profile_error:
>> cflag++;
>> break;
>> case 'C':
>> - cluster_size = strtoul(optarg, &tmp, 0);
>> + cluster_size = parse_num_blocks2(optarg, -1);
>> if (cluster_size <= EXT2_MIN_CLUSTER_SIZE ||
>> - cluster_size > EXT2_MAX_CLUSTER_SIZE || *tmp) {
>> + cluster_size > EXT2_MAX_CLUSTER_SIZE) {
>> com_err(program_name, 0,
>> _("invalid cluster size - %s"),
>> optarg);
>>
>
On Mon, Jan 14, 2013 at 07:37:12PM -0500, Theodore Ts'o wrote:
> If the user attemps to create a 512MB cluster, we need to adjust the
> defaults to avoid a 32-bit overflow of s_blocks_per_group. Also check
> to make sure that the caller of ext2fs_initialize() has not given a
> value of s_clusters_per_group that would result in an overflow of
> s_blocks_per_group.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
The patch itself looks good to me. Feel free to add:
Reviewed-by: Zheng Liu <[email protected]>
FWIW, I wonder why we need to add such complex logical to handle a
corner case. I guess no one wants to use a 512MB cluster. So changing
max cluster size from 512MB to 256MB is very simple and straightfoward.
Regards,
- Zheng
> ---
> lib/ext2fs/initialize.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
> index b0c15d2..5afdc27 100644
> --- a/lib/ext2fs/initialize.c
> +++ b/lib/ext2fs/initialize.c
> @@ -207,6 +207,8 @@ errcode_t ext2fs_initialize(const char *name, int flags,
> super->s_log_block_size;
>
> if (bigalloc_flag) {
> + unsigned long long bpg;
> +
> if (param->s_blocks_per_group &&
> param->s_clusters_per_group &&
> ((param->s_clusters_per_group * EXT2FS_CLUSTER_RATIO(fs)) !=
> @@ -220,12 +222,19 @@ errcode_t ext2fs_initialize(const char *name, int flags,
> super->s_clusters_per_group =
> param->s_blocks_per_group /
> EXT2FS_CLUSTER_RATIO(fs);
> - else
> + else if (super->s_log_cluster_size + 15 < 32)
> super->s_clusters_per_group = fs->blocksize * 8;
> + else
> + super->s_clusters_per_group = (fs->blocksize - 1) * 8;
> if (super->s_clusters_per_group > EXT2_MAX_CLUSTERS_PER_GROUP(super))
> super->s_clusters_per_group = EXT2_MAX_CLUSTERS_PER_GROUP(super);
> - super->s_blocks_per_group = EXT2FS_C2B(fs,
> - super->s_clusters_per_group);
> + bpg = EXT2FS_C2B(fs,
> + (unsigned long long) super->s_clusters_per_group);
> + if (bpg >= (((unsigned long long) 1) << 32)) {
> + retval = EXT2_ET_INVALID_ARGUMENT;
> + goto cleanup;
> + }
> + super->s_blocks_per_group = bpg;
> } else {
> set_field(s_blocks_per_group, fs->blocksize * 8);
> if (super->s_blocks_per_group > EXT2_MAX_BLOCKS_PER_GROUP(super))
> --
> 1.7.12.rc0.22.gcdd159b
>
On Tue, Jan 15, 2013 at 11:33:31PM +0800, Zheng Liu wrote:
> On Mon, Jan 14, 2013 at 07:37:12PM -0500, Theodore Ts'o wrote:
> > If the user attemps to create a 512MB cluster, we need to adjust the
> > defaults to avoid a 32-bit overflow of s_blocks_per_group. Also check
> > to make sure that the caller of ext2fs_initialize() has not given a
> > value of s_clusters_per_group that would result in an overflow of
> > s_blocks_per_group.
> >
> > Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> The patch itself looks good to me. Feel free to add:
> Reviewed-by: Zheng Liu <[email protected]>
Sorry, forgot to say that '-C' option in manpage will need to be modified
from 256MB to 512MB if this patch is applied.
Thanks,
- Zheng
On Tue, Jan 15, 2013 at 09:10:41AM -0600, Eric Sandeen wrote:
> On 1/14/13 6:37 PM, Theodore Ts'o wrote:
> > If bigalloc is enabled, then -g will specify the clusters per block
> > group. (If bigalloc is not enabled, then a cluster == a block, so the
> > meaning of -g is not changed.)
>
> This should be clearly documented in the man page as well, I think.
Yes, I'm going to include this change as a part of the Zheng's man
page clean up.
- Ted
On Tue, Jan 15, 2013 at 11:33:31PM +0800, Zheng Liu wrote:
>
> FWIW, I wonder why we need to add such complex logical to handle a
> corner case. I guess no one wants to use a 512MB cluster. So changing
> max cluster size from 512MB to 256MB is very simple and straightfoward.
I agree that it seems very unlikely that there would be much interest
in using a 512MB cluster. However, we would still need to do the
check in the case of a 16k block size (on an PowerPC or Itanium
system) at a 256MB or 128MB cluster. So part of this check would be
needed anyway.
- Ted
On Mon, Jan 14, 2013 at 10:10:06PM -0500, Theodore Ts'o wrote:
> On Sun, Jan 13, 2013 at 05:08:15PM +0800, Zheng Liu wrote:
> > From: Zheng Liu <[email protected]>
> >
> > Bigalloc feature has been used for a long time, but the documentation in mke2fs
> > is still missing. So add it.
> >
> > Signed-off-by: Zheng Liu <[email protected]>
>
> Applied (with some changes to improve the english/wording).
BTW, I used the following modified text:
bigalloc
This feature enables clustered allocation, so that
the unit of allocation is a power of two number of
blocks. That is, each bit in the what had tradi‐
tionally been known as the block allocation bitmap
now indicates whether a cluster is in use or not,
where a cluster is by default composed of 16 blocks.
This feature can decrease the time spent on doing
block allocation and brings smaller fragmentation,
especially for large files. The size can be speci‐
fied using the -C option.
Warning: The bigalloc feature is still under devel‐
opment, and may not be fully supported with your
kernel or may have various bugs. Please see the web
page http://ext4.wiki.kernel.org/index.php/Bigalloc
for details.
And I populated the page on the ext4 wiki so that when we finally fix
the delalloc/bigalloc problem, we can update the ext4 wiki to reflect
this.
- Ted
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 1/15/2013 2:12 PM, Theodore Ts'o wrote:
> BTW, I used the following modified text:
>
> bigalloc This feature enables clustered allocation, so that the
> unit of allocation is a power of two number of blocks. That
> is, each bit in the what had tradi‐ tionally been known as the
> block allocation bitmap now indicates whether a cluster is in
> use or not, where a cluster is by default composed of 16 blocks.
> This feature can decrease the time spent on doing block
> allocation and brings smaller fragmentation, especially for
> large files. The size can be speci‐ fied using the -C option.
>
> Warning: The bigalloc feature is still under devel‐ opment, and
> may not be fully supported with your kernel or may have various
> bugs. Please see the web page
> http://ext4.wiki.kernel.org/index.php/Bigalloc for details.
Does this mean that a cluster is the minimum allocation unit, or can
two small files allocate different blocks in the same cluster, leaving
the cluster partially used? If the former, then how is this different
than just using a larger block size?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJQ9bIJAAoJEJrBOlT6nu75WskIAM6eNjA1updKy6Kh2SrMWavB
bX7EeTGmXMrxbQtMDgmG1+V2kOy9RoYtCZ5+pXijqJHzrovEtyIwHVdzntKSTtZi
tYSqjZrOpJ/bTJpXuP5AIew9mXRTKzGF8lNyPZkLIgX0AyhTsbC4cccpcmfsnGEX
RfuwDd2Z2NEKhmsXH4SI3HXDM2f4EGZmPqPG8It/B49HXrzfDq+YqzKwVqdrDJ5V
jdTLV5xjJ4E9Y+/P3EC1l2KvfDf0KjJjA2CiuG4sqrthwwQGfdEFK+MF2bfz5nMi
VBsuZQRF5kFgekpsHXy7b0Do9Qa3wMm9FL8Sv2QMy7xf92FxCwrLJFlpIZ9iuSQ=
=BKGM
-----END PGP SIGNATURE-----
On Tue, Jan 15, 2013 at 02:46:17PM -0500, Phillip Susi wrote:
>
> Does this mean that a cluster is the minimum allocation unit, or can
> two small files allocate different blocks in the same cluster, leaving
> the cluster partially used? If the former, then how is this different
> than just using a larger block size?
The former. The difference is that we use units of blocks in the
indirect blocks and extents --- and the reason for this is because
there's a pretty fundamental limitation baked into the MM layer that
the file system block size is less than or equal to the page size. So
on architectures where we have 16k page sizes, we can use a 16k block
size --- but then you won't be able to mount that file system on an
x86 system.
So bigalloc is basically a hack because it was easier to make this
change in the file system than it is to deal with block sizes greater
than the page size.
- Ted
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 1/15/2013 2:57 PM, Theodore Ts'o wrote:
> So bigalloc is basically a hack because it was easier to make this
> change in the file system than it is to deal with block sizes
> greater than the page size.
If it is only to get around the mm pagesize limit, then why not just
have the fs automatically lie to the kernel about the block size and
shift the references back and forth on the fly when it detects a
larger blocksize?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJQ9b5XAAoJEJrBOlT6nu75ankH/39qiU2tahSVRekQ6kkyFeaY
RnGMydXcKgSacKAlvmIgP6VOnqPmBCKtdM8oxlob4+orJUksWiLmT7nvMIukUOqs
QGqPzsHtpIzNZnPB6soc6ToRbx+b53EM4fQ+XIt9egnJ4p6gDRiS83xKKjyZnywq
94ZPH5Zg84Xr+zmyUFRqs/cDG2tmbo/6qgkqkVUeFfdLzygq2K4LO/dFpuRg3oqV
ceUqVCieCfEplcjnyClT1uOv3RBrjCcyFW1j46UjEYiHkFENYsCSi/Hk4qOYDls+
i/bETjWABSXz23BMD2/B0wZwhGwkrsX2Y5g1CjtRksgPFthW/rmk0HzvWniC9J4=
=kJ8x
-----END PGP SIGNATURE-----
On Tue, Jan 15, 2013 at 03:38:47PM -0500, Phillip Susi wrote:
>
> If it is only to get around the mm pagesize limit, then why not just
> have the fs automatically lie to the kernel about the block size and
> shift the references back and forth on the fly when it detects a
> larger blocksize?
Because of the pain in dealing with how to handle random writes into a
sparse file. We need to either track which blocks in the large block
have been initialized, or we would need to erase the entire large
block before writing the first page into the large block (and then you
still need to track whether or not you are writing that first or
subsequent page into a large block).
What we're doing with bigalloc is effectively tracking which blocks in
the cluster have been initialized by using entries in the extent tree,
since entries to the allocation bitmaps is in units of clusters, but
entries in the extent tree is in units of blocks.
Looking back at how complicated it has been to get delalloc right, it
may have been the case that just using a brute-force sb_issue_zeroout
when the block is freshly allocated, unless the arguments to the
request to ext4_writepages() exactly covered the large block might
have been simpler. Getting the Direct I/O path right would have been
messy, but perhaps it would have been less work in the end.
- Ted
On Tue, Jan 15, 2013 at 02:10:30PM -0500, Theodore Ts'o wrote:
> On Tue, Jan 15, 2013 at 11:33:31PM +0800, Zheng Liu wrote:
> >
> > FWIW, I wonder why we need to add such complex logical to handle a
> > corner case. I guess no one wants to use a 512MB cluster. So changing
> > max cluster size from 512MB to 256MB is very simple and straightfoward.
>
> I agree that it seems very unlikely that there would be much interest
> in using a 512MB cluster. However, we would still need to do the
> check in the case of a 16k block size (on an PowerPC or Itanium
> system) at a 256MB or 128MB cluster. So part of this check would be
> needed anyway.
Yeah, I see. Thanks for the explanation.
Regards,
- Zheng