2012-02-24 08:34:59

by Lukas Czerner

[permalink] [raw]
Subject: [RESEND] [PATCH 1/2 v2] e2fsck: Discard only unused parts of inode table

When calling e2fsck with '-E discard' option it might happen that
valid inodes are discarded accidentally. This is because we just
discard the part of inode table which lies past the highest used
inode. This is terribly wrong (sorry!).

This patch fixes it so only the free parts of an inode table
is discarded, leaving used inodes intact. This was tested with highly
fragmented inode tables with block size 4k and 1k.

Signed-off-by: Lukas Czerner <[email protected]>
Reported-by: Phillip Susi <[email protected]>
---
v2: reworked so that we comply with inode number counting and adjust
start in the e2fsck_discard_inodes(). Add some comments

e2fsck/pass5.c | 82 ++++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 59 insertions(+), 23 deletions(-)

diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index 1e836e3..57a207d 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -94,6 +94,43 @@ static void e2fsck_discard_blocks(e2fsck_t ctx, io_manager manager,
ctx->options &= ~E2F_OPT_DISCARD;
}

+/*
+ * This will try to discard number 'count' starting at inode
+ * number 'start'. Note that 'start' is a inode number and hence
+ * it is counted from 1, it means that we need to adjust it
+ * by -1 in this function to compute right offset in the
+ * inode table.
+ */
+static void e2fsck_discard_inodes(e2fsck_t ctx, int group,
+ int start, int count)
+{
+ ext2_filsys fs = ctx->fs;
+ blk64_t blk, num;
+ int orig = count;
+
+ if ((ctx->options & E2F_OPT_NO) || !(ctx->options & E2F_OPT_DISCARD))
+ return;
+
+ /*
+ * Start is inode number which starts counting from 1,
+ * so we need to adjust it.
+ */
+ start -= 1;
+
+ /*
+ * We can discard only blocks containing only unused
+ * inodes in the table.
+ */
+ blk = DIV_ROUND_UP(start,
+ EXT2_INODES_PER_BLOCK(fs->super));
+ count -= (blk * EXT2_INODES_PER_BLOCK(fs->super) - start);
+ blk += ext2fs_inode_table_loc(fs, group);
+ num = count / EXT2_INODES_PER_BLOCK(fs->super);
+
+ if (num > 0)
+ e2fsck_discard_blocks(ctx, fs->io->manager, blk, num);
+}
+
#define NO_BLK ((blk64_t) -1)

static void print_bitmap_problem(e2fsck_t ctx, int problem,
@@ -435,6 +472,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
int skip_group = 0;
int redo_flag = 0;
io_manager manager = ctx->fs->io->manager;
+ unsigned long long first_free = fs->super->s_inodes_per_group + 1;

clear_problem_context(&pctx);
free_array = (int *) e2fsck_allocate_memory(ctx,
@@ -497,6 +535,7 @@ redo_counts:
* are 0, count the free inode,
* skip the current block group.
*/
+ first_free = 1;
inodes = fs->super->s_inodes_per_group - 1;
group_free = inodes;
free_inodes += inodes;
@@ -561,50 +600,47 @@ redo_counts:
ctx->options &= ~E2F_OPT_DISCARD;

do_counts:
+ inodes++;
if (bitmap) {
if (ext2fs_test_inode_bitmap2(ctx->inode_dir_map, i))
dirs_count++;
+ if (inodes > first_free) {
+ e2fsck_discard_inodes(ctx, group, first_free,
+ inodes - first_free);
+ first_free = fs->super->s_inodes_per_group + 1;
+ }
} else if (!skip_group || csum_flag) {
group_free++;
free_inodes++;
+ if (first_free > inodes)
+ first_free = inodes;
}

- inodes++;
if ((inodes == fs->super->s_inodes_per_group) ||
(i == fs->super->s_inodes_count)) {
-
- free_array[group] = group_free;
- dir_array[group] = dirs_count;
-
- /* Discard inode table */
- if (ctx->options & E2F_OPT_DISCARD) {
- blk64_t used_blks, blk, num;
-
- used_blks = DIV_ROUND_UP(
- (EXT2_INODES_PER_GROUP(fs->super) -
- group_free),
- EXT2_INODES_PER_BLOCK(fs->super));
-
- blk = ext2fs_inode_table_loc(fs, group) +
- used_blks;
- num = fs->inode_blocks_per_group -
- used_blks;
- e2fsck_discard_blocks(ctx, manager, blk, num);
- }
-
+ /*
+ * If the last inode is free, we can discard it as well.
+ */
+ if (inodes >= first_free)
+ e2fsck_discard_inodes(ctx, group, first_free,
+ inodes - first_free + 1);
/*
* If discard zeroes data and the group inode table
* was not zeroed yet, set itable as zeroed
*/
if ((ctx->options & E2F_OPT_DISCARD) &&
- (io_channel_discard_zeroes_data(fs->io)) &&
+ !(ctx->options & E2F_OPT_NO) &&
+ io_channel_discard_zeroes_data(fs->io) &&
!(ext2fs_bg_flags_test(fs, group,
- EXT2_BG_INODE_ZEROED))) {
+ EXT2_BG_INODE_ZEROED))) {
ext2fs_bg_flags_set(fs, group,
EXT2_BG_INODE_ZEROED);
ext2fs_group_desc_csum_set(fs, group);
}

+ first_free = fs->super->s_inodes_per_group + 1;
+ free_array[group] = group_free;
+ dir_array[group] = dirs_count;
group ++;
inodes = 0;
skip_group = 0;
--
1.7.4.4



2012-02-24 08:35:00

by Lukas Czerner

[permalink] [raw]
Subject: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

Previously when running e2fsck with '-E discard' argument the end of
the last group has not been discarded. This patch fixes it so we
always discard the end of the last group if needed.

This commit also removes unneeded argument from the
e2fsck_discard_blocks(). Simultaneously the commit causes the block
groups with BLOCK_UNINIT flag not to be discarded, which makes
since because we do not need to reclaim the space since so far
there has not been written anything.

Signed-off-by: Lukas Czerner <[email protected]>
---
v2: Discard the last free block of the last group as well.

e2fsck/pass5.c | 24 ++++++++++++++++--------
1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index 57a207d..3b33aa6 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -74,8 +74,8 @@ void e2fsck_pass5(e2fsck_t ctx)
print_resource_track(ctx, _("Pass 5"), &rtrack, ctx->fs->io);
}

-static void e2fsck_discard_blocks(e2fsck_t ctx, io_manager manager,
- blk64_t start, blk64_t count)
+static void e2fsck_discard_blocks(e2fsck_t ctx, blk64_t start,
+ blk64_t count)
{
ext2_filsys fs = ctx->fs;

@@ -85,7 +85,7 @@ static void e2fsck_discard_blocks(e2fsck_t ctx, io_manager manager,
* not enough to fix the problem, hence it is not safe to run discard
* in this case.
*/
- if (ext2fs_test_changed(ctx->fs))
+ if (ext2fs_test_changed(fs))
ctx->options &= ~E2F_OPT_DISCARD;

if (!(ctx->options & E2F_OPT_NO) &&
@@ -128,7 +128,7 @@ static void e2fsck_discard_inodes(e2fsck_t ctx, int group,
num = count / EXT2_INODES_PER_BLOCK(fs->super);

if (num > 0)
- e2fsck_discard_blocks(ctx, fs->io->manager, blk, num);
+ e2fsck_discard_blocks(ctx, blk, num);
}

#define NO_BLK ((blk64_t) -1)
@@ -368,16 +368,24 @@ redo_counts:
free_blocks++;
if (first_free > i)
first_free = i;
- } else {
- if (i > first_free)
- e2fsck_discard_blocks(ctx, manager, first_free,
- (i - first_free));
+ } else if (i > first_free) {
+ e2fsck_discard_blocks(ctx, first_free,
+ (i - first_free));
first_free = ext2fs_blocks_count(fs->super);
}
blocks ++;
if ((blocks == fs->super->s_clusters_per_group) ||
(EXT2FS_B2C(fs, i) ==
EXT2FS_B2C(fs, ext2fs_blocks_count(fs->super)-1))) {
+ /*
+ * If the last block of this group is free, then we can
+ * discard it as well.
+ */
+ if (i >= first_free)
+ e2fsck_discard_blocks(ctx, first_free,
+ (i - first_free) + 1);
+ first_free = ext2fs_blocks_count(fs->super);
+
free_array[group] = group_free;
group ++;
blocks = 0;
--
1.7.4.4


2012-02-24 14:46:22

by Phillip Susi

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2/24/2012 3:34 AM, Lukas Czerner wrote:
> Previously when running e2fsck with '-E discard' argument the end
> of the last group has not been discarded. This patch fixes it so
> we always discard the end of the last group if needed.
>
> This commit also removes unneeded argument from the
> e2fsck_discard_blocks(). Simultaneously the commit causes the
> block groups with BLOCK_UNINIT flag not to be discarded, which
> makes since because we do not need to reclaim the space since so
> far there has not been written anything.

I don't see where this patch changes anything to do with uninitialized
block bitmaps. It also is still a good idea to discard them since
they may have been initialized at some point in the past and become
uninitialized again.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJPR6K8AAoJEJrBOlT6nu75Ad8H/A4sfRF7GGqNMpfKOHMOxwE3
gjHAQNP3anFJ8aa5vkHwDEsDPBMSYlOtNWYBSY1DmaBPm0AHyrFUyWN1Q9FtcOp3
bDdsx7SXHa0NIzcguTsjw33Ln+3jiqI3vPk7gJkZyOkRw3JNcmM1MMlu/83Vaeaa
p2uQc8OzNfsuFoN62kkBj502p4/pFhfN1TCFf2EEFgw7HlS0Z4xqx0Mq3JTDJH0u
2GrYMz8ulwuYfRa2WUy5A11Oi/iLFVm0J+mI+gR94Li1zRqnwBaF99TGvEhzB0SF
MS6kykvgPU2D+n2YL9fFChQDLvyyW+zSXxYjgqrrU0DjCjVkycTQ5GsvnKoEu6c=
=FeTy
-----END PGP SIGNATURE-----

2012-02-24 14:57:26

by Lukas Czerner

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On Fri, 24 Feb 2012, Phillip Susi wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2/24/2012 3:34 AM, Lukas Czerner wrote:
> > Previously when running e2fsck with '-E discard' argument the end
> > of the last group has not been discarded. This patch fixes it so
> > we always discard the end of the last group if needed.
> >
> > This commit also removes unneeded argument from the
> > e2fsck_discard_blocks(). Simultaneously the commit causes the
> > block groups with BLOCK_UNINIT flag not to be discarded, which
> > makes since because we do not need to reclaim the space since so
> > far there has not been written anything.
>
> I don't see where this patch changes anything to do with uninitialized
> block bitmaps. It also is still a good idea to discard them since
> they may have been initialized at some point in the past and become
> uninitialized again.

I do not see how the group would become UNINIT again, so there is not
reason to discard it. Previously we could count free block through out
several groups (accounting or BLOCK_UNINIT groups as well. We do not do
that now, because we discard every group separately.

Thanks!
-Lukas

>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJPR6K8AAoJEJrBOlT6nu75Ad8H/A4sfRF7GGqNMpfKOHMOxwE3
> gjHAQNP3anFJ8aa5vkHwDEsDPBMSYlOtNWYBSY1DmaBPm0AHyrFUyWN1Q9FtcOp3
> bDdsx7SXHa0NIzcguTsjw33Ln+3jiqI3vPk7gJkZyOkRw3JNcmM1MMlu/83Vaeaa
> p2uQc8OzNfsuFoN62kkBj502p4/pFhfN1TCFf2EEFgw7HlS0Z4xqx0Mq3JTDJH0u
> 2GrYMz8ulwuYfRa2WUy5A11Oi/iLFVm0J+mI+gR94Li1zRqnwBaF99TGvEhzB0SF
> MS6kykvgPU2D+n2YL9fFChQDLvyyW+zSXxYjgqrrU0DjCjVkycTQ5GsvnKoEu6c=
> =FeTy
> -----END PGP SIGNATURE-----
>

--

2012-02-24 15:56:30

by Phillip Susi

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2/24/2012 9:57 AM, Lukas Czerner wrote:
>> I do not see how the group would become UNINIT again, so there is
>> not reason to discard it. Previously we could count free block
>> through out several groups (accounting or BLOCK_UNINIT groups as
>> well. We do not do that now, because we discard every group
>> separately.

Simple; delete some files. Once all blocks in the block group are
free again, it can go back to UNINIT. Also the fs may have been
formatted without discard, or moved with dd, or any number of ways
that the bock allocation bitmap could still use a discard despite
being uninitialized.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJPR7MrAAoJEJrBOlT6nu751oMIAN4YsKxC5XjHOTyif0U4Pffj
x0TCcXW+0BS6bOfwvGq/3DLO9oCJXOh6mCIPxNNJZJ0NQt0t6yLmpnuZIw7AXzJT
+WCxduNxnu9e/bIy1vzKlwcUkzd+lb8XIljEcO9RBDj+3LUFT4YYlCE8yiITze8V
EZiE81U1IJtguhWzqFQmOsJrH0KP5kb7BE36yn5I+yfs5YYc3CyMrqi4SVpdv4gX
/+YoroJCEa89nsk6pluk+uAGPd/gxeSC6w32YjJqaM5bVLpiD6yhKsqIXNQK8HzB
uXNxZvvrHLBBQtp31+r0yy90K3eQmc2ycwn54DKkK+vOioH7Q2AB+s8YTxr/Hq0=
=DOVr
-----END PGP SIGNATURE-----

2012-02-24 16:13:28

by Lukas Czerner

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On Fri, 24 Feb 2012, Phillip Susi wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2/24/2012 9:57 AM, Lukas Czerner wrote:
> >> I do not see how the group would become UNINIT again, so there is
> >> not reason to discard it. Previously we could count free block
> >> through out several groups (accounting or BLOCK_UNINIT groups as
> >> well. We do not do that now, because we discard every group
> >> separately.
>
> Simple; delete some files. Once all blocks in the block group are
> free again, it can go back to UNINIT. Also the fs may have been
> formatted without discard, or moved with dd, or any number of ways
> that the bock allocation bitmap could still use a discard despite
> being uninitialized.

It does not work that way, uinit is never set back. If it has been
formated without discard it is user choice, moving the image to the
thinly provisioned device, or ssd with dd is really bad idea anyway.
That said, UNINIT means that it has not been used and hence there
should be nothing to reclaim.

-Lukas

>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJPR7MrAAoJEJrBOlT6nu751oMIAN4YsKxC5XjHOTyif0U4Pffj
> x0TCcXW+0BS6bOfwvGq/3DLO9oCJXOh6mCIPxNNJZJ0NQt0t6yLmpnuZIw7AXzJT
> +WCxduNxnu9e/bIy1vzKlwcUkzd+lb8XIljEcO9RBDj+3LUFT4YYlCE8yiITze8V
> EZiE81U1IJtguhWzqFQmOsJrH0KP5kb7BE36yn5I+yfs5YYc3CyMrqi4SVpdv4gX
> /+YoroJCEa89nsk6pluk+uAGPd/gxeSC6w32YjJqaM5bVLpiD6yhKsqIXNQK8HzB
> uXNxZvvrHLBBQtp31+r0yy90K3eQmc2ycwn54DKkK+vOioH7Q2AB+s8YTxr/Hq0=
> =DOVr
> -----END PGP SIGNATURE-----
>

2012-02-24 16:48:33

by Phillip Susi

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2/24/2012 11:13 AM, Lukas Czerner wrote:
> It does not work that way, uinit is never set back. If it has been
> formated without discard it is user choice, moving the image to
> the thinly provisioned device, or ssd with dd is really bad idea
> anyway. That said, UNINIT means that it has not been used and hence
> there should be nothing to reclaim.

I could have sworn that e2fsck set it back when the block group became
free again, since there is once again no need to parse the bitmap and
you can just assume it's empty without having to read it. I certainly
have e2defrag doing this. If fsck and the kernel currently don't do
this, they should.

Whether it is a bad idea or not, people do move filesystems around and
have existing systems formatted before mke2fs would issue discards, so
it is a good idea to discard unused areas regardless of whether or not
they are uninitialized.

My understanding of uninitialized is that it was added as an
optimization meaning "there's nothing here, so you can skip/ignore
this" rather than "this has _never_ been used, so you can rely on it
containing all zeros and being discarded".

Indeed, a quick test filling a block device with random data and
running mke2fs on it leaves the random data in the uninitialized block
bitmaps rather than writing all zeros.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJPR79fAAoJEJrBOlT6nu75K8QH/RJoNQPm+rGtmv1cmWPusuNb
pb/6hRmOhsIUClaMn2diinGgH7HQbZ9FqsSx0mZmWq52T/21korGk3fyVe/nfL9m
h4xFYJLNEdsSCJE7mcpUu5BMxCwlYEcybHu7xobVtqHlF671zjszj/xCGBgQIEwD
3tRu8JXc/grnrya0CxDXd5kenM6oQviEmkproYUjG21XW+2DKjgHD1w6lbcHZHw5
5fvWVwFOMy9OgagcBzAxo43E7oZoPCD6o54HT8As7FoBfUSt9Z4GLMe3ULH4SbpP
KKuRiumOnBW9fz7I3jRDkVpJ+9MxWqpUL4SA79sDreYfOBAa6m7cOaR+PHr8sTM=
=8u4o
-----END PGP SIGNATURE-----

2012-02-27 07:39:45

by Lukas Czerner

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On Fri, 24 Feb 2012, Phillip Susi wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2/24/2012 11:13 AM, Lukas Czerner wrote:
> > It does not work that way, uinit is never set back. If it has been
> > formated without discard it is user choice, moving the image to
> > the thinly provisioned device, or ssd with dd is really bad idea
> > anyway. That said, UNINIT means that it has not been used and hence
> > there should be nothing to reclaim.
>
> I could have sworn that e2fsck set it back when the block group became
> free again, since there is once again no need to parse the bitmap and
> you can just assume it's empty without having to read it. I certainly
> have e2defrag doing this. If fsck and the kernel currently don't do
> this, they should.
>
> Whether it is a bad idea or not, people do move filesystems around and
> have existing systems formatted before mke2fs would issue discards, so
> it is a good idea to discard unused areas regardless of whether or not
> they are uninitialized.

I think that no one would ever install their file system on thinly
provisioned storage just with dd. That's just stupid.

>
> My understanding of uninitialized is that it was added as an
> optimization meaning "there's nothing here, so you can skip/ignore
> this" rather than "this has _never_ been used, so you can rely on it
> containing all zeros and being discarded".

I have never said anything about the block group containing zeros. You
do not even have this guarantee when using discard command.

>
> Indeed, a quick test filling a block device with random data and
> running mke2fs on it leaves the random data in the uninitialized block
> bitmaps rather than writing all zeros.

I am not really sure how this is relevant to the e2fsck doing discard. I
just said that *if* the block group is market as UNINIT, then it was not
used by the file system from the time of mke2fs, hence there should be
nothing to reclaim.

Yes, I am aware of the fact that UNINIT flag is used as a optimization
and it means that there is nothing. In current implementation it
also means that there was nothing since mke2fs. I agree that we might
actually good to set that flag again if the group is entirely unused.

However this would only make sense to do in kernel, because in e2fsck it
would not actually speed up anything until the next e2fsck call. But by
that time the groups might be initialized again. To do this in kernel we
would have to check whether the group is actually empty when freeing
blocks, or freeing inodes. That code is not there, and I am not sure if
it is worth the hassle (it might be), so I think that this patch is fine
as it is.

Thanks!
-Lukas

>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJPR79fAAoJEJrBOlT6nu75K8QH/RJoNQPm+rGtmv1cmWPusuNb
> pb/6hRmOhsIUClaMn2diinGgH7HQbZ9FqsSx0mZmWq52T/21korGk3fyVe/nfL9m
> h4xFYJLNEdsSCJE7mcpUu5BMxCwlYEcybHu7xobVtqHlF671zjszj/xCGBgQIEwD
> 3tRu8JXc/grnrya0CxDXd5kenM6oQviEmkproYUjG21XW+2DKjgHD1w6lbcHZHw5
> 5fvWVwFOMy9OgagcBzAxo43E7oZoPCD6o54HT8As7FoBfUSt9Z4GLMe3ULH4SbpP
> KKuRiumOnBW9fz7I3jRDkVpJ+9MxWqpUL4SA79sDreYfOBAa6m7cOaR+PHr8sTM=
> =8u4o
> -----END PGP SIGNATURE-----
>

2012-02-27 16:35:40

by Phillip Susi

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On 2/27/2012 2:39 AM, Lukas Czerner wrote:
> I think that no one would ever install their file system on thinly
> provisioned storage just with dd. That's just stupid.

It could be on an SSD. It also could have been transferred with e2image
or similar imaging tools. While foolish to use dd, correcting that
foolishness seems to be the point of e2fsck -E discard. If you could
always rely on parts of the disk being discarded when they should, then
you wouldn't need e2fsck to do it would you?

> I have never said anything about the block group containing zeros. You
> do not even have this guarantee when using discard command.

You do with some disks, and sparse image files. My point was that the
uninitialized flag is no guarantee that the blocks are discarded. If
e2fsck -E discard is to clean up blocks that *should* be discarded, but
( for whatever reason ) aren't, then that should include uninitialized
groups.

> I am not really sure how this is relevant to the e2fsck doing discard. I
> just said that *if* the block group is market as UNINIT, then it was not
> used by the file system from the time of mke2fs, hence there should be
> nothing to reclaim.

There is something to reclaim if it was not discarded at mkfs time.
Think of an SSD that was formatted before mke2fs supported discard, and
had previously been dirtied by another fs.

> However this would only make sense to do in kernel, because in e2fsck it
> would not actually speed up anything until the next e2fsck call. But by
> that time the groups might be initialized again. To do this in kernel we
> would have to check whether the group is actually empty when freeing
> blocks, or freeing inodes. That code is not there, and I am not sure if
> it is worth the hassle (it might be), so I think that this patch is fine
> as it is.

If it may be a good idea to to have the kernel do this in the future,
then wouldn't it be prudent to allow for it now?


2012-02-27 17:00:13

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On Fri, Feb 24, 2012 at 09:34:50AM +0100, Lukas Czerner wrote:
> Previously when running e2fsck with '-E discard' argument the end of
> the last group has not been discarded. This patch fixes it so we
> always discard the end of the last group if needed.
>
> This commit also removes unneeded argument from the
> e2fsck_discard_blocks(). Simultaneously the commit causes the block
> groups with BLOCK_UNINIT flag not to be discarded, which makes
> since because we do not need to reclaim the space since so far
> there has not been written anything.

Let me ask the question Phillip is asking a different way. What's the
*benefit* in not issuing a discard for blocks in block groups where
the block bitmap is marked as unitialized, as opposed to simply
issuing discard for all blocks that are not marked as in use? Are you
trying to optimize the amount of time that the storage device spends
processing the trim commands? Do you think issuing discards on space
that is already discarded will somehow cause more wear on SSD's?

- Ted

2012-02-27 17:58:02

by Lukas Czerner

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On Mon, 27 Feb 2012, Ted Ts'o wrote:

> On Fri, Feb 24, 2012 at 09:34:50AM +0100, Lukas Czerner wrote:
> > Previously when running e2fsck with '-E discard' argument the end of
> > the last group has not been discarded. This patch fixes it so we
> > always discard the end of the last group if needed.
> >
> > This commit also removes unneeded argument from the
> > e2fsck_discard_blocks(). Simultaneously the commit causes the block
> > groups with BLOCK_UNINIT flag not to be discarded, which makes
> > since because we do not need to reclaim the space since so far
> > there has not been written anything.
>
> Let me ask the question Phillip is asking a different way. What's the
> *benefit* in not issuing a discard for blocks in block groups where
> the block bitmap is marked as unitialized, as opposed to simply
> issuing discard for all blocks that are not marked as in use? Are you
> trying to optimize the amount of time that the storage device spends
> processing the trim commands? Do you think issuing discards on space
> that is already discarded will somehow cause more wear on SSD's?
>
> - Ted
>

Hi Ted,

exactly. I am trying to benefit from the same optimization e2fsck does.
If the BLOCK_UNINIT is set then we can easily leave that group be and
save some time. Even though that might not be a huge problem with small
file systems, or some *really* fast SSD's, you'll certainly notice it on
huge thin-provisioned storage, or generally any bigger discard capable
device.

I do not think that issuing discards on already discarded space would
somehow wear off the SSD's, however this might not be true for some ancient
disks, or the small cheap flashes. I know that the "common sense" tells
us that when there is nothing to discard, it should be really cheap
operation, however that is not the case. From what I have seen so far
the discard time of non-provisioned regions is almost the same as on
provisioned regions.

Note that on today SSD's discard should be relatively fast (I have not
actually tested any recent device though), this is not the case of some
older SSD and it certainly is not the case for thin-provisioned storage
appliances. The situation is even worse that this storage might be
significantly bigger than todays SSD's.

Moreover device mapper tp target should now support discard as well, so
we can have relatively big thin-provisioned storage which is common enough
and we certainly want to optimize for that, not some obscure case when
people dd their fs on tp targets.

However if we want to implement what Phillip suggested - mark the block
group UNINIT again when the became entirely free - then we'll have to
deal with that somehow. Possibly new ?BLOCK_UNUSED? flag, but it is
different problem to solve.

Oh, and it reminds me that we should take advantage of the BLOCK_UNINIT
flag in FITRIM implementation as well. I am working on the patches.

Thanks!
-Lukas

2012-02-27 18:05:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On Mon, Feb 27, 2012 at 06:57:48PM +0100, Lukas Czerner wrote:
>
> exactly. I am trying to benefit from the same optimization e2fsck does.
> If the BLOCK_UNINIT is set then we can easily leave that group be and
> save some time. Even though that might not be a huge problem with small
> file systems, or some *really* fast SSD's, you'll certainly notice it on
> huge thin-provisioned storage, or generally any bigger discard capable
> device.

Well, maybe then the right answer is that we make it an option. There
*are* times when you might want to issue a discard for every single
unused block, even if you think it may have already been discarded.

For example, there are some brain-damaged thin-provisioned storage
boxes that ignore discards that are smaller than 4 megabytes, and then
only discard on 4 megabyte aligned boundaries. So even though the
space might have been previously discarded, one of the reasons why you
might want e2fsck -E discard to force a discard of the entire block
group is because the space might not have been released if the
previous TRIM commands (perhaps issued as various 1mb files were
deleted) didn't obey whatever arbitrary restrictions that are imposed
by the storage device.

I'll agree that as a default, the optimization might make sense. But
it would be good if there's a way to really to bypass the optimization
and issue the discard for every single unused block.

- Ted

2012-02-27 18:33:25

by Lukas Czerner

[permalink] [raw]
Subject: Re: [RESEND] [PATCH 2/2 v2] e2fsck: Do not forget to discard last block group

On Mon, 27 Feb 2012, Ted Ts'o wrote:

> On Mon, Feb 27, 2012 at 06:57:48PM +0100, Lukas Czerner wrote:
> >
> > exactly. I am trying to benefit from the same optimization e2fsck does.
> > If the BLOCK_UNINIT is set then we can easily leave that group be and
> > save some time. Even though that might not be a huge problem with small
> > file systems, or some *really* fast SSD's, you'll certainly notice it on
> > huge thin-provisioned storage, or generally any bigger discard capable
> > device.
>
> Well, maybe then the right answer is that we make it an option. There
> *are* times when you might want to issue a discard for every single
> unused block, even if you think it may have already been discarded.

I would really like to avoid another option if we can avoid it.

>
> For example, there are some brain-damaged thin-provisioned storage
> boxes that ignore discards that are smaller than 4 megabytes, and then
> only discard on 4 megabyte aligned boundaries. So even though the
> space might have been previously discarded, one of the reasons why you
> might want e2fsck -E discard to force a discard of the entire block
> group is because the space might not have been released if the
> previous TRIM commands (perhaps issued as various 1mb files were
> deleted) didn't obey whatever arbitrary restrictions that are imposed
> by the storage device.

I have to admit I do not really understand the example. If the block
group is flagged with BLOCK_UNINIT there was never any file allocated
from that group right ? Also, mke2fs issues initial full device discard
before the file system creation. So if you have this brain-damaged
thin-provisioned storage, discarding BLOCK_UNINIT groups would not
actually bring you anything.

>
> I'll agree that as a default, the optimization might make sense. But
> it would be good if there's a way to really to bypass the optimization
> and issue the discard for every single unused block.

Well, if the user really want to discard everything which is free in the
file system, then there is still wiper.sh which is not fs specific.
However it would have some unneeded consequences.

>
> - Ted
>