2008-08-14 08:59:41

by David Woodhouse

Subject: [EXT2] Discard unused sectors

When a file is deleted, tell the block device that we don't care about
its blocks any more.

Signed-off-by: David Woodhouse <[email protected]>
---
For linux-next, where sb_issue_discard() has been implemented.
http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-next

fs/ext2/balloc.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
index 10bb02c..03fc2dc 100644
--- a/fs/ext2/balloc.c
+++ b/fs/ext2/balloc.c
@@ -16,6 +16,7 @@
#include <linux/sched.h>
#include <linux/buffer_head.h>
#include <linux/capability.h>
+#include <linux/blkdev.h>

/*
* balloc.c contains the blocks allocation and deallocation routines
@@ -478,13 +479,13 @@ void ext2_discard_reservation(struct inode *inode)
}

/**
- * ext2_free_blocks_sb() -- Free given blocks and update quota and i_blocks
+ * ext2_free_blocks() -- Free given blocks and update quota and i_blocks
* @inode: inode
* @block: start physcial block to free
* @count: number of blocks to free
*/
-void ext2_free_blocks (struct inode * inode, unsigned long block,
- unsigned long count)
+void ext2_free_blocks(struct inode * inode, unsigned long block,
+ unsigned long count)
{
struct buffer_head *bitmap_bh = NULL;
struct buffer_head * bh2;
@@ -555,6 +556,8 @@ do_more:
}
}

+ sb_issue_discard(sb, block, count);
+
mark_buffer_dirty(bitmap_bh);
if (sb->s_flags & MS_SYNCHRONOUS)
sync_dirty_buffer(bitmap_bh);
--
1.5.5.1


--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
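
Editorial sketch, not part of the patch: the call added above passes a filesystem block number and a block count, while block devices are conventionally addressed in 512-byte sectors. Assuming the discard helper has to make that conversion, the arithmetic might look like the following. The names `sector_range` and `blocks_to_sectors` are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a run of 512-byte sectors. */
struct sector_range {
    uint64_t start;   /* first sector of the run */
    uint64_t count;   /* number of sectors in the run */
};

/*
 * Convert a run of filesystem blocks into 512-byte sectors.
 * blocksize_bits is log2 of the filesystem block size, e.g. 12 for
 * 4096-byte blocks. Assumption: sectors are 512 bytes (2^9).
 */
static struct sector_range blocks_to_sectors(uint64_t block,
                                             uint64_t nr_blocks,
                                             unsigned blocksize_bits)
{
    unsigned shift;
    struct sector_range r;

    assert(blocksize_bits >= 9);   /* a block is never smaller than a sector */
    shift = blocksize_bits - 9;
    r.start = block << shift;
    r.count = nr_blocks << shift;
    return r;
}
```

With 4096-byte blocks each block covers eight sectors, so freeing three blocks starting at block 10 would correspond to a discard of 24 sectors starting at sector 80.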





2008-08-14 09:05:50

by David Woodhouse

Subject: Re: [EXT2] Discard unused sectors

I'm not sure how to do this for ext[34]. The sb_issue_discard() function
issues its requests as a soft barrier, because for naïve callers it
needs to ensure that the discard happens _before_ any subsequent writes
to the same sectors (if they get reallocated immediately).

But ext[34] can probably do better than that, and submit the discard
requests _without_ barriers of their own. If someone with a bit more
clue does it, that is.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation


2008-08-15 12:02:37

by Theodore Ts'o

Subject: Re: [EXT2] Discard unused sectors

On Thu, Aug 14, 2008 at 10:05:48AM +0100, David Woodhouse wrote:
> I'm not sure how to do this for ext[34]. The sb_issue_discard() function
> issues its requests as a soft barrier, because for naïve callers it
> needs to ensure that the discard happens _before_ any subsequent writes
> to the same sectors (if they get reallocated immediately).
>
> But ext[34] can probably do better than that, and submit the discard
> requests _without_ barriers of their own. If someone with a bit more
> clue does it, that is.

It's worse than this. We can't call sb_issue_discard() until the
transaction commits, since if we crash before the commit, the delete
will not have happened. (The block/inode bitmaps, inode table,
et al., aren't allowed to go out to disk until the transaction
commit, and similarly, those sectors aren't allowed to get reused
until the commit happens, as well.)

This is going to be true of any filesystem which is doing journaling.
What makes life a bit more difficult for ext4 is that we are doing
physical block journaling, so we're not keeping track of which blocks are
getting discarded. (In contrast, systems that do logical journaling
are keeping track of specific lists of blocks that are getting freed,
since that's what they write to the journal.) This means we'll have
to keep our own in-memory list of extents for which we should call
sb_issue_discard() when the transaction finally commits. So this is
something that we would have to track in the jbd/jbd2 layer, hanging
off of the transaction structure. If we do this right, it will also
be what OCFS2 can use too (since it uses the jbd layer as well.)

- Ted
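
Editorial sketch, not part of the thread: the in-memory list described above could be modelled as a list of freed extents hanging off a transaction, drained only once the commit has happened. This is a userspace illustration under stated assumptions. `struct txn`, `struct pending_discard`, `txn_defer_discard()`, `txn_finish()` and `issue_discard()` are hypothetical names, not jbd/jbd2 interfaces, and `issue_discard()` merely stands in for sb_issue_discard().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record of one freed extent awaiting discard. */
struct pending_discard {
    unsigned long block;            /* first freed block */
    unsigned long count;            /* number of freed blocks */
    struct pending_discard *next;
};

/* Hypothetical stand-in for a journal transaction. */
struct txn {
    struct pending_discard *discards;
};

/* Total blocks handed to issue_discard(), kept so callers can check it. */
static unsigned long discarded_blocks;

/* Stand-in for sb_issue_discard(): only reports and counts the request. */
static void issue_discard(unsigned long block, unsigned long count)
{
    printf("discard blocks %lu..%lu\n", block, block + count - 1);
    discarded_blocks += count;
}

/* Remember a freed extent; nothing is sent to the device yet. */
static int txn_defer_discard(struct txn *t, unsigned long block,
                             unsigned long count)
{
    struct pending_discard *p;

    assert(count > 0);
    p = malloc(sizeof(*p));
    if (!p)
        return -1;
    p->block = block;
    p->count = count;
    p->next = t->discards;
    t->discards = p;
    return 0;
}

/*
 * Drain the list when the transaction ends. If it committed, issue the
 * discards; if not, drop them, because the blocks are still in use.
 */
static void txn_finish(struct txn *t, int committed)
{
    struct pending_discard *p = t->discards;

    while (p) {
        struct pending_discard *next = p->next;

        if (committed)
            issue_discard(p->block, p->count);
        free(p);
        p = next;
    }
    t->discards = NULL;
}
```

A caller would invoke `txn_defer_discard()` wherever blocks are freed and `txn_finish(&t, 1)` from the commit path. Calling `txn_finish(&t, 0)` models a transaction that never committed, in which case no discard is issued.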

2008-08-15 18:19:06

by Aneesh Kumar K.V

Subject: Re: [EXT2] Discard unused sectors

On Fri, Aug 15, 2008 at 08:02:35AM -0400, Theodore Tso wrote:
> On Thu, Aug 14, 2008 at 10:05:48AM +0100, David Woodhouse wrote:
> > I'm not sure how to do this for ext[34]. The sb_issue_discard() function
> > issues its requests as a soft barrier, because for naïve callers it
> > needs to ensure that the discard happens _before_ any subsequent writes
> > to the same sectors (if they get reallocated immediately).
> >
> > But ext[34] can probably do better than that, and submit the discard
> > requests _without_ barriers of their own. If someone with a bit more
> > clue does it, that is.
>
> It's worse than this. We can't call sb_issue_discard() until the
> transaction commits, since if we crash before the commit, the delete
> will not have happened. (The block/inode bitmaps, inode table,
> et al., aren't allowed to go out to disk until the transaction
> commit, and similarly, those sectors aren't allowed to get reused
> until the commit happens, as well.)
>
> This is going to be true of any filesystem which is doing journaling.
> What makes life a bit more difficult for ext4 is that we are doing
> physical block journaling, so we're not keeping track of which blocks are
> getting discarded. (In contrast, systems that do logical journaling
> are keeping track of specific lists of blocks that are getting freed,
> since that's what they write to the journal.) This means we'll have
> to keep our own in-memory list of extents for which we should call
> sb_issue_discard() when the transaction finally commits. So this is
> something that we would have to track in the jbd/jbd2 layer, hanging
> off of the transaction structure. If we do this right, it will also
> be what OCFS2 can use too (since it uses the jbd layer as well.)

Don't both ext3 and ext4 already do this via
ext4_journal_get_undo_access and ext4_mb_free_metadata? We actually
wait for the transaction to commit before freeing the metadata blocks
used by the transaction.

-aneesh