2012-08-31 22:21:36

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 00/15 v2] Add invalidatepage_range address space operation

This set of patches are aimed to allow truncate_inode_pages_range() handle
ranges which are not aligned at the end of the page. Currently it will
hit BUG_ON() when the end of the range is not aligned. Punch hole feature
however can benefit from this ability saving file systems some work not
forcing them to implement their own invalidate code to handle unaligned
ranges.

In order for this to work we need however new address space operation
invalidatepage_range which should be able to handle page invalidation with
offset and length specified.

patch 01: Implements the new invalidatepage_range address space
operation in the mm layer
patch 02 - 05: Wire the new invalidatepage_range aop to the ext4, xfs and
ocfs2 file system (which are currently the only file
systems supporting punch hole not counting tmpfs which has
its own method) and implement this
functionality for jbd2 as well.
patch 06: Change truncate_inode_pages_range() to handle unaligned
ranges.
patch 07 - 15: Ext4 specific changes which take benefit from the
previous truncate_inode_pages_range() change. Not all
are realated specifically to this change, but all are
related to the punch hole feature.

v2: Change invalidatepage_range lenght and offset argument to be 'unsigned int'
Fix range provided to do_invalidatepage_range() in truncate_inode_pages_range()

Thanks!
-Lukas


[PATCH 01/15 v2] mm: add invalidatepage_range address space operation
[PATCH 02/15 v2] jbd2: implement jbd2_journal_invalidatepage_range
[PATCH 03/15 v2] ext4: implement invalidatepage_range aop
[PATCH 04/15 v2] xfs: implement invalidatepage_range aop
[PATCH 05/15 v2] ocfs2: implement invalidatepage_range aop
[PATCH 06/15 v2] mm: teach truncate_inode_pages_range() to handle non
[PATCH 07/15 v2] ext4: Take i_mutex before punching hole
[PATCH 08/15 v2] Revert "ext4: remove no longer used functions in
[PATCH 09/15 v2] Revert "ext4: fix fsx truncate failure"
[PATCH 10/15 v2] ext4: use ext4_zero_partial_blocks in punch_hole
[PATCH 11/15 v2] ext4: remove unused discard_partial_page_buffers
[PATCH 12/15 v2] ext4: remove unused code from ext4_remove_blocks()
[PATCH 13/15 v2] ext4: update ext4_ext_remove_space trace point
[PATCH 14/15 v2] ext4: make punch hole code path work with bigalloc
[PATCH 15/15 v2] ext4: Allow punch hole with bigalloc enabled

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>


2012-08-31 22:21:38

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 02/15 v2] jbd2: implement jbd2_journal_invalidatepage_range

mm now supports invalidatepage_range address space operation and there
are two file system using jbd2 also implementing punch hole feature
which can benefit from this. We need to implement the same thing for
jbd2 layer in order to allow those file system take benefit of this
functionality.

With new function jbd2_journal_invalidatepage_range() we can now specify
length to invalidate, rather than assuming invalidate to the end of the
page.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/jbd2/journal.c | 1 +
fs/jbd2/transaction.c | 19 +++++++++++++++++--
include/linux/jbd2.h | 2 ++
3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e149b99..e4618e9 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -86,6 +86,7 @@ EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
EXPORT_SYMBOL(jbd2_journal_wipe);
EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
EXPORT_SYMBOL(jbd2_journal_invalidatepage);
+EXPORT_SYMBOL(jbd2_journal_invalidatepage_range);
EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
EXPORT_SYMBOL(jbd2_journal_force_commit);
EXPORT_SYMBOL(jbd2_journal_file_inode);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index fb1ab953..65c1374 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1993,10 +1993,20 @@ zap_buffer_unlocked:
*
*/
void jbd2_journal_invalidatepage(journal_t *journal,
- struct page *page,
- unsigned long offset)
+ struct page *page,
+ unsigned long offset)
+{
+ jbd2_journal_invalidatepage_range(journal, page, offset,
+ PAGE_CACHE_SIZE - offset);
+}
+
+void jbd2_journal_invalidatepage_range(journal_t *journal,
+ struct page *page,
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
+ unsigned int stop = offset + length;
unsigned int curr_off = 0;
int may_free = 1;

@@ -2005,6 +2015,8 @@ void jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2014,6 +2026,9 @@ void jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;

+ if (next_off > stop)
+ return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 3efc43f..21288fa 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1101,6 +1101,8 @@ extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern void jbd2_journal_invalidatepage(journal_t *,
struct page *, unsigned long);
+extern void jbd2_journal_invalidatepage_range(journal_t *, struct page *,
+ unsigned int, unsigned int);
extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int jbd2_journal_stop(handle_t *);
extern int jbd2_journal_flush (journal_t *);
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:37

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 01/15 v2] mm: add invalidatepage_range address space operation

Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit add new address space operation invalidatepage_range which
allows specifying length of bytes to invalidate, rather than assuming
truncate to the end of the page. It also introduce
block_invalidatepage_range() and do_invalidatepage)range() functions for
exactly the same reason.

The caller does not have to implement both aops (invalidatepage and
invalidatepage_range) and the latter is preferred. The old method will be
used only if invalidatepage_range is not implemented by the caller.

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
Documentation/filesystems/Locking | 17 ++++++++++++---
Documentation/filesystems/vfs.txt | 17 +++++++++++++-
fs/buffer.c | 30 ++++++++++++++++++++++++++-
include/linux/buffer_head.h | 2 +
include/linux/fs.h | 2 +
include/linux/mm.h | 2 +
mm/truncate.c | 40 +++++++++++++++++++++++++++++++++---
7 files changed, 99 insertions(+), 11 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index e540a24..c137fce 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -193,7 +193,9 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage_range) (struct page *, unsigned int,
+ unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -221,6 +223,7 @@ write_begin: locks the page yes
write_end: yes, unlocks yes
bmap:
invalidatepage: yes
+invalidatepage_range: yes
releasepage: yes
freepage: yes
direct_IO:
@@ -314,9 +317,15 @@ filesystems and by the swapper. The latter will eventually go away. Please,
keep it that way and don't breed new callers.

->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated. It
-returns zero on success. If ->invalidatepage is zero, the kernel uses
-block_invalidatepage() instead.
+some or all of the buffers from the page when it is being truncated. If
+->invalidatepage is zero, the kernel uses block_invalidatepage_range()
+instead.
+
+ ->invalidatepage_range() serves the same purpose as ->invalidatepage()
+except that range within the page to invalidate can be specified. This should
+be preferred operation over the ->invalidatepage(). If ->invalidatepage_range()
+is zero, the kernel tries to use ->invalidatepage(), if it is zero as well the
+kernel uses block_invalidatepage_range() instead.

->releasepage() is called when the kernel is about to try to drop the
buffers from the page in preparation for freeing it. It returns zero to
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 2ee133e..c7d7da8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -560,7 +560,7 @@ struct address_space_operations
-------------------------------

This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:

struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -577,7 +577,9 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage_range) (struct page *, unsigned int,
+ unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -705,6 +707,17 @@ struct address_space_operations {
calling the ->releasepage function, but in this case the
release MUST succeed.

+ invalidatepage_range: If a page has PagePrivate set, then
+ invalidatepage_range will be called when part or all of the page
+ is to be removed from the address space. This generally corresponds
+ to either a truncation, punch hole or a complete invalidateion of
+ the address space. Any private data associated with the page should
+ be updated to reflect this. If offset is 0 and length is
+ PAGE_CACHE_SIZE, then the private data should be released, because
+ the page must be able to be completely discarded. This may be done
+ by calling the ->releasepage function, but in this case the release
+ MUST succeed.
+
releasepage: releasepage is called on PagePrivate pages to indicate
that the page should be freed if possible. ->releasepage
should remove any private data from the page and clear the
diff --git a/fs/buffer.c b/fs/buffer.c
index 58e2e7b..180c109 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1451,13 +1451,34 @@ static void discard_buffer(struct buffer_head * bh)
*/
void block_invalidatepage(struct page *page, unsigned long offset)
{
+ block_invalidatepage_range(page, offset, PAGE_CACHE_SIZE - offset);
+}
+EXPORT_SYMBOL(block_invalidatepage);
+
+/**
+ * block_invalidatepage_range() - invalidate all of a buffers within the
+ * specified range of the buffer-backed page.
+ *
+ * @page: the page which is affected
+ * @offset: start of the range
+ * @length: length of the range
+ */
+void block_invalidatepage_range(struct page *page, unsigned int offset,
+ unsigned int length)
+{
struct buffer_head *head, *bh, *next;
unsigned int curr_off = 0;
+ unsigned int stop = length + offset;

BUG_ON(!PageLocked(page));
if (!page_has_buffers(page))
goto out;

+ /*
+ * Check for overflow
+ */
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
@@ -1465,6 +1486,12 @@ void block_invalidatepage(struct page *page, unsigned long offset)
next = bh->b_this_page;

/*
+ * Are we still fully in range ?
+ */
+ if (next_off > stop)
+ goto out;
+
+ /*
* is this block fully invalidated?
*/
if (offset <= curr_off)
@@ -1483,7 +1510,8 @@ void block_invalidatepage(struct page *page, unsigned long offset)
out:
return;
}
-EXPORT_SYMBOL(block_invalidatepage);
+EXPORT_SYMBOL(block_invalidatepage_range);
+

/*
* We attach and possibly dirty the buffers atomically wrt
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..2e7f5ab 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -194,6 +194,8 @@ extern int buffer_heads_over_limit;
* address_spaces.
*/
void block_invalidatepage(struct page *page, unsigned long offset);
+void block_invalidatepage_range(struct page *page, unsigned int offset,
+ unsigned int length);
int block_write_full_page(struct page *page, get_block_t *get_block,
struct writeback_control *wbc);
int block_write_full_page_endio(struct page *page, get_block_t *get_block,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa11047..d80de28 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -626,6 +626,8 @@ struct address_space_operations {
/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage_range) (struct page *, unsigned int,
+ unsigned int);
int (*releasepage) (struct page *, gfp_t);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..9f616fd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1027,6 +1027,8 @@ struct page *get_dump_page(unsigned long addr);

extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
extern void do_invalidatepage(struct page *page, unsigned long offset);
+extern void do_invalidatepage_range(struct page *page, unsigned int offset,
+ unsigned int length);

int __set_page_dirty_nobuffers(struct page *page);
int __set_page_dirty_no_writeback(struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 75801ac..b22efdf 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -39,14 +39,46 @@
*/
void do_invalidatepage(struct page *page, unsigned long offset)
{
+ do_invalidatepage_range(page, offset, PAGE_CACHE_SIZE - offset);
+}
+
+
+/**
+ * do_invalidatepage_range - invalidate range of the page
+ *
+ * @page: the page which is affected
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
+ */
+void do_invalidatepage_range(struct page *page, unsigned int offset,
+ unsigned int length)
+{
+ void (*invalidatepage_range)(struct page *, unsigned int,
+ unsigned int);
void (*invalidatepage)(struct page *, unsigned long);
+
+ /*
+ * Try invalidatepage_range first
+ */
+ invalidatepage_range = page->mapping->a_ops->invalidatepage_range;
+ if (invalidatepage_range) {
+ (*invalidatepage_range)(page, offset, length);
+ return;
+ }
+
+ /*
+ * When only invalidatepage is registered length + offset must be
+ * PAGE_CACHE_SIZE
+ */
invalidatepage = page->mapping->a_ops->invalidatepage;
+ if (invalidatepage) {
+ BUG_ON(length + offset != PAGE_CACHE_SIZE);
+ (*invalidatepage)(page, offset);
+ }
#ifdef CONFIG_BLOCK
- if (!invalidatepage)
- invalidatepage = block_invalidatepage;
+ if (!invalidatepage_range && !invalidatepage)
+ block_invalidatepage_range(page, offset, length);
#endif
- if (invalidatepage)
- (*invalidatepage)(page, offset);
}

static inline void truncate_partial_page(struct page *page, unsigned partial)
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:39

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 03/15 v2] ext4: implement invalidatepage_range aop

mm now supports invalidatepage_range address space operation which is
useful to allow truncating page range which is not aligned to the end of
the page. This will help in punch hole implementation once
truncate_inode_pages_range() is modify to allow this as well.

With this commit ext4 now register only invalidatepage_range. Also
change the respective tracepoint to print length of the range.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 57 ++++++++++++++++++++++++++++++-------------
include/trace/events/ext4.h | 15 +++++++----
2 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index dff171c..244579d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -131,7 +131,8 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
new_size);
}

-static void ext4_invalidatepage(struct page *page, unsigned long offset);
+static void ext4_invalidatepage_range(struct page *page, unsigned int offset,
+ unsigned int length);
static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
@@ -1291,20 +1292,27 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
}

static void ext4_da_page_release_reservation(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ unsigned int stop = offset + length;
int num_clusters;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;

+ if (next_off > stop)
+ break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2637,7 +2645,9 @@ static int ext4_da_write_end(struct file *file,
return ret ? ret : copied;
}

-static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_da_invalidatepage_range(struct page *page,
+ unsigned int offset,
+ unsigned int length)
{
/*
* Drop reserved blocks
@@ -2646,10 +2656,10 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
if (!page_has_buffers(page))
goto out;

- ext4_da_page_release_reservation(page, offset);
+ ext4_da_page_release_reservation(page, offset, length);

out:
- ext4_invalidatepage(page, offset);
+ ext4_invalidatepage_range(page, offset, length);

return;
}
@@ -2775,47 +2785,60 @@ ext4_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
}

-static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offset)
+static void ext4_invalidatepage_free_endio(struct page *page,
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
+ unsigned int stop = offset + length;

if (!page_has_buffers(page))
return;
+
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = bh = page_buffers(page);
do {
+ unsigned int next_off = curr_off + bh->b_size;
+
+ if (next_off > stop)
+ return;
+
if (offset <= curr_off && test_clear_buffer_uninit(bh)
&& bh->b_private) {
ext4_free_io_end(bh->b_private);
bh->b_private = NULL;
bh->b_end_io = NULL;
}
- curr_off = curr_off + bh->b_size;
+ curr_off = next_off;
bh = bh->b_this_page;
} while (bh != head);
}

-static void ext4_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_invalidatepage_range(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = EXT4_JOURNAL(page->mapping->host);

- trace_ext4_invalidatepage(page, offset);
+ trace_ext4_invalidatepage_range(page, offset, length);

/*
* free any io_end structure allocated for buffers to be discarded
*/
if (ext4_should_dioread_nolock(page->mapping->host))
- ext4_invalidatepage_free_endio(page, offset);
+ ext4_invalidatepage_free_endio(page, offset, length);
/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);

if (journal)
- jbd2_journal_invalidatepage(journal, page, offset);
+ jbd2_journal_invalidatepage_range(journal, page,
+ offset, length);
else
- block_invalidatepage(page, offset);
+ block_invalidatepage_range(page, offset, length);
}

static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -3187,7 +3210,7 @@ static const struct address_space_operations ext4_ordered_aops = {
.write_begin = ext4_write_begin,
.write_end = ext4_ordered_write_end,
.bmap = ext4_bmap,
- .invalidatepage = ext4_invalidatepage,
+ .invalidatepage_range = ext4_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.migratepage = buffer_migrate_page,
@@ -3202,7 +3225,7 @@ static const struct address_space_operations ext4_writeback_aops = {
.write_begin = ext4_write_begin,
.write_end = ext4_writeback_write_end,
.bmap = ext4_bmap,
- .invalidatepage = ext4_invalidatepage,
+ .invalidatepage_range = ext4_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.migratepage = buffer_migrate_page,
@@ -3218,7 +3241,7 @@ static const struct address_space_operations ext4_journalled_aops = {
.write_end = ext4_journalled_write_end,
.set_page_dirty = ext4_journalled_set_page_dirty,
.bmap = ext4_bmap,
- .invalidatepage = ext4_invalidatepage,
+ .invalidatepage_range = ext4_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.is_partially_uptodate = block_is_partially_uptodate,
@@ -3233,7 +3256,7 @@ static const struct address_space_operations ext4_da_aops = {
.write_begin = ext4_da_write_begin,
.write_end = ext4_da_write_end,
.bmap = ext4_bmap,
- .invalidatepage = ext4_da_invalidatepage,
+ .invalidatepage_range = ext4_da_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.migratepage = buffer_migrate_page,
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 69d8a69..ee7e11a 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -450,14 +450,15 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
TP_ARGS(page)
);

-TRACE_EVENT(ext4_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+TRACE_EVENT(ext4_invalidatepage_range,
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),

TP_STRUCT__entry(
__field( pgoff_t, index )
- __field( unsigned long, offset )
+ __field( unsigned int, offset )
+ __field( unsigned int, length )
__field( ino_t, ino )
__field( dev_t, dev )

@@ -466,14 +467,16 @@ TRACE_EVENT(ext4_invalidatepage,
TP_fast_assign(
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
__entry->ino = page->mapping->host->i_ino;
__entry->dev = page->mapping->host->i_sb->s_dev;
),

- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- (unsigned long) __entry->index, __entry->offset)
+ (unsigned long) __entry->index,
+ __entry->offset, __entry->length)
);

TRACE_EVENT(ext4_discard_blocks,
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:41

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 05/15 v2] ocfs2: implement invalidatepage_range aop

mm now supports invalidatepage_range address space operation which is
useful to allow truncating page range which is not aligned to the end of
the page. This will help in punch hole implementation once
truncate_inode_pages_range() is modify to allow this as well.

With this commit ocfs2 now register only invalidatepage_range.

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/ocfs2/aops.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 6577432..a101232 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -603,11 +603,12 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
* from ext3. PageChecked() bits have been removed as OCFS2 does not
* do journalled data.
*/
-static void ocfs2_invalidatepage(struct page *page, unsigned long offset)
+static void ocfs2_invalidatepage_range(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;

- jbd2_journal_invalidatepage(journal, page, offset);
+ jbd2_journal_invalidatepage_range(journal, page, offset, length);
}

static int ocfs2_releasepage(struct page *page, gfp_t wait)
@@ -2091,7 +2092,7 @@ const struct address_space_operations ocfs2_aops = {
.write_end = ocfs2_write_end,
.bmap = ocfs2_bmap,
.direct_IO = ocfs2_direct_IO,
- .invalidatepage = ocfs2_invalidatepage,
+ .invalidatepage_range = ocfs2_invalidatepage_range,
.releasepage = ocfs2_releasepage,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:42

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 06/15 v2] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle unaligned
start of the range.

Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In order for this to work correctly, called must register
invalidatepage_range address space operation, or rely solely on the
block_invalidatepage_range. That said it will BUG_ON() if caller
implements invalidatepage(), does not implement invalidatepage_range()
and use truncate_inode_pages_range() with unaligned end of the range.

This was based on the code provided by Hugh Dickins with some small
changes to make use of do_invalidatepage_range().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
mm/truncate.c | 77 +++++++++++++++++++++++++++++++++++++--------------------
1 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index b22efdf..0db1551 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -81,14 +81,6 @@ void do_invalidatepage_range(struct page *page, unsigned int offset,
#endif
}

-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
- zero_user_segment(page, partial, PAGE_CACHE_SIZE);
- cleancache_invalidate_page(page->mapping, page);
- if (page_has_private(page))
- do_invalidatepage(page, partial);
-}
-
/*
* This cancels just the dirty bit on the kernel page itself, it
* does NOT actually remove dirty bits on any mmap's that may be
@@ -222,8 +214,8 @@ int invalidate_inode_page(struct page *page)
* @lend: offset to which to truncate
*
* Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
*
* Truncate takes two passes - the first pass is nonblocking. It will not
* block on page locks and it will not block on writeback. The second pass
@@ -234,35 +226,44 @@ int invalidate_inode_page(struct page *page)
* We pass down the cache-hot hint to the page freeing code. Even if the
* mapping is large, it is probably the case that the final pages are the most
* recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that it is able to handle cases where lend + 1 is not page aligned.
+ * However in order for this to work caller have to register
+ * invalidatepage_range address space operation or rely solely on
+ * block_invalidatepage_range(). That said, do_invalidatepage_range() will
+ * BUG_ON() if caller implements invalidatapage(), does not implement
+ * invalidatepage_range() and uses truncate_inode_pages_range() with lend + 1
+ * unaligned to the page cache size.
*/
void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
{
- const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
- const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+ pgoff_t start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+ pgoff_t end = (lend + 1) >> PAGE_CACHE_SHIFT;
+ unsigned int partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+ unsigned int partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
struct pagevec pvec;
pgoff_t index;
- pgoff_t end;
int i;

cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;

- BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
- end = (lend >> PAGE_CACHE_SHIFT);
+ if (lend == -1)
+ end = -1; /* unsigned, so actually very big */

pagevec_init(&pvec, 0);
index = start;
- while (index <= end && pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ while (index < end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];

/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;

if (!trylock_page(page))
@@ -281,27 +282,51 @@ void truncate_inode_pages_range(struct address_space *mapping,
index++;
}

- if (partial) {
+ if (partial_start) {
struct page *page = find_lock_page(mapping, start - 1);
if (page) {
+ unsigned int top = PAGE_CACHE_SIZE;
+ if (start > end) {
+ top = partial_end;
+ partial_end = 0;
+ }
+ wait_on_page_writeback(page);
+ zero_user_segment(page, partial_start, top);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage_range(page, partial_start,
+ top - partial_start);
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ }
+ if (partial_end) {
+ struct page *page = find_lock_page(mapping, end);
+ if (page) {
wait_on_page_writeback(page);
- truncate_partial_page(page, partial);
+ zero_user_segment(page, 0, partial_end);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage_range(page, 0,
+ partial_end);
unlock_page(page);
page_cache_release(page);
}
}
+ if (start >= end)
+ return;

index = start;
for ( ; ; ) {
cond_resched();
if (!pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
if (index == start)
break;
index = start;
continue;
}
- if (index == start && pvec.pages[0]->index > end) {
+ if (index == start && pvec.pages[0]->index >= end) {
pagevec_release(&pvec);
break;
}
@@ -311,7 +336,7 @@ void truncate_inode_pages_range(struct address_space *mapping,

/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;

lock_page(page);
@@ -656,10 +681,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
* This rounding is currently just for example: unmap_mapping_range
* expands its hole outwards, whereas we want it to contract the hole
* inwards. However, existing callers of truncate_pagecache_range are
- * doing their own page rounding first; and truncate_inode_pages_range
- * currently BUGs if lend is not pagealigned-1 (it handles partial
- * page at start of hole, but not partial page at end of hole). Note
- * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
+ * doing their own page rounding first. Note that unmap_mapping_range
+ * allows holelen 0 for all, and we allow lend -1 for end of file.
*/

/*
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:43

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 07/15 v2] ext4: Take i_mutex before punching hole

Currently the allocation might happen in the punched range after the
truncation and before the releasing the space of the range. This would
lead to blocks being unallocated under the mapped buffer heads resulting
in nasty bugs.

With this commit we take i_mutex before going to do anything in the
ext4_ext_punch_hole() preventing any write to happen while the hole
punching is in progress. This will also allow us to ditch the writeout
of dirty pages withing the range.

This commit was based on code provided by Zheng Liu, thanks!

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 26 ++++++++++----------------
1 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index aabbb3f..f920383 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4769,9 +4769,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
loff_t first_page_offset, last_page_offset;
int credits, err = 0;

+ mutex_lock(&inode->i_mutex);
+
/* No need to punch hole beyond i_size */
if (offset >= inode->i_size)
- return 0;
+ goto out1;

/*
* If the hole extends beyond i_size, set the hole
@@ -4789,18 +4791,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
first_page_offset = first_page << PAGE_CACHE_SHIFT;
last_page_offset = last_page << PAGE_CACHE_SHIFT;

- /*
- * Write out all dirty pages to avoid race conditions
- * Then release them.
- */
- if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
- err = filemap_write_and_wait_range(mapping,
- offset, offset + length - 1);
-
- if (err)
- return err;
- }
-
/* Now release the pages */
if (last_page_offset > first_page_offset) {
truncate_pagecache_range(inode, first_page_offset,
@@ -4812,12 +4802,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)

credits = ext4_writepage_trans_blocks(inode);
handle = ext4_journal_start(inode, credits);
- if (IS_ERR(handle))
- return PTR_ERR(handle);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto out1;
+ }

err = ext4_orphan_add(handle, inode);
if (err)
- goto out;
+ goto out1;

/*
* Now we need to zero out the non-page-aligned data in the
@@ -4907,6 +4899,8 @@ out:
inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
ext4_mark_inode_dirty(handle, inode);
ext4_journal_stop(handle);
+out1:
+ mutex_unlock(&inode->i_mutex);
return err;
}
int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:44

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 08/15 v2] Revert "ext4: remove no longer used functions in inode.c"

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces functions ext4_block_truncate_page() and
ext4_block_zero_page_range() which has been previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to reintroduce those function and remove
ext4_discard_partial_page_buffers() since it is duplicating some code
and also partially duplicating work of truncate_pagecache_range(),
moreover the old implementation was much clearer.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 4 ++
fs/ext4/inode.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c3411d4..eb58808 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2001,6 +2001,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
extern void ext4_set_aops(struct inode *inode);
extern int ext4_writepage_trans_blocks(struct inode *);
extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 244579d..3c907f6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3490,6 +3490,126 @@ next:
return err;
}

+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from)
+{
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned length;
+ unsigned blocksize;
+ struct inode *inode = mapping->host;
+
+ blocksize = inode->i_sb->s_blocksize;
+ length = blocksize - (offset & (blocksize - 1));
+
+ return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'. The range to be zero'd must
+ * be contained with in one block. If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length)
+{
+ ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned blocksize, max, pos;
+ ext4_lblk_t iblock;
+ struct inode *inode = mapping->host;
+ struct buffer_head *bh;
+ struct page *page;
+ int err = 0;
+
+ page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+ mapping_gfp_mask(mapping) & ~__GFP_FS);
+ if (!page)
+ return -ENOMEM;
+
+ blocksize = inode->i_sb->s_blocksize;
+ max = blocksize - (offset & (blocksize - 1));
+
+ /*
+ * correct length if it does not fall between
+ * 'from' and the end of the block
+ */
+ if (length > max || length < 0)
+ length = max;
+
+ iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+ if (!page_has_buffers(page))
+ create_empty_buffers(page, blocksize, 0);
+
+ /* Find the buffer that contains "offset" */
+ bh = page_buffers(page);
+ pos = blocksize;
+ while (offset >= pos) {
+ bh = bh->b_this_page;
+ iblock++;
+ pos += blocksize;
+ }
+
+ err = 0;
+ if (buffer_freed(bh)) {
+ BUFFER_TRACE(bh, "freed: skip");
+ goto unlock;
+ }
+
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "unmapped");
+ ext4_get_block(inode, iblock, bh, 0);
+ /* unmapped? It's a hole - nothing to do */
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "still unmapped");
+ goto unlock;
+ }
+ }
+
+ /* Ok, it's mapped. Make sure it's up-to-date */
+ if (PageUptodate(page))
+ set_buffer_uptodate(bh);
+
+ if (!buffer_uptodate(bh)) {
+ err = -EIO;
+ ll_rw_block(READ, 1, &bh);
+ wait_on_buffer(bh);
+ /* Uhhuh. Read error. Complain and punt. */
+ if (!buffer_uptodate(bh))
+ goto unlock;
+ }
+
+ if (ext4_should_journal_data(inode)) {
+ BUFFER_TRACE(bh, "get write access");
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err)
+ goto unlock;
+ }
+
+ zero_user(page, offset, length);
+
+ BUFFER_TRACE(bh, "zeroed end of block");
+
+ err = 0;
+ if (ext4_should_journal_data(inode)) {
+ err = ext4_handle_dirty_metadata(handle, inode, bh);
+ } else
+ mark_buffer_dirty(bh);
+
+unlock:
+ unlock_page(page);
+ page_cache_release(page);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:40

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 04/15 v2] xfs: implement invalidatepage_range aop

mm now supports invalidatepage_range address space operation which is
useful to allow truncating page range which is not aligned to the end of
the page. This will help in punch hole implementation once
truncate_inode_pages_range() is modify to allow this as well.

With this commit xfs now register only invalidatepage_range.
Additionally we also update the respective trace point.

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/xfs/xfs_aops.c | 14 ++++++++------
fs/xfs/xfs_trace.h | 41 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e562dd4..c395f9e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -813,12 +813,14 @@ xfs_cluster_write(
}

STATIC void
-xfs_vm_invalidatepage(
+xfs_vm_invalidatepage_range(
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
- trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset);
+ trace_xfs_invalidatepage_range(page->mapping->host, page, offset,
+ length);
+ block_invalidatepage_range(page, offset, length);
}

/*
@@ -882,7 +884,7 @@ next_buffer:

xfs_iunlock(ip, XFS_ILOCK_EXCL);
out_invalidate:
- xfs_vm_invalidatepage(page, 0);
+ xfs_vm_invalidatepage_range(page, 0, PAGE_CACHE_SIZE);
return;
}

@@ -1646,7 +1648,7 @@ const struct address_space_operations xfs_address_space_operations = {
.writepage = xfs_vm_writepage,
.writepages = xfs_vm_writepages,
.releasepage = xfs_vm_releasepage,
- .invalidatepage = xfs_vm_invalidatepage,
+ .invalidatepage_range = xfs_vm_invalidatepage_range,
.write_begin = xfs_vm_write_begin,
.write_end = xfs_vm_write_end,
.bmap = xfs_vm_bmap,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index e5795dd..e716754 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -981,7 +981,46 @@ DEFINE_EVENT(xfs_page_class, name, \
TP_ARGS(inode, page, off))
DEFINE_PAGE_EVENT(xfs_writepage);
DEFINE_PAGE_EVENT(xfs_releasepage);
-DEFINE_PAGE_EVENT(xfs_invalidatepage);
+
+TRACE_EVENT(xfs_invalidatepage_range,
+ TP_PROTO(struct inode *inode, struct page *page, unsigned int off,
+ unsigned int len),
+ TP_ARGS(inode, page, off, len),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(pgoff_t, pgoff)
+ __field(loff_t, size)
+ __field(unsigned int, offset)
+ __field(unsigned int, length)
+ __field(int, delalloc)
+ __field(int, unwritten)
+ ),
+ TP_fast_assign(
+ int delalloc = -1, unwritten = -1;
+
+ if (page_has_buffers(page))
+ xfs_count_page_state(page, &delalloc, &unwritten);
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = XFS_I(inode)->i_ino;
+ __entry->pgoff = page_offset(page);
+ __entry->size = i_size_read(inode);
+ __entry->offset = off;
+ __entry->length = len;
+ __entry->delalloc = delalloc;
+ __entry->unwritten = unwritten;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %x "
+ "length %x delalloc %d unwritten %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pgoff,
+ __entry->size,
+ __entry->offset,
+ __entry->length,
+ __entry->delalloc,
+ __entry->unwritten)
+)

DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:45

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 09/15 v2] Revert "ext4: fix fsx truncate failure"

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zero block unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidate the unaligned
portion of the page. Then there is no need to zero and unmap it once more
and ext4_block_truncate_page() was doing the right job, although we
still need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() is doing.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on ppc64 machine with block size of 1024 bytes without
any problems.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 13 ++-----------
fs/ext4/indirect.c | 13 ++-----------
2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f920383..8336e4e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4234,7 +4234,6 @@ void ext4_ext_truncate(struct inode *inode)
struct super_block *sb = inode->i_sb;
ext4_lblk_t last_block;
handle_t *handle;
- loff_t page_len;
int err = 0;

/*
@@ -4251,16 +4250,8 @@ void ext4_ext_truncate(struct inode *inode)
if (IS_ERR(handle))
return;

- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
- goto out_stop;
- }
+ if (inode->i_size & (sb->s_blocksize - 1))
+ ext4_block_truncate_page(handle, mapping, inode->i_size);

if (ext4_orphan_add(handle, inode))
goto out_stop;
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 830e1b2..a082b30 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1349,9 +1349,7 @@ void ext4_ind_truncate(struct inode *inode)
__le32 nr = 0;
int n = 0;
ext4_lblk_t last_block, max_block;
- loff_t page_len;
unsigned blocksize = inode->i_sb->s_blocksize;
- int err;

handle = start_transaction(inode);
if (IS_ERR(handle))
@@ -1362,16 +1360,9 @@ void ext4_ind_truncate(struct inode *inode)
max_block = (EXT4_SB(inode->i_sb)->s_bitmap_maxbytes + blocksize-1)
>> EXT4_BLOCK_SIZE_BITS(inode->i_sb);

- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
+ if (inode->i_size & (blocksize - 1))
+ if (ext4_block_truncate_page(handle, mapping, inode->i_size))
goto out_stop;
- }

if (last_block != max_block) {
n = ext4_block_to_path(inode, last_block, offsets, NULL);
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:47

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 11/15 v2] ext4: remove unused discard_partial_page_buffers

The discard_partial_page_buffers is no longer used anywhere so we can
simply remove it including the *_no_lock variant and
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 8 --
fs/ext4/inode.c | 206 -------------------------------------------------------
2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 54ee9f3..47dba3c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -585,11 +585,6 @@ enum {
#define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER 0x0020

/*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED 0x0001
-
-/*
* ioctl commands
*/
#define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS
@@ -2007,9 +2002,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d9c4a28..95111b4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -139,9 +139,6 @@ static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
static int __ext4_journalled_writepage(struct page *page, unsigned int len);
static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags);

/*
* Test whether an inode is a fast symlink.
@@ -3287,209 +3284,6 @@ void ext4_set_aops(struct inode *inode)
}
}

-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags)
-{
- struct inode *inode = mapping->host;
- struct page *page;
- int err = 0;
-
- page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
- mapping_gfp_mask(mapping) & ~__GFP_FS);
- if (!page)
- return -ENOMEM;
-
- err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
- from, length, flags);
-
- unlock_page(page);
- page_cache_release(page);
- return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped. Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been locked. The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'. This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode: The files inode
- * page: A locked page that contains the offset "from"
- * from: The starting byte offset (from the begining of the file)
- * to begin discarding
- * len: The length of bytes to discard
- * flags: Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped. This flag is appropriate
- * for updateing the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on sucess or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags)
-{
- ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
- unsigned int offset = from & (PAGE_CACHE_SIZE-1);
- unsigned int blocksize, max, pos;
- ext4_lblk_t iblock;
- struct buffer_head *bh;
- int err = 0;
-
- blocksize = inode->i_sb->s_blocksize;
- max = PAGE_CACHE_SIZE - offset;
-
- if (index != page->index)
- return -EINVAL;
-
- /*
- * correct length if it does not fall between
- * 'from' and the end of the page
- */
- if (length > max || length < 0)
- length = max;
-
- iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
-
- if (!page_has_buffers(page))
- create_empty_buffers(page, blocksize, 0);
-
- /* Find the buffer that contains "offset" */
- bh = page_buffers(page);
- pos = blocksize;
- while (offset >= pos) {
- bh = bh->b_this_page;
- iblock++;
- pos += blocksize;
- }
-
- pos = offset;
- while (pos < offset + length) {
- unsigned int end_of_block, range_to_discard;
-
- err = 0;
-
- /* The length of space left to zero and unmap */
- range_to_discard = offset + length - pos;
-
- /* The length of space until the end of the block */
- end_of_block = blocksize - (pos & (blocksize-1));
-
- /*
- * Do not unmap or zero past end of block
- * for this buffer head
- */
- if (range_to_discard > end_of_block)
- range_to_discard = end_of_block;
-
-
- /*
- * Skip this buffer head if we are only zeroing unampped
- * regions of the page
- */
- if (flags & EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED &&
- buffer_mapped(bh))
- goto next;
-
- /* If the range is block aligned, unmap */
- if (range_to_discard == blocksize) {
- clear_buffer_dirty(bh);
- bh->b_bdev = NULL;
- clear_buffer_mapped(bh);
- clear_buffer_req(bh);
- clear_buffer_new(bh);
- clear_buffer_delay(bh);
- clear_buffer_unwritten(bh);
- clear_buffer_uptodate(bh);
- zero_user(page, pos, range_to_discard);
- BUFFER_TRACE(bh, "Buffer discarded");
- goto next;
- }
-
- /*
- * If this block is not completely contained in the range
- * to be discarded, then it is not going to be released. Because
- * we need to keep this block, we need to make sure this part
- * of the page is uptodate before we modify it by writeing
- * partial zeros on it.
- */
- if (!buffer_mapped(bh)) {
- /*
- * Buffer head must be mapped before we can read
- * from the block
- */
- BUFFER_TRACE(bh, "unmapped");
- ext4_get_block(inode, iblock, bh, 0);
- /* unmapped? It's a hole - nothing to do */
- if (!buffer_mapped(bh)) {
- BUFFER_TRACE(bh, "still unmapped");
- goto next;
- }
- }
-
- /* Ok, it's mapped. Make sure it's up-to-date */
- if (PageUptodate(page))
- set_buffer_uptodate(bh);
-
- if (!buffer_uptodate(bh)) {
- err = -EIO;
- ll_rw_block(READ, 1, &bh);
- wait_on_buffer(bh);
- /* Uhhuh. Read error. Complain and punt.*/
- if (!buffer_uptodate(bh))
- goto next;
- }
-
- if (ext4_should_journal_data(inode)) {
- BUFFER_TRACE(bh, "get write access");
- err = ext4_journal_get_write_access(handle, bh);
- if (err)
- goto next;
- }
-
- zero_user(page, pos, range_to_discard);
-
- err = 0;
- if (ext4_should_journal_data(inode)) {
- err = ext4_handle_dirty_metadata(handle, inode, bh);
- } else
- mark_buffer_dirty(bh);
-
- BUFFER_TRACE(bh, "Partial buffer zeroed");
-next:
- bh = bh->b_this_page;
- iblock++;
- pos += range_to_discard;
- }
-
- return err;
-}
-
/*
* ext4_block_truncate_page() zeroes out a mapping from file offset `from'
* up to the end of the block which corresponds to `from'.
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:51

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 15/15 v2] ext4: Allow punch hole with bigalloc enabled

In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
implementation and there is noting holding us back from using punch hole
on file system with bigalloc feature enabled.

This has been tested with fsx and xfstests.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 95111b4..bc452d1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3468,11 +3468,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
return -EOPNOTSUPP;
}

- if (EXT4_SB(inode->i_sb)->s_cluster_ratio > 1) {
- /* TODO: Add support for bigalloc file systems */
- return -EOPNOTSUPP;
- }
-
return ext4_ext_punch_hole(file, offset, length);
}

--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:49

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 13/15 v2] ext4: update ext4_ext_remove_space trace point

Add "end" variable.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 6 +++---
include/trace/events/ext4.h | 21 ++++++++++++++-------
2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f2a6174..83be6ad 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2571,7 +2571,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
again:
ext4_ext_invalidate_cache(inode);

- trace_ext4_ext_remove_space(inode, start, depth);
+ trace_ext4_ext_remove_space(inode, start, end, depth);

/*
* Check if we are removing extents inside the extent tree. If that
@@ -2735,8 +2735,8 @@ cont:
}
}

- trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
- path->p_hdr->eh_entries);
+ trace_ext4_ext_remove_space_done(inode, start, end, depth,
+ partial_cluster, path->p_hdr->eh_entries);

/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index ee7e11a..ed461d7 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1999,14 +1999,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
);

TRACE_EVENT(ext4_ext_remove_space,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t end, int depth),

- TP_ARGS(inode, start, depth),
+ TP_ARGS(inode, start, end, depth),

TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
),

@@ -2014,26 +2016,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
),

- TP_printk("dev %d,%d ino %lu since %u depth %d",
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth)
);

TRACE_EVENT(ext4_ext_remove_space_done,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
- ext4_lblk_t partial, unsigned short eh_entries),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+ int depth, ext4_lblk_t partial, unsigned short eh_entries),

- TP_ARGS(inode, start, depth, partial, eh_entries),
+ TP_ARGS(inode, start, end, depth, partial, eh_entries),

TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
__field( ext4_lblk_t, partial )
__field( unsigned short, eh_entries )
@@ -2043,16 +2048,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
__entry->partial = partial;
__entry->eh_entries = eh_entries;
),

- TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth,
(unsigned) __entry->partial,
(unsigned short) __entry->eh_entries)
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:50

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 14/15 v2] ext4: make punch hole code path work with bigalloc

Currently punch hole is disabled in file systems with bigalloc
feature enabled. However the recent changes in punch hole patch should
make it easier to support punching holes on bigalloc enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is unsigned long long type and it makes sure that we
will free the partial cluster if all extents has been released from that
cluster. However it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster
leaving the rest untouched. So we have to make sure that we will notice
cluster which still has some extents. To do this I've changed
partial_cluster to be signed long long type. The only scenario where
this could be a problem is when cluster_size == block size, however in
that case there would not be any partial clusters so we're safe. For
bigger clusters the signed type is enough. Now we use the negative value
in partial_cluster to mark such cluster used, hence we know that we must
not free it even if all other extents has been freed from such cluster.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
^----------^
punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update respective tracepoints to use signed long long type for
partial_cluster.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
include/trace/events/ext4.h | 25 ++++++++-------
2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 83be6ad..f4805fd 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2268,7 +2268,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)

static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ signed long long *partial_cluster,
ext4_lblk_t from, ext4_lblk_t to)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2294,7 +2294,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* partial cluster here.
*/
pblk = ext4_ext_pblock(ex) + ee_len - 1;
- if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+ if ((*partial_cluster > 0) &&
+ (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
EXT4_C2B(sbi, *partial_cluster),
sbi->s_cluster_ratio, flags);
@@ -2320,23 +2321,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+ unsigned int unaligned;

num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
- ext_debug("free last %u blocks starting %llu\n", num, pblk);
+ /*
+ * Usually we want to free partial cluster at the end of the
+ * extent, except for the situation when the cluster is still
+ * used by any other extent (partial_cluster is negative).
+ */
+ if (*partial_cluster < 0 &&
+ -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+ flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+ ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
* If the block range to be freed didn't start at the
* beginning of a cluster, and we removed the entire
- * extent, save the partial cluster here, since we
- * might need to delete if we determine that the
- * truncate operation has removed all of the blocks in
- * the cluster.
+ * extent and the cluster is not used by any other extent,
+ * save the partial cluster here, since we might need to
+ * delete if we determine that the truncate operation has
+ * removed all of the blocks in the cluster.
+ *
+ * On the other hand, if we did not manage to free the whole
+ * extent, we have to mark the cluster as used (store negative
+ * cluster number in partial_cluster).
*/
- if (pblk & (sbi->s_cluster_ratio - 1) &&
- (ee_len == num))
+ unaligned = pblk & (sbi->s_cluster_ratio - 1);
+ if (unaligned && (ee_len == num) &&
+ (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
*partial_cluster = EXT4_B2C(sbi, pblk);
- else
+ else if (unaligned)
+ *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
+ else if (*partial_cluster > 0)
*partial_cluster = 0;
} else
ext4_error(sbi->s_sb, "strange request: removal(2) "
@@ -2354,12 +2373,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* @handle: The journal handle
* @inode: The files inode
* @path: The path to the leaf
+ * @partial_cluster: The cluster which we'll have to free if all extents
+ * has been released from it. It gets negative in case
+ * that the cluster is still used.
* @start: The first block to remove
* @end: The last block to remove
*/
static int
ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
- struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
+ struct ext4_ext_path *path,
+ signed long long *partial_cluster,
ext4_lblk_t start, ext4_lblk_t end)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2372,6 +2395,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
unsigned short ex_ee_len;
unsigned uninitialized = 0;
struct ext4_extent *ex;
+ ext4_fsblk_t pblk;

/* the header must be checked already in ext4_ext_remove_space() */
ext_debug("truncate since %u in leaf to %u\n", start, end);
@@ -2410,6 +2434,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,

/* If this extent is beyond the end of the hole, skip it */
if (end < ex_ee_block) {
+ /*
+ * We're going to skip this extent and move to another,
+ * so if this extent is not cluster aligned we have
+ * to mark the current cluster as used to avoid
+ * accidentally freeing it later on
+ */
+ pblk = ext4_ext_pblock(ex);
+ if (pblk & (sbi->s_cluster_ratio - 1))
+ *partial_cluster =
+ -((long long)EXT4_B2C(sbi, pblk));
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
ex_ee_len = ext4_ext_get_actual_len(ex);
@@ -2485,7 +2519,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
sizeof(struct ext4_extent));
}
le16_add_cpu(&eh->eh_entries, -1);
- } else
+ } else if (*partial_cluster > 0)
*partial_cluster = 0;

err = ext4_ext_dirty(handle, inode, path + depth);
@@ -2503,11 +2537,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
err = ext4_ext_correct_indexes(handle, inode, path);

/*
- * If there is still a entry in the leaf node, check to see if
- * it references the partial cluster. This is the only place
- * where it could; if it doesn't, we can free the cluster.
+ * Free the partial cluster only if the current extent does not
+ * reference it. Otherwise we might free used cluster.
*/
- if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
+ if (*partial_cluster > 0 &&
(EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
*partial_cluster)) {
int flags = EXT4_FREE_BLOCKS_FORGET;
@@ -2557,7 +2590,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
struct super_block *sb = inode->i_sb;
int depth = ext_depth(inode);
struct ext4_ext_path *path = NULL;
- ext4_fsblk_t partial_cluster = 0;
+ signed long long partial_cluster = 0;
handle_t *handle;
int i = 0, err;

@@ -2741,7 +2774,7 @@ cont:
/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
* cluster as well. */
- if (partial_cluster && path->p_hdr->eh_entries == 0) {
+ if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
int flags = EXT4_FREE_BLOCKS_FORGET;

if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index ed461d7..4b3a8e9 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1900,7 +1900,7 @@ TRACE_EVENT(ext4_ext_show_extent,
TRACE_EVENT(ext4_remove_blocks,
TP_PROTO(struct inode *inode, struct ext4_extent *ex,
ext4_lblk_t from, ext4_fsblk_t to,
- ext4_fsblk_t partial_cluster),
+ long long int partial_cluster),

TP_ARGS(inode, ex, from, to, partial_cluster),

@@ -1912,7 +1912,7 @@ TRACE_EVENT(ext4_remove_blocks,
__field( unsigned short, ee_len )
__field( ext4_lblk_t, from )
__field( ext4_lblk_t, to )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
),

TP_fast_assign(
@@ -1927,7 +1927,7 @@ TRACE_EVENT(ext4_remove_blocks,
),

TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
- "from %u to %u partial_cluster %u",
+ "from %u to %u partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->ee_lblk,
@@ -1935,12 +1935,13 @@ TRACE_EVENT(ext4_remove_blocks,
(unsigned short) __entry->ee_len,
(unsigned) __entry->from,
(unsigned) __entry->to,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);

TRACE_EVENT(ext4_ext_rm_leaf,
TP_PROTO(struct inode *inode, ext4_lblk_t start,
- struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
+ struct ext4_extent *ex,
+ long long int partial_cluster),

TP_ARGS(inode, start, ex, partial_cluster),

@@ -1951,7 +1952,7 @@ TRACE_EVENT(ext4_ext_rm_leaf,
__field( ext4_lblk_t, ee_lblk )
__field( ext4_fsblk_t, ee_pblk )
__field( short, ee_len )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
),

TP_fast_assign(
@@ -1965,14 +1966,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
),

TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
- "partial_cluster %u",
+ "partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->ee_lblk,
(unsigned long long) __entry->ee_pblk,
(unsigned short) __entry->ee_len,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);

TRACE_EVENT(ext4_ext_rm_idx,
@@ -2030,7 +2031,7 @@ TRACE_EVENT(ext4_ext_remove_space,

TRACE_EVENT(ext4_ext_remove_space_done,
TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
- int depth, ext4_lblk_t partial, unsigned short eh_entries),
+ int depth, long long int partial, unsigned short eh_entries),

TP_ARGS(inode, start, end, depth, partial, eh_entries),

@@ -2040,7 +2041,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, end )
__field( int, depth )
- __field( ext4_lblk_t, partial )
+ __field( long long int, partial )
__field( unsigned short, eh_entries )
),

@@ -2054,14 +2055,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->eh_entries = eh_entries;
),

- TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->end,
__entry->depth,
- (unsigned) __entry->partial,
+ (long long int) __entry->partial,
(unsigned short) __entry->eh_entries)
);

--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:46

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 10/15 v2] ext4: use ext4_zero_partial_blocks in punch_hole

We're doing to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating work of
truncate_pagecache_range(), moreover the old implementation was much
clearer.

Now when the truncate_inode_pages_range() can handle truncating non page
aligned regions we can use this to invalidate and zero out block aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks on the start and end of the range. This
will greatly simplify the punch hole code. Moreover after this commit we
can get rid of the ext4_discard_partial_page_buffers() completely.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 2 +
fs/ext4/extents.c | 80 ++++++-----------------------------------------------
fs/ext4/inode.c | 31 ++++++++++++++++++++
3 files changed, 42 insertions(+), 71 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index eb58808..54ee9f3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2005,6 +2005,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8336e4e..6c4a7fa 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4754,9 +4754,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
struct inode *inode = file->f_path.dentry->d_inode;
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
- struct address_space *mapping = inode->i_mapping;
handle_t *handle;
- loff_t first_page, last_page, page_len;
loff_t first_page_offset, last_page_offset;
int credits, err = 0;

@@ -4776,17 +4774,13 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
offset;
}

- first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- last_page = (offset + length) >> PAGE_CACHE_SHIFT;
-
- first_page_offset = first_page << PAGE_CACHE_SHIFT;
- last_page_offset = last_page << PAGE_CACHE_SHIFT;
+ first_page_offset = round_up(offset, sb->s_blocksize);
+ last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;

- /* Now release the pages */
- if (last_page_offset > first_page_offset) {
+ /* Now release the pages and zero block aligned part of pages*/
+ if (last_page_offset > first_page_offset)
truncate_pagecache_range(inode, first_page_offset,
- last_page_offset - 1);
- }
+ last_page_offset);

/* finish any pending end_io work */
ext4_flush_completed_IO(inode);
@@ -4802,66 +4796,10 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
if (err)
goto out1;

- /*
- * Now we need to zero out the non-page-aligned data in the
- * pages at the start and tail of the hole, and unmap the buffer
- * heads for the block aligned regions of the page that were
- * completely zeroed.
- */
- if (first_page > last_page) {
- /*
- * If the file space being truncated is contained within a page
- * just zero out and unmap the middle of that page
- */
- err = ext4_discard_partial_page_buffers(handle,
- mapping, offset, length, 0);
-
- if (err)
- goto out;
- } else {
- /*
- * zero out and unmap the partial page that contains
- * the start of the hole
- */
- page_len = first_page_offset - offset;
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle, mapping,
- offset, page_len, 0);
- if (err)
- goto out;
- }
-
- /*
- * zero out and unmap the partial page that contains
- * the end of the hole
- */
- page_len = offset + length - last_page_offset;
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle, mapping,
- last_page_offset, page_len, 0);
- if (err)
- goto out;
- }
- }
-
- /*
- * If i_size is contained in the last page, we need to
- * unmap and zero the partial page after i_size
- */
- if (inode->i_size >> PAGE_CACHE_SHIFT == last_page &&
- inode->i_size % PAGE_CACHE_SIZE != 0) {
-
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
- goto out;
- }
- }
+ err = ext4_zero_partial_blocks(handle, inode, offset,
+ offset + length - 1);
+ if (err)
+ goto out;

first_block = (offset + sb->s_blocksize - 1) >>
EXT4_BLOCK_SIZE_BITS(sb);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3c907f6..d9c4a28 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3610,6 +3610,37 @@ unlock:
return err;
}

+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend)
+{
+ struct super_block *sb = inode->i_sb;
+ struct address_space *mapping = inode->i_mapping;
+ unsigned partial = lstart & (sb->s_blocksize - 1);
+ ext4_fsblk_t start = lstart >> sb->s_blocksize_bits;
+ ext4_fsblk_t end = lend >> sb->s_blocksize_bits;
+ int err = 0;
+
+ /* Handle partial zero within the single block */
+ if (start == end) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, lend - lstart + 1);
+ return err;
+ }
+ /* Handle partial zero out on the start of the range */
+ if (partial) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, sb->s_blocksize);
+ if (err)
+ return err;
+ }
+ /* Handle partial zero out on the end of the range */
+ partial = lend & (sb->s_blocksize - 1);
+ if (partial != sb->s_blocksize - 1)
+ err = ext4_block_zero_page_range(handle, mapping,
+ lend - partial, partial + 1);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-08-31 22:21:48

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 12/15 v2] ext4: remove unused code from ext4_remove_blocks()

The "head removal" branch in the condition is never used in any code
path in ext4 since the function only caller ext4_ext_rm_leaf() will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if dubious range is provided.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 21 ++++-----------------
1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 6c4a7fa..f2a6174 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2338,23 +2338,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
- } else if (from == le32_to_cpu(ex->ee_block)
- && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
- /* head removal */
- ext4_lblk_t num;
- ext4_fsblk_t start;
-
- num = to - from;
- start = ext4_ext_pblock(ex);
-
- ext_debug("free first %u blocks starting %llu\n", num, start);
- ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
- } else {
- printk(KERN_INFO "strange request: removal(2) "
- "%u-%u from %u:%u\n",
- from, to, le32_to_cpu(ex->ee_block), ee_len);
- }
+ } else
+ ext4_error(sbi->s_sb, "strange request: removal(2) "
+ "%u-%u from %u:%u\n",
+ from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
}

--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-09-04 14:52:15

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 02/15 v2] jbd2: implement jbd2_journal_invalidatepage_range

On Fri, Aug 31, 2012 at 06:21:38PM -0400, Lukas Czerner wrote:
> mm now supports invalidatepage_range address space operation and there
> are two file system using jbd2 also implementing punch hole feature
> which can benefit from this. We need to implement the same thing for
> jbd2 layer in order to allow those file system take benefit of this
> functionality.
>
> With new function jbd2_journal_invalidatepage_range() we can now specify
> length to invalidate, rather than assuming invalidate to the end of the
> page.
>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/jbd2/journal.c | 1 +
> fs/jbd2/transaction.c | 19 +++++++++++++++++--
> include/linux/jbd2.h | 2 ++
> 3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index e149b99..e4618e9 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -86,6 +86,7 @@ EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
> EXPORT_SYMBOL(jbd2_journal_wipe);
> EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
> EXPORT_SYMBOL(jbd2_journal_invalidatepage);
> +EXPORT_SYMBOL(jbd2_journal_invalidatepage_range);
> EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
> EXPORT_SYMBOL(jbd2_journal_force_commit);
> EXPORT_SYMBOL(jbd2_journal_file_inode);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index fb1ab953..65c1374 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1993,10 +1993,20 @@ zap_buffer_unlocked:
> *
> */
> void jbd2_journal_invalidatepage(journal_t *journal,
> - struct page *page,
> - unsigned long offset)
> + struct page *page,
> + unsigned long offset)
> +{
> + jbd2_journal_invalidatepage_range(journal, page, offset,
> + PAGE_CACHE_SIZE - offset);
> +}
> +
> +void jbd2_journal_invalidatepage_range(journal_t *journal,
> + struct page *page,
> + unsigned int offset,
> + unsigned int length)
> {
> struct buffer_head *head, *bh, *next;
> + unsigned int stop = offset + length;
> unsigned int curr_off = 0;
> int may_free = 1;
>
> @@ -2005,6 +2015,8 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> if (!page_has_buffers(page))
> return;
>
> + BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);

This misses e.g. length == (unsigned int)(-1), offset = 1. Could make
it obvious with:

BUG_ON(offset > PAGE_CACHE_SIZE || length > PAGE_CACHE_SIZE);
BUG_ON(stop > PAGE_CACHE_SIZE);

Or is that overkill?

--b.

> +
> /* We will potentially be playing with lists other than just the
> * data lists (especially for journaled data mode), so be
> * cautious in our locking. */
> @@ -2014,6 +2026,9 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> unsigned int next_off = curr_off + bh->b_size;
> next = bh->b_this_page;
>
> + if (next_off > stop)
> + return;
> +
> if (offset <= curr_off) {
> /* This block is wholly outside the truncation point */
> lock_buffer(bh);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 3efc43f..21288fa 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1101,6 +1101,8 @@ extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
> extern void journal_sync_buffer (struct buffer_head *);
> extern void jbd2_journal_invalidatepage(journal_t *,
> struct page *, unsigned long);
> +extern void jbd2_journal_invalidatepage_range(journal_t *, struct page *,
> + unsigned int, unsigned int);
> extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
> extern int jbd2_journal_stop(handle_t *);
> extern int jbd2_journal_flush (journal_t *);
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-09-04 15:37:13

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 02/15 v2] jbd2: implement jbd2_journal_invalidatepage_range

On Tue, 4 Sep 2012, J. Bruce Fields wrote:

> Date: Tue, 4 Sep 2012 10:52:13 -0400
> From: J. Bruce Fields <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH 02/15 v2] jbd2: implement
> jbd2_journal_invalidatepage_range
>
> On Fri, Aug 31, 2012 at 06:21:38PM -0400, Lukas Czerner wrote:
> > mm now supports invalidatepage_range address space operation and there
> > are two file system using jbd2 also implementing punch hole feature
> > which can benefit from this. We need to implement the same thing for
> > jbd2 layer in order to allow those file system take benefit of this
> > functionality.
> >
> > With new function jbd2_journal_invalidatepage_range() we can now specify
> > length to invalidate, rather than assuming invalidate to the end of the
> > page.
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > ---
> > fs/jbd2/journal.c | 1 +
> > fs/jbd2/transaction.c | 19 +++++++++++++++++--
> > include/linux/jbd2.h | 2 ++
> > 3 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index e149b99..e4618e9 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -86,6 +86,7 @@ EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
> > EXPORT_SYMBOL(jbd2_journal_wipe);
> > EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
> > EXPORT_SYMBOL(jbd2_journal_invalidatepage);
> > +EXPORT_SYMBOL(jbd2_journal_invalidatepage_range);
> > EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
> > EXPORT_SYMBOL(jbd2_journal_force_commit);
> > EXPORT_SYMBOL(jbd2_journal_file_inode);
> > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > index fb1ab953..65c1374 100644
> > --- a/fs/jbd2/transaction.c
> > +++ b/fs/jbd2/transaction.c
> > @@ -1993,10 +1993,20 @@ zap_buffer_unlocked:
> > *
> > */
> > void jbd2_journal_invalidatepage(journal_t *journal,
> > - struct page *page,
> > - unsigned long offset)
> > + struct page *page,
> > + unsigned long offset)
> > +{
> > + jbd2_journal_invalidatepage_range(journal, page, offset,
> > + PAGE_CACHE_SIZE - offset);
> > +}
> > +
> > +void jbd2_journal_invalidatepage_range(journal_t *journal,
> > + struct page *page,
> > + unsigned int offset,
> > + unsigned int length)
> > {
> > struct buffer_head *head, *bh, *next;
> > + unsigned int stop = offset + length;
> > unsigned int curr_off = 0;
> > int may_free = 1;
> >
> > @@ -2005,6 +2015,8 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> > if (!page_has_buffers(page))
> > return;
> >
> > + BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
>
> This misses e.g. length == (unsigned int)(-1), offset = 1. Could make
> it obvious with:

Hmm.. So if length = -1 (e.g. UINT_MAX) and offset = 1 then:

offset + length = 0

so

length is bigger than (offset + length) right ? Speaking in numbers:

length = 4294967295
offset = 1
stop = length + offset = 0

so (0 < 4294967295) is true and we'll BUG() on this, right ?

Am I missing something ?

-Lukas

>
> BUG_ON(offset > PAGE_CACHE_SIZE || length > PAGE_CACHE_SIZE);
> BUG_ON(stop > PAGE_CACHE_SIZE);
>
> Or is that overkill?
>
> --b.
>
> > +
> > /* We will potentially be playing with lists other than just the
> > * data lists (especially for journaled data mode), so be
> > * cautious in our locking. */
> > @@ -2014,6 +2026,9 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> > unsigned int next_off = curr_off + bh->b_size;
> > next = bh->b_this_page;
> >
> > + if (next_off > stop)
> > + return;
> > +
> > if (offset <= curr_off) {
> > /* This block is wholly outside the truncation point */
> > lock_buffer(bh);
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 3efc43f..21288fa 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1101,6 +1101,8 @@ extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
> > extern void journal_sync_buffer (struct buffer_head *);
> > extern void jbd2_journal_invalidatepage(journal_t *,
> > struct page *, unsigned long);
> > +extern void jbd2_journal_invalidatepage_range(journal_t *, struct page *,
> > + unsigned int, unsigned int);
> > extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
> > extern int jbd2_journal_stop(handle_t *);
> > extern int jbd2_journal_flush (journal_t *);
> > --
> > 1.7.7.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-09-04 17:44:26

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 02/15 v2] jbd2: implement jbd2_journal_invalidatepage_range

On Tue, Sep 04, 2012 at 11:37:13AM -0400, Lukáš Czerner wrote:
> On Tue, 4 Sep 2012, J. Bruce Fields wrote:
>
> > Date: Tue, 4 Sep 2012 10:52:13 -0400
> > From: J. Bruce Fields <[email protected]>
> > To: Lukas Czerner <[email protected]>
> > Cc: [email protected], [email protected], [email protected],
> > [email protected], [email protected]
> > Subject: Re: [PATCH 02/15 v2] jbd2: implement
> > jbd2_journal_invalidatepage_range
> >
> > On Fri, Aug 31, 2012 at 06:21:38PM -0400, Lukas Czerner wrote:
> > > mm now supports invalidatepage_range address space operation and there
> > > are two file system using jbd2 also implementing punch hole feature
> > > which can benefit from this. We need to implement the same thing for
> > > jbd2 layer in order to allow those file system take benefit of this
> > > functionality.
> > >
> > > With new function jbd2_journal_invalidatepage_range() we can now specify
> > > length to invalidate, rather than assuming invalidate to the end of the
> > > page.
> > >
> > > Signed-off-by: Lukas Czerner <[email protected]>
> > > ---
> > > fs/jbd2/journal.c | 1 +
> > > fs/jbd2/transaction.c | 19 +++++++++++++++++--
> > > include/linux/jbd2.h | 2 ++
> > > 3 files changed, 20 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > > index e149b99..e4618e9 100644
> > > --- a/fs/jbd2/journal.c
> > > +++ b/fs/jbd2/journal.c
> > > @@ -86,6 +86,7 @@ EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
> > > EXPORT_SYMBOL(jbd2_journal_wipe);
> > > EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
> > > EXPORT_SYMBOL(jbd2_journal_invalidatepage);
> > > +EXPORT_SYMBOL(jbd2_journal_invalidatepage_range);
> > > EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
> > > EXPORT_SYMBOL(jbd2_journal_force_commit);
> > > EXPORT_SYMBOL(jbd2_journal_file_inode);
> > > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > > index fb1ab953..65c1374 100644
> > > --- a/fs/jbd2/transaction.c
> > > +++ b/fs/jbd2/transaction.c
> > > @@ -1993,10 +1993,20 @@ zap_buffer_unlocked:
> > > *
> > > */
> > > void jbd2_journal_invalidatepage(journal_t *journal,
> > > - struct page *page,
> > > - unsigned long offset)
> > > + struct page *page,
> > > + unsigned long offset)
> > > +{
> > > + jbd2_journal_invalidatepage_range(journal, page, offset,
> > > + PAGE_CACHE_SIZE - offset);
> > > +}
> > > +
> > > +void jbd2_journal_invalidatepage_range(journal_t *journal,
> > > + struct page *page,
> > > + unsigned int offset,
> > > + unsigned int length)
> > > {
> > > struct buffer_head *head, *bh, *next;
> > > + unsigned int stop = offset + length;
> > > unsigned int curr_off = 0;
> > > int may_free = 1;
> > >
> > > @@ -2005,6 +2015,8 @@ void jbd2_journal_invalidatepage(journal_t *journal,
> > > if (!page_has_buffers(page))
> > > return;
> > >
> > > + BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
> >
> > This misses e.g. length == (unsigned int)(-1), offset = 1. Could make
> > it obvious with:
>
> Hmm.. So if length = -1 (e.g. UINT_MAX) and offset = 1 then:
>
> offset + length = 0
>
> so
>
> length is bigger than (offset + length) right ? Speaking in numbers:
>
> length = 4294967295
> offset = 1
> stop = length + offset = 0
>
> so (0 < 4294967295) is true and we'll BUG() on this, right ?
>
> Am I missing something ?

Gah, no, I just wasn't thinking straight: the only way offset or length
could individually be greater than PAGE_CACHE_SIZE while their sum is
less would be if their sum overflows, in which case the second condition
(stop < length) would trigger. So the two conditions are enough.

--b.

2012-09-04 23:43:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space operation

On Fri, 31 Aug 2012 18:21:37 -0400
Lukas Czerner <[email protected]> wrote:

> Currently there is no way to truncate partial page where the end
> truncate point is not at the end of the page. This is because it was not
> needed and the functionality was enough for file system truncate
> operation to work properly. However more file systems now support punch
> hole feature and it can benefit from mm supporting truncating page just
> up to the certain point.
>
> Specifically, with this functionality truncate_inode_pages_range() can
> be changed so it supports truncating partial page at the end of the
> range (currently it will BUG_ON() if 'end' is not at the end of the
> page).
>
> This commit add new address space operation invalidatepage_range which
> allows specifying length of bytes to invalidate, rather than assuming
> truncate to the end of the page. It also introduce
> block_invalidatepage_range() and do_invalidatepage)range() functions for
> exactly the same reason.
>
> The caller does not have to implement both aops (invalidatepage and
> invalidatepage_range) and the latter is preferred. The old method will be
> used only if invalidatepage_range is not implemented by the caller.
>
> ...
>
> +/**
> + * do_invalidatepage_range - invalidate range of the page
> + *
> + * @page: the page which is affected
> + * @offset: start of the range to invalidate
> + * @length: length of the range to invalidate
> + */
> +void do_invalidatepage_range(struct page *page, unsigned int offset,
> + unsigned int length)
> +{
> + void (*invalidatepage_range)(struct page *, unsigned int,
> + unsigned int);
> void (*invalidatepage)(struct page *, unsigned long);
> +
> + /*
> + * Try invalidatepage_range first
> + */
> + invalidatepage_range = page->mapping->a_ops->invalidatepage_range;
> + if (invalidatepage_range) {
> + (*invalidatepage_range)(page, offset, length);
> + return;
> + }
> +
> + /*
> + * When only invalidatepage is registered length + offset must be
> + * PAGE_CACHE_SIZE
> + */
> invalidatepage = page->mapping->a_ops->invalidatepage;
> + if (invalidatepage) {
> + BUG_ON(length + offset != PAGE_CACHE_SIZE);
> + (*invalidatepage)(page, offset);
> + }
> #ifdef CONFIG_BLOCK
> - if (!invalidatepage)
> - invalidatepage = block_invalidatepage;
> + if (!invalidatepage_range && !invalidatepage)
> + block_invalidatepage_range(page, offset, length);
> #endif
> - if (invalidatepage)
> - (*invalidatepage)(page, offset);
> }

This interface is ... strange. If the caller requests a
non-page-aligned invalidateion against an fs which doesn't implement
->invalidatepage_range then the kernel goes BUG. So the caller must
know beforehand that the underlying fs _does_ implement
->invalidatepage_range.

For practical purposes, this implies that invalidation of a
non-page-aligned region will only be performed by fs code, because the
fs implicitly knows that it implements ->invalidatepage_range.

However this function isn't exported to modules, so scratch that.

So how is calling code supposed to determine whether it can actually
_use_ this interface?



Also... one would obviously like to see the old ->invalidatepage() get
removed entirely. But about 20 filesystems implement
->invalidatepage() and implementation of ->invalidatepage_range() is
non-trivial and actually unnecessary.

So I dunno. Perhaps we should keep ->invalidatepage() and
->invalidatepage_range() completely separate.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-09-05 14:36:00

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space operation

On Tue, 4 Sep 2012, Andrew Morton wrote:

> Date: Tue, 4 Sep 2012 16:43:16 -0700
> From: Andrew Morton <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space
> operation
>
> On Fri, 31 Aug 2012 18:21:37 -0400
> Lukas Czerner <[email protected]> wrote:
>
> > Currently there is no way to truncate partial page where the end
> > truncate point is not at the end of the page. This is because it was not
> > needed and the functionality was enough for file system truncate
> > operation to work properly. However more file systems now support punch
> > hole feature and it can benefit from mm supporting truncating page just
> > up to the certain point.
> >
> > Specifically, with this functionality truncate_inode_pages_range() can
> > be changed so it supports truncating partial page at the end of the
> > range (currently it will BUG_ON() if 'end' is not at the end of the
> > page).
> >
> > This commit add new address space operation invalidatepage_range which
> > allows specifying length of bytes to invalidate, rather than assuming
> > truncate to the end of the page. It also introduce
> > block_invalidatepage_range() and do_invalidatepage)range() functions for
> > exactly the same reason.
> >
> > The caller does not have to implement both aops (invalidatepage and
> > invalidatepage_range) and the latter is preferred. The old method will be
> > used only if invalidatepage_range is not implemented by the caller.
> >
> > ...
> >
> > +/**
> > + * do_invalidatepage_range - invalidate range of the page
> > + *
> > + * @page: the page which is affected
> > + * @offset: start of the range to invalidate
> > + * @length: length of the range to invalidate
> > + */
> > +void do_invalidatepage_range(struct page *page, unsigned int offset,
> > + unsigned int length)
> > +{
> > + void (*invalidatepage_range)(struct page *, unsigned int,
> > + unsigned int);
> > void (*invalidatepage)(struct page *, unsigned long);
> > +
> > + /*
> > + * Try invalidatepage_range first
> > + */
> > + invalidatepage_range = page->mapping->a_ops->invalidatepage_range;
> > + if (invalidatepage_range) {
> > + (*invalidatepage_range)(page, offset, length);
> > + return;
> > + }
> > +
> > + /*
> > + * When only invalidatepage is registered length + offset must be
> > + * PAGE_CACHE_SIZE
> > + */
> > invalidatepage = page->mapping->a_ops->invalidatepage;
> > + if (invalidatepage) {
> > + BUG_ON(length + offset != PAGE_CACHE_SIZE);
> > + (*invalidatepage)(page, offset);
> > + }
> > #ifdef CONFIG_BLOCK
> > - if (!invalidatepage)
> > - invalidatepage = block_invalidatepage;
> > + if (!invalidatepage_range && !invalidatepage)
> > + block_invalidatepage_range(page, offset, length);
> > #endif
> > - if (invalidatepage)
> > - (*invalidatepage)(page, offset);
> > }
>
> This interface is ... strange. If the caller requests a
> non-page-aligned invalidateion against an fs which doesn't implement
> ->invalidatepage_range then the kernel goes BUG. So the caller must
> know beforehand that the underlying fs _does_ implement
> ->invalidatepage_range.
>
> For practical purposes, this implies that invalidation of a
> non-page-aligned region will only be performed by fs code, because the
> fs implicitly knows that it implements ->invalidatepage_range.
>
> However this function isn't exported to modules, so scratch that.
>
> So how is calling code supposed to determine whether it can actually
> _use_ this interface?

Right now the only place we use ->invalidatepage_range is
do_invalidatepage_range() which is only used in
truncate_inode_pages_range(). Without these patches
truncate_inode_pages_range() throw a BUG() if it gets unaligned
range, so it is file system responsibility to take case about the
alignment, which is currently happening in all file systems unless
there is a bug (like in ocfs2).

So currently callers of truncate_inode_pages_range() know that the
range has to be aligned and with these patches they should know (it
is documented in the function comment after all) that when they want
to pass unaligned range the underlying file system has to implement
->invalidatepage_range().

Now I agree that the only one who will have this information will be
the file system itself. But both truncate_pagecache_range() and
truncate_inode_pages_range() are used from within the file system as
you pointed out earlier, so it does not look like a real problem to
me. But I have to admit that it is a bit strange.

However if we would want to keep ->invalidatepage_range() and
->invalidatepage() completely separate then we would have to have
separate truncate_inode_pages_range() and truncate_pagecache_range()
as well for the separation to actually matter. And IMO this would be
much worse...

As it is now the caller is forced to implement
->invalidatepage_range() if he wants to invalidate unaligned range
by the use of BUG_ON() in the kind of same way we would force him to
implement it if he would like to use the 'new'
truncate_inode_pages_range(), or truncate_pagecache_range().

I am intentionally not mentioning do_invalidatepage_range() since it
currently does not have other users than truncate_inode_pages_range() where
the range may be unaligned.

Thanks!
-Lukas

>
>
> Also... one would obviously like to see the old ->invalidatepage() get
> removed entirely. But about 20 filesystems implement
> ->invalidatepage() and implementation of ->invalidatepage_range() is
> non-trivial and actually unnecessary.
>
> So I dunno. Perhaps we should keep ->invalidatepage() and
> ->invalidatepage_range() completely separate.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-09-05 15:56:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space operation

On Wed, Sep 05, 2012 at 10:36:00AM -0400, Luk?? Czerner wrote:
> However if we would want to keep ->invalidatepage_range() and
> ->invalidatepage() completely separate then we would have to have
> separate truncate_inode_pages_range() and truncate_pagecache_range()
> as well for the separation to actually matter. And IMO this would be
> much worse...

What's the problem with simply changing the ->invalidatepage prototype
to always pass the range and updating all instances for it?


2012-09-05 16:42:54

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space operation

On Wed, 5 Sep 2012, Christoph Hellwig wrote:

> Date: Wed, 5 Sep 2012 11:56:48 -0400
> From: Christoph Hellwig <[email protected]>
> To: Luk?? Czerner <[email protected]>
> Cc: Andrew Morton <[email protected]>, [email protected],
> [email protected], [email protected], [email protected],
> [email protected]
> Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space
> operation
>
> On Wed, Sep 05, 2012 at 10:36:00AM -0400, Luk?? Czerner wrote:
> > However if we would want to keep ->invalidatepage_range() and
> > ->invalidatepage() completely separate then we would have to have
> > separate truncate_inode_pages_range() and truncate_pagecache_range()
> > as well for the separation to actually matter. And IMO this would be
> > much worse...
>
> What's the problem with simply changing the ->invalidatepage prototype
> to always pass the range and updating all instances for it?
>

The problem is that it would require me to implement this
functionality for _all_ the file systems, because it is not just
about changing the prototype, but also changing the implementation to
be able to handle unaligned end of the range. This change would
involve 20 file systems.

It is not impossible though... so if people think that it's the
right way to go, then I guess it can be done.

-Lukas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-09-10 12:00:53

by Ashish Sangwan

[permalink] [raw]
Subject: Re: [PATCH 07/15 v2] ext4: Take i_mutex before punching hole

On Sat, Sep 1, 2012 at 3:51 AM, Lukas Czerner <[email protected]> wrote:
> Currently the allocation might happen in the punched range after the
> truncation and before the releasing the space of the range. This would
> lead to blocks being unallocated under the mapped buffer heads resulting
> in nasty bugs.
>
> With this commit we take i_mutex before going to do anything in the
> ext4_ext_punch_hole() preventing any write to happen while the hole
> punching is in progress. This will also allow us to ditch the writeout
> of dirty pages withing the range.
>
> This commit was based on code provided by Zheng Liu, thanks!
>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/ext4/extents.c | 26 ++++++++++----------------
> 1 files changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index aabbb3f..f920383 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4769,9 +4769,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> loff_t first_page_offset, last_page_offset;
> int credits, err = 0;
>
> + mutex_lock(&inode->i_mutex);
> +
> /* No need to punch hole beyond i_size */
> if (offset >= inode->i_size)
> - return 0;
> + goto out1;
>
> /*
> * If the hole extends beyond i_size, set the hole
> @@ -4789,18 +4791,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> first_page_offset = first_page << PAGE_CACHE_SHIFT;
> last_page_offset = last_page << PAGE_CACHE_SHIFT;
>
> - /*
> - * Write out all dirty pages to avoid race conditions
> - * Then release them.
> - */
> - if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> - err = filemap_write_and_wait_range(mapping,
> - offset, offset + length - 1);
> -
> - if (err)
> - return err;
> - }
> -

Removing above code will cause a problem in case the file has all its
data in memory and nothing has been committed on disk. If punch hole
is issued for such a file, as there are no extents present, EIO would
be returned from ext4_ext_rm_leaf. So, even though blocks would be
removed from memory, the end result will be error EIO.

> /* Now release the pages */
> if (last_page_offset > first_page_offset) {
> truncate_pagecache_range(inode, first_page_offset,

To avoid this, you can add a check after the call to truncate_pagecache_range.
if(!inode->i_blocks)
return 0;

> @@ -4812,12 +4802,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>
> credits = ext4_writepage_trans_blocks(inode);
> handle = ext4_journal_start(inode, credits);
> - if (IS_ERR(handle))
> - return PTR_ERR(handle);
> + if (IS_ERR(handle)) {
> + err = PTR_ERR(handle);
> + goto out1;
> + }
>
> err = ext4_orphan_add(handle, inode);
> if (err)
> - goto out;
> + goto out1;
>
> /*
> * Now we need to zero out the non-page-aligned data in the
> @@ -4907,6 +4899,8 @@ out:
> inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
> ext4_mark_inode_dirty(handle, inode);
> ext4_journal_stop(handle);
> +out1:
> + mutex_unlock(&inode->i_mutex);
> return err;
> }
> int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2012-09-13 15:15:03

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 07/15 v2] ext4: Take i_mutex before punching hole

On Mon, 10 Sep 2012, Ashish Sangwan wrote:

> Date: Mon, 10 Sep 2012 17:30:53 +0530
> From: Ashish Sangwan <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH 07/15 v2] ext4: Take i_mutex before punching hole
>
> On Sat, Sep 1, 2012 at 3:51 AM, Lukas Czerner <[email protected]> wrote:
> > Currently the allocation might happen in the punched range after the
> > truncation and before the releasing the space of the range. This would
> > lead to blocks being unallocated under the mapped buffer heads resulting
> > in nasty bugs.
> >
> > With this commit we take i_mutex before going to do anything in the
> > ext4_ext_punch_hole() preventing any write to happen while the hole
> > punching is in progress. This will also allow us to ditch the writeout
> > of dirty pages withing the range.
> >
> > This commit was based on code provided by Zheng Liu, thanks!
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > ---
> > fs/ext4/extents.c | 26 ++++++++++----------------
> > 1 files changed, 10 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index aabbb3f..f920383 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -4769,9 +4769,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> > loff_t first_page_offset, last_page_offset;
> > int credits, err = 0;
> >
> > + mutex_lock(&inode->i_mutex);
> > +
> > /* No need to punch hole beyond i_size */
> > if (offset >= inode->i_size)
> > - return 0;
> > + goto out1;
> >
> > /*
> > * If the hole extends beyond i_size, set the hole
> > @@ -4789,18 +4791,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> > first_page_offset = first_page << PAGE_CACHE_SHIFT;
> > last_page_offset = last_page << PAGE_CACHE_SHIFT;
> >
> > - /*
> > - * Write out all dirty pages to avoid race conditions
> > - * Then release them.
> > - */
> > - if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> > - err = filemap_write_and_wait_range(mapping,
> > - offset, offset + length - 1);
> > -
> > - if (err)
> > - return err;
> > - }
> > -
>
> Removing above code will cause a problem in case the file has all its
> data in memory and nothing has been committed on disk. If punch hole
> is issued for such a file, as there are no extents present, EIO would
> be returned from ext4_ext_rm_leaf. So, even though blocks would be
> removed from memory, the end result will be error EIO.
>
> > /* Now release the pages */
> > if (last_page_offset > first_page_offset) {
> > truncate_pagecache_range(inode, first_page_offset,
>
> To avoid this, you can add a check after the call to truncate_pagecache_range.
> if(!inode->i_blocks)
> return 0;

Thanks for pointing this out. However Dimitry has better fix for
this with some additional changes so I am dropping this particular
patch.

(see "ext4: punch_hole should wait for DIO writers")

Thanks!
-Lukas

>
> > @@ -4812,12 +4802,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> >
> > credits = ext4_writepage_trans_blocks(inode);
> > handle = ext4_journal_start(inode, credits);
> > - if (IS_ERR(handle))
> > - return PTR_ERR(handle);
> > + if (IS_ERR(handle)) {
> > + err = PTR_ERR(handle);
> > + goto out1;
> > + }
> >
> > err = ext4_orphan_add(handle, inode);
> > if (err)
> > - goto out;
> > + goto out1;
> >
> > /*
> > * Now we need to zero out the non-page-aligned data in the
> > @@ -4907,6 +4899,8 @@ out:
> > inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
> > ext4_mark_inode_dirty(handle, inode);
> > ext4_journal_stop(handle);
> > +out1:
> > + mutex_unlock(&inode->i_mutex);
> > return err;
> > }
> > int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> > --
> > 1.7.7.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2012-09-14 13:21:19

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space operation

On Wed, 5 Sep 2012, Luk?? Czerner wrote:

> Date: Wed, 5 Sep 2012 12:42:54 -0400 (EDT)
> From: Luk?? Czerner <[email protected]>
> To: Christoph Hellwig <[email protected]>
> Cc: Luk?? Czerner <[email protected]>,
> Andrew Morton <[email protected]>, [email protected],
> [email protected], [email protected], [email protected],
> [email protected]
> Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space
> operation
>
> On Wed, 5 Sep 2012, Christoph Hellwig wrote:
>
> > Date: Wed, 5 Sep 2012 11:56:48 -0400
> > From: Christoph Hellwig <[email protected]>
> > To: Luk?? Czerner <[email protected]>
> > Cc: Andrew Morton <[email protected]>, [email protected],
> > [email protected], [email protected], [email protected],
> > [email protected]
> > Subject: Re: [PATCH 01/15 v2] mm: add invalidatepage_range address space
> > operation
> >
> > On Wed, Sep 05, 2012 at 10:36:00AM -0400, Luk?? Czerner wrote:
> > > However if we would want to keep ->invalidatepage_range() and
> > > ->invalidatepage() completely separate then we would have to have
> > > separate truncate_inode_pages_range() and truncate_pagecache_range()
> > > as well for the separation to actually matter. And IMO this would be
> > > much worse...
> >
> > What's the problem with simply changing the ->invalidatepage prototype
> > to always pass the range and updating all instances for it?
> >
>
> The problem is that it would require me to implement this
> functionality for _all_ the file systems, because it is not just
> about changing the prototype, but also changing the implementation to
> be able to handle unaligned end of the range. This change would
> involve 20 file systems.
>
> It is not impossible though... so if people think that it's the
> right way to go, then I guess it can be done.
>
> -Lukas

Are there still any objections or comments about this ?

-Lukas