2013-04-09 09:14:39

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 00/18] change invalidatepage prototype to accept length

Hi,

This set of patches are aimed to allow truncate_inode_pages_range() handle
ranges which are not aligned at the end of the page. Currently it will
hit BUG_ON() when the end of the range is not aligned. Punch hole feature
however can benefit from this ability saving file systems some work not
forcing them to implement their own invalidate code to handle unaligned
ranges.

In order for this to woke we need change ->invalidatepage() address space
operation to to accept range to invalidate by adding 'length' argument in
addition to 'offset'. This is different from my previous attempt to create
new aop ->invalidatepage_range (http://lwn.net/Articles/514828/) which I
reconsidered to be unnecessary.

It would be for the best if this series could go through ext4 branch since
there are a lot of ext4 changes which are based on dev branch of ext4
(git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git)

For description purposes this patch set can be divided into following
groups:

patch 0001: Change ->invalidatepage() prototype adding 'length' argument
and changing all the instances. In very simple cases file
system methods are completely adapted, otherwise only
prototype is changed and the rest will follow. This patch
also implement the 'range' invalidation in
block_invalidatepage().

patch 0002 - 0009:
Make the use of new 'length' argument in the file system
itself. Some file systems can take advantage of it trying
to invalidate only portion of the page if possible, some
can't, however none of the file systems currently attempt
to truncate non page aligned ranges.


patch 0010: Teach truncate_inode_pages_range() to handle non page aligned
ranges.

patch 0011 - 0018:
Ext4 changes build on top of previous changes, simplifying
punch hole code. Not all changes are realated specifically
to invalidatepage() change, but all are related to the punch
hole feature.

Even though this patch set would mainly affect functionality of the file
file systems implementing punch hole I've tested all the following file
system using xfstests without noticing any bugs related to this change.

ext3, ext4, xfs, btrfs, gfs2 and reiserfs

the much smaller changes in other file systems has not been directly tested,
so please review.


---
Documentation/filesystems/Locking | 6 +-
Documentation/filesystems/vfs.txt | 20 +-
fs/9p/vfs_addr.c | 5 +-
fs/afs/file.c | 10 +-
fs/btrfs/disk-io.c | 3 +-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 +-
fs/buffer.c | 21 ++-
fs/ceph/addr.c | 15 +-
fs/cifs/file.c | 5 +-
fs/exofs/inode.c | 6 +-
fs/ext3/inode.c | 9 +-
fs/ext4/ext4.h | 14 +-
fs/ext4/extents.c | 96 ++++++----
fs/ext4/inode.c | 393 +++++++++++++------------------------
fs/f2fs/data.c | 3 +-
fs/f2fs/node.c | 3 +-
fs/gfs2/aops.c | 17 ++-
fs/jbd/transaction.c | 19 ++-
fs/jbd2/transaction.c | 24 ++-
fs/jfs/jfs_metapage.c | 5 +-
fs/logfs/file.c | 3 +-
fs/logfs/segment.c | 3 +-
fs/nfs/file.c | 8 +-
fs/ntfs/aops.c | 2 +-
fs/ocfs2/aops.c | 5 +-
fs/reiserfs/inode.c | 12 +-
fs/ubifs/file.c | 5 +-
fs/xfs/xfs_aops.c | 10 +-
fs/xfs/xfs_trace.h | 41 ++++-
include/linux/buffer_head.h | 3 +-
include/linux/fs.h | 2 +-
include/linux/jbd.h | 2 +-
include/linux/jbd2.h | 2 +-
include/linux/mm.h | 3 +-
include/trace/events/ext3.h | 12 +-
include/trace/events/ext4.h | 64 ++++---
mm/readahead.c | 2 +-
mm/truncate.c | 117 ++++++++----
39 files changed, 522 insertions(+), 453 deletions(-)

[PATCH v3 01/18] mm: change invalidatepage prototype to accept
[PATCH v3 02/18] jbd2: change jbd2_journal_invalidatepage to accept
[PATCH v3 03/18] ext4: use ->invalidatepage() length argument
[PATCH v3 04/18] jbd: change journal_invalidatepage() to accept
[PATCH v3 05/18] xfs: use ->invalidatepage() length argument
[PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument
[PATCH v3 07/18] ceph: use ->invalidatepage() length argument
[PATCH v3 08/18] gfs2: use ->invalidatepage() length argument
[PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument
[PATCH v3 10/18] mm: teach truncate_inode_pages_range() to handle
[PATCH v3 11/18] Revert "ext4: remove no longer used functions in
[PATCH v3 12/18] Revert "ext4: fix fsx truncate failure"
[PATCH v3 13/18] ext4: use ext4_zero_partial_blocks in punch_hole
[PATCH v3 14/18] ext4: remove unused discard_partial_page_buffers
[PATCH v3 15/18] ext4: remove unused code from ext4_remove_blocks()
[PATCH v3 16/18] ext4: update ext4_ext_remove_space trace point
[PATCH v3 17/18] ext4: make punch hole code path work with bigalloc
[PATCH v3 18/18] ext4: Allow punch hole with bigalloc enabled


2013-04-09 09:14:46

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 02/18] jbd2: change jbd2_journal_invalidatepage to accept length

invalidatepage now accepts range to invalidate and there are two file
system using jbd2 also implementing punch hole feature which can benefit
from this. We need to implement the same thing for jbd2 layer in order to
allow those file system take benefit of this functionality.

This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 3 ++-
fs/jbd2/transaction.c | 24 +++++++++++++++++-------
fs/ocfs2/aops.c | 3 ++-
include/linux/jbd2.h | 2 +-
4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f5bf189..69595f5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3000,7 +3000,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,
if (offset == 0)
ClearPageChecked(page);

- return jbd2_journal_invalidatepage(journal, page, offset);
+ return jbd2_journal_invalidatepage(journal, page, offset,
+ PAGE_CACHE_SIZE - offset);
}

/* Wrapper for aops... */
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 325bc01..d334e17 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2027,18 +2027,23 @@ zap_buffer_unlocked:
* void jbd2_journal_invalidatepage()
* @journal: journal to use for flush...
* @page: page to flush
- * @offset: length of page to invalidate.
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
- * Reap page buffers containing data after offset in page. Can return -EBUSY
- * if buffers are part of the committing transaction and the page is straddling
- * i_size. Caller then has to wait for current commit and try again.
+ * Reap page buffers containing data after in the specified range in page.
+ * Can return -EBUSY if buffers are part of the committing transaction and
+ * the page is straddling i_size. Caller then has to wait for current commit
+ * and try again.
*/
int jbd2_journal_invalidatepage(journal_t *journal,
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
+ unsigned int stop = offset + length;
unsigned int curr_off = 0;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
int ret = 0;

@@ -2047,6 +2052,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return 0;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2056,10 +2063,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;

+ if (next_off > stop)
+ return 0;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
- ret = journal_unmap_buffer(journal, bh, offset > 0);
+ ret = journal_unmap_buffer(journal, bh, partial_page);
unlock_buffer(bh);
if (ret < 0)
return ret;
@@ -2070,7 +2080,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,

} while (bh != head);

- if (!offset) {
+ if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index ecb86ca..7c47755 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,7 +608,8 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;

- jbd2_journal_invalidatepage(journal, page, offset);
+ jbd2_journal_invalidatepage(journal, page, offset,
+ PAGE_CACHE_SIZE - offset);
}

static int ocfs2_releasepage(struct page *page, gfp_t wait)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f9fe889..8c34abd 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1090,7 +1090,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern int jbd2_journal_invalidatepage(journal_t *,
- struct page *, unsigned long);
+ struct page *, unsigned int, unsigned int);
extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int jbd2_journal_stop(handle_t *);
extern int jbd2_journal_flush (journal_t *);
--
1.7.7.6

2013-04-09 09:14:58

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in gfs2_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/gfs2/aops.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 37093ba..ea920bf 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+ unsigned int stop = offset + length;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;

BUG_ON(!PageLocked(page));
- if (offset == 0)
+ if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;

bh = head = page_buffers(page);
do {
+ if (pos + bh->b_size > stop)
+ return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
out:
- if (offset == 0)
+ if (!partial_page)
try_to_release_page(page, 0);
}

--
1.7.7.6

2013-04-09 09:15:08

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle unaligned
start of the range.

Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed ->invalidatepage() prototype to accept
'length' argument to be able to specify range to invalidate. No we can
use that new ability in truncate_inode_pages_range().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
mm/truncate.c | 104 ++++++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int offset,
(*invalidatepage)(page, offset, length);
}

-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
- zero_user_segment(page, partial, PAGE_CACHE_SIZE);
- cleancache_invalidate_page(page->mapping, page);
- if (page_has_private(page))
- do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
/*
* This cancels just the dirty bit on the kernel page itself, it
* does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
* truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
* @mapping: mapping to truncate
* @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
*
* Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
*
* Truncate takes two passes - the first pass is nonblocking. It will not
* block on page locks and it will not block on writeback. The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
* We pass down the cache-hot hint to the page freeing code. Even if the
* mapping is large, it is probably the case that the final pages are the most
* recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
*/
void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
{
- const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
- const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
- struct pagevec pvec;
- pgoff_t index;
- pgoff_t end;
- int i;
+ pgoff_t start; /* inclusive */
+ pgoff_t end; /* exclusive */
+ unsigned int partial_start; /* inclusive */
+ unsigned int partial_end; /* exclusive */
+ struct pagevec pvec;
+ pgoff_t index;
+ int i;

cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;

- BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
- end = (lend >> PAGE_CACHE_SHIFT);
+ /* Offsets within partial pages */
+ partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+ partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+ /*
+ * 'start' and 'end' always covers the range of pages to be fully
+ * truncated. Partial pages are covered with 'partial_start' at the
+ * start of the range and 'partial_end' at the end of the range.
+ * Note that 'end' is exclusive while 'lend' is inclusive.
+ */
+ start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+ if (lend == -1)
+ /*
+ * lend == -1 indicates end-of-file so we have to set 'end'
+ * to the highest possible pgoff_t and since the type is
+ * unsigned we're using -1.
+ */
+ end = -1;
+ else
+ end = (lend + 1) >> PAGE_CACHE_SHIFT;

pagevec_init(&pvec, 0);
index = start;
- while (index <= end && pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ while (index < end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];

/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;

if (!trylock_page(page))
@@ -250,27 +265,56 @@ void truncate_inode_pages_range(struct address_space *mapping,
index++;
}

- if (partial) {
+ if (partial_start) {
struct page *page = find_lock_page(mapping, start - 1);
if (page) {
+ unsigned int top = PAGE_CACHE_SIZE;
+ if (start > end) {
+ /* Truncation within a single page */
+ top = partial_end;
+ partial_end = 0;
+ }
wait_on_page_writeback(page);
- truncate_partial_page(page, partial);
+ zero_user_segment(page, partial_start, top);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage(page, partial_start,
+ top - partial_start);
unlock_page(page);
page_cache_release(page);
}
}
+ if (partial_end) {
+ struct page *page = find_lock_page(mapping, end);
+ if (page) {
+ wait_on_page_writeback(page);
+ zero_user_segment(page, 0, partial_end);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage(page, 0,
+ partial_end);
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ }
+ /*
+ * If the truncation happened within a single page no pages
+ * will be released, just zeroed, so we can bail out now.
+ */
+ if (start >= end)
+ return;

index = start;
for ( ; ; ) {
cond_resched();
if (!pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
if (index == start)
break;
index = start;
continue;
}
- if (index == start && pvec.pages[0]->index > end) {
+ if (index == start && pvec.pages[0]->index >= end) {
pagevec_release(&pvec);
break;
}
@@ -280,7 +324,7 @@ void truncate_inode_pages_range(struct address_space *mapping,

/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;

lock_page(page);
@@ -601,10 +645,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
* This rounding is currently just for example: unmap_mapping_range
* expands its hole outwards, whereas we want it to contract the hole
* inwards. However, existing callers of truncate_pagecache_range are
- * doing their own page rounding first; and truncate_inode_pages_range
- * currently BUGs if lend is not pagealigned-1 (it handles partial
- * page at start of hole, but not partial page at end of hole). Note
- * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
+ * doing their own page rounding first. Note that unmap_mapping_range
+ * allows holelen 0 for all, and we allow lend -1 for end of file.
*/

/*
--
1.7.7.6

2013-04-09 09:15:12

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 13/18] ext4: use ext4_zero_partial_blocks in punch_hole

We're doing to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating work of
truncate_pagecache_range(), moreover the old implementation was much
clearer.

Now when the truncate_inode_pages_range() can handle truncating non page
aligned regions we can use this to invalidate and zero out block aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks on the start and end of the range. This
will greatly simplify the punch hole code. Moreover after this commit we
can get rid of the ext4_discard_partial_page_buffers() completely.

We also introduce function ext4_prepare_punch_hole() to do come common
operations before we attempt to do the actual punch hole on
indirect or extent file which saves us some code duplication.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 2 +
fs/ext4/inode.c | 110 ++++++++++++++++++++-----------------------------------
2 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3aa5943..2428244 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2109,6 +2109,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d58e13c..6003fd1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3675,6 +3675,37 @@ unlock:
return err;
}

+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend)
+{
+ struct super_block *sb = inode->i_sb;
+ struct address_space *mapping = inode->i_mapping;
+ unsigned partial = lstart & (sb->s_blocksize - 1);
+ ext4_fsblk_t start = lstart >> sb->s_blocksize_bits;
+ ext4_fsblk_t end = lend >> sb->s_blocksize_bits;
+ int err = 0;
+
+ /* Handle partial zero within the single block */
+ if (start == end) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, lend - lstart + 1);
+ return err;
+ }
+ /* Handle partial zero out on the start of the range */
+ if (partial) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, sb->s_blocksize);
+ if (err)
+ return err;
+ }
+ /* Handle partial zero out on the end of the range */
+ partial = lend & (sb->s_blocksize - 1);
+ if (partial != sb->s_blocksize - 1)
+ err = ext4_block_zero_page_range(handle, mapping,
+ lend - partial, partial + 1);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
@@ -3703,7 +3734,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
- loff_t first_page, last_page, page_len;
loff_t first_page_offset, last_page_offset;
handle_t *handle;
unsigned int credits;
@@ -3755,17 +3785,13 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
offset;
}

- first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- last_page = (offset + length) >> PAGE_CACHE_SHIFT;
-
- first_page_offset = first_page << PAGE_CACHE_SHIFT;
- last_page_offset = last_page << PAGE_CACHE_SHIFT;
+ first_page_offset = round_up(offset, sb->s_blocksize);
+ last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;

- /* Now release the pages */
- if (last_page_offset > first_page_offset) {
+ /* Now release the pages and zero block aligned part of pages*/
+ if (last_page_offset > first_page_offset)
truncate_pagecache_range(inode, first_page_offset,
- last_page_offset - 1);
- }
+ last_page_offset);

/* Wait all existing dio workers, newcomers will block on i_mutex */
ext4_inode_block_unlocked_dio(inode);
@@ -3785,66 +3811,10 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
goto out_dio;
}

- /*
- * Now we need to zero out the non-page-aligned data in the
- * pages at the start and tail of the hole, and unmap the
- * buffer heads for the block aligned regions of the page that
- * were completely zeroed.
- */
- if (first_page > last_page) {
- /*
- * If the file space being truncated is contained
- * within a page just zero out and unmap the middle of
- * that page
- */
- ret = ext4_discard_partial_page_buffers(handle,
- mapping, offset, length, 0);
-
- if (ret)
- goto out_stop;
- } else {
- /*
- * zero out and unmap the partial page that contains
- * the start of the hole
- */
- page_len = first_page_offset - offset;
- if (page_len > 0) {
- ret = ext4_discard_partial_page_buffers(handle, mapping,
- offset, page_len, 0);
- if (ret)
- goto out_stop;
- }
-
- /*
- * zero out and unmap the partial page that contains
- * the end of the hole
- */
- page_len = offset + length - last_page_offset;
- if (page_len > 0) {
- ret = ext4_discard_partial_page_buffers(handle, mapping,
- last_page_offset, page_len, 0);
- if (ret)
- goto out_stop;
- }
- }
-
- /*
- * If i_size is contained in the last page, we need to
- * unmap and zero the partial page after i_size
- */
- if (inode->i_size >> PAGE_CACHE_SHIFT == last_page &&
- inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- if (page_len > 0) {
- ret = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (ret)
- goto out_stop;
- }
- }
+ ret = ext4_zero_partial_blocks(handle, inode, offset,
+ offset + length - 1);
+ if (ret)
+ goto out_stop;

first_block = (offset + sb->s_blocksize - 1) >>
EXT4_BLOCK_SIZE_BITS(sb);
--
1.7.7.6

2013-04-09 09:15:22

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

Currently punch hole is disabled in file systems with bigalloc
feature enabled. However the recent changes in punch hole patch should
make it easier to support punching holes on bigalloc enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is unsigned long long type and it makes sure that we
will free the partial cluster if all extents has been released from that
cluster. However it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster
leaving the rest untouched. So we have to make sure that we will notice
cluster which still has some extents. To do this I've changed
partial_cluster to be signed long long type. The only scenario where
this could be a problem is when cluster_size == block size, however in
that case there would not be any partial clusters so we're safe. For
bigger clusters the signed type is enough. Now we use the negative value
in partial_cluster to mark such cluster used, hence we know that we must
not free it even if all other extents has been freed from such cluster.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
^----------^
punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update respective tracepoints to use signed long long type for
partial_cluster.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
include/trace/events/ext4.h | 25 ++++++++-------
2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 9023b76..577c4f5 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2362,7 +2362,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)

static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ signed long long *partial_cluster,
ext4_lblk_t from, ext4_lblk_t to)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2391,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* partial cluster here.
*/
pblk = ext4_ext_pblock(ex) + ee_len - 1;
- if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+ if ((*partial_cluster > 0) &&
+ (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
EXT4_C2B(sbi, *partial_cluster),
sbi->s_cluster_ratio, flags);
@@ -2417,23 +2418,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+ unsigned int unaligned;

num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
- ext_debug("free last %u blocks starting %llu\n", num, pblk);
+ /*
+ * Usually we want to free partial cluster at the end of the
+ * extent, except for the situation when the cluster is still
+ * used by any other extent (partial_cluster is negative).
+ */
+ if (*partial_cluster < 0 &&
+ -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+ flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+ ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
* If the block range to be freed didn't start at the
* beginning of a cluster, and we removed the entire
- * extent, save the partial cluster here, since we
- * might need to delete if we determine that the
- * truncate operation has removed all of the blocks in
- * the cluster.
+ * extent and the cluster is not used by any other extent,
+ * save the partial cluster here, since we might need to
+ * delete if we determine that the truncate operation has
+ * removed all of the blocks in the cluster.
+ *
+ * On the other hand, if we did not manage to free the whole
+ * extent, we have to mark the cluster as used (store negative
+ * cluster number in partial_cluster).
*/
- if (pblk & (sbi->s_cluster_ratio - 1) &&
- (ee_len == num))
+ unaligned = pblk & (sbi->s_cluster_ratio - 1);
+ if (unaligned && (ee_len == num) &&
+ (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
*partial_cluster = EXT4_B2C(sbi, pblk);
- else
+ else if (unaligned)
+ *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
+ else if (*partial_cluster > 0)
*partial_cluster = 0;
} else
ext4_error(sbi->s_sb, "strange request: removal(2) "
@@ -2451,12 +2470,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* @handle: The journal handle
* @inode: The files inode
* @path: The path to the leaf
+ * @partial_cluster: The cluster which we'll have to free if all extents
+ * has been released from it. It gets negative in case
+ * that the cluster is still used.
* @start: The first block to remove
* @end: The last block to remove
*/
static int
ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
- struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
+ struct ext4_ext_path *path,
+ signed long long *partial_cluster,
ext4_lblk_t start, ext4_lblk_t end)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2469,6 +2492,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
unsigned short ex_ee_len;
unsigned uninitialized = 0;
struct ext4_extent *ex;
+ ext4_fsblk_t pblk;

/* the header must be checked already in ext4_ext_remove_space() */
ext_debug("truncate since %u in leaf to %u\n", start, end);
@@ -2507,6 +2531,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,

/* If this extent is beyond the end of the hole, skip it */
if (end < ex_ee_block) {
+ /*
+ * We're going to skip this extent and move to another,
+ * so if this extent is not cluster aligned we have
+ * to mark the current cluster as used to avoid
+ * accidentally freeing it later on
+ */
+ pblk = ext4_ext_pblock(ex);
+ if (pblk & (sbi->s_cluster_ratio - 1))
+ *partial_cluster =
+ -((long long)EXT4_B2C(sbi, pblk));
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
ex_ee_len = ext4_ext_get_actual_len(ex);
@@ -2582,7 +2616,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
sizeof(struct ext4_extent));
}
le16_add_cpu(&eh->eh_entries, -1);
- } else
+ } else if (*partial_cluster > 0)
*partial_cluster = 0;

err = ext4_ext_dirty(handle, inode, path + depth);
@@ -2600,11 +2634,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
err = ext4_ext_correct_indexes(handle, inode, path);

/*
- * If there is still a entry in the leaf node, check to see if
- * it references the partial cluster. This is the only place
- * where it could; if it doesn't, we can free the cluster.
+ * Free the partial cluster only if the current extent does not
+ * reference it. Otherwise we might free used cluster.
*/
- if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
+ if (*partial_cluster > 0 &&
(EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
*partial_cluster)) {
int flags = EXT4_FREE_BLOCKS_FORGET;
@@ -2654,7 +2687,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
struct super_block *sb = inode->i_sb;
int depth = ext_depth(inode);
struct ext4_ext_path *path = NULL;
- ext4_fsblk_t partial_cluster = 0;
+ signed long long partial_cluster = 0;
handle_t *handle;
int i = 0, err = 0;

@@ -2838,7 +2871,7 @@ again:
/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
* cluster as well. */
- if (partial_cluster && path->p_hdr->eh_entries == 0) {
+ if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
int flags = EXT4_FREE_BLOCKS_FORGET;

if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index c92500c..5028d05 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1928,7 +1928,7 @@ TRACE_EVENT(ext4_ext_show_extent,
TRACE_EVENT(ext4_remove_blocks,
TP_PROTO(struct inode *inode, struct ext4_extent *ex,
ext4_lblk_t from, ext4_fsblk_t to,
- ext4_fsblk_t partial_cluster),
+ long long int partial_cluster),

TP_ARGS(inode, ex, from, to, partial_cluster),

@@ -1937,7 +1937,7 @@ TRACE_EVENT(ext4_remove_blocks,
__field( ino_t, ino )
__field( ext4_lblk_t, from )
__field( ext4_lblk_t, to )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
__field( ext4_fsblk_t, ee_pblk )
__field( ext4_lblk_t, ee_lblk )
__field( unsigned short, ee_len )
@@ -1955,7 +1955,7 @@ TRACE_EVENT(ext4_remove_blocks,
),

TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
- "from %u to %u partial_cluster %u",
+ "from %u to %u partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->ee_lblk,
@@ -1963,19 +1963,20 @@ TRACE_EVENT(ext4_remove_blocks,
(unsigned short) __entry->ee_len,
(unsigned) __entry->from,
(unsigned) __entry->to,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);

TRACE_EVENT(ext4_ext_rm_leaf,
TP_PROTO(struct inode *inode, ext4_lblk_t start,
- struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
+ struct ext4_extent *ex,
+ long long int partial_cluster),

TP_ARGS(inode, start, ex, partial_cluster),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, ee_lblk )
__field( ext4_fsblk_t, ee_pblk )
@@ -1993,14 +1994,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
),

TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
- "partial_cluster %u",
+ "partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->ee_lblk,
(unsigned long long) __entry->ee_pblk,
(unsigned short) __entry->ee_len,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);

TRACE_EVENT(ext4_ext_rm_idx,
@@ -2058,7 +2059,7 @@ TRACE_EVENT(ext4_ext_remove_space,

TRACE_EVENT(ext4_ext_remove_space_done,
TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
- int depth, ext4_lblk_t partial, unsigned short eh_entries),
+ int depth, long long int partial, unsigned short eh_entries),

TP_ARGS(inode, start, end, depth, partial, eh_entries),

@@ -2068,7 +2069,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, end )
__field( int, depth )
- __field( ext4_lblk_t, partial )
+ __field( long long int, partial )
__field( unsigned short, eh_entries )
),

@@ -2082,14 +2083,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->eh_entries = eh_entries;
),

- TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->end,
__entry->depth,
- (unsigned) __entry->partial,
+ (long long int) __entry->partial,
(unsigned short) __entry->eh_entries)
);

--
1.7.7.6

2013-04-09 09:15:20

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 18/18] ext4: Allow punch hole with bigalloc enabled

In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
implementation and there is noting holding us back from using punch hole
on file system with bigalloc feature enabled.

This has been tested with fsx and xfstests.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d452c1..87d6171 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3536,11 +3536,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
if (!S_ISREG(inode->i_mode))
return -EOPNOTSUPP;

- if (EXT4_SB(sb)->s_cluster_ratio > 1) {
- /* TODO: Add support for bigalloc file systems */
- return -EOPNOTSUPP;
- }
-
trace_ext4_punch_hole(inode, offset, length);

/*
--
1.7.7.6

2013-04-09 09:17:14

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 15/18] ext4: remove unused code from ext4_remove_blocks()

The "head removal" branch in the condition is never used in any code
path in ext4 since the function only caller ext4_ext_rm_leaf() will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if dubious range is provided.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 21 ++++-----------------
1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 6c5a70a..4adaa8a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2435,23 +2435,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
- } else if (from == le32_to_cpu(ex->ee_block)
- && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
- /* head removal */
- ext4_lblk_t num;
- ext4_fsblk_t start;
-
- num = to - from;
- start = ext4_ext_pblock(ex);
-
- ext_debug("free first %u blocks starting %llu\n", num, start);
- ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
- } else {
- printk(KERN_INFO "strange request: removal(2) "
- "%u-%u from %u:%u\n",
- from, to, le32_to_cpu(ex->ee_block), ee_len);
- }
+ } else
+ ext4_error(sbi->s_sb, "strange request: removal(2) "
+ "%u-%u from %u:%u\n",
+ from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
}

--
1.7.7.6

2013-04-09 09:17:13

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 16/18] ext4: update ext4_ext_remove_space trace point

Add "end" variable.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 6 +++---
include/trace/events/ext4.h | 21 ++++++++++++++-------
2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4adaa8a..9023b76 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2666,7 +2666,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
return PTR_ERR(handle);

again:
- trace_ext4_ext_remove_space(inode, start, depth);
+ trace_ext4_ext_remove_space(inode, start, end, depth);

/*
* Check if we are removing extents inside the extent tree. If that
@@ -2832,8 +2832,8 @@ again:
}
}

- trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
- path->p_hdr->eh_entries);
+ trace_ext4_ext_remove_space_done(inode, start, end, depth,
+ partial_cluster, path->p_hdr->eh_entries);

/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 60b329a..c92500c 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2027,14 +2027,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
);

TRACE_EVENT(ext4_ext_remove_space,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t end, int depth),

- TP_ARGS(inode, start, depth),
+ TP_ARGS(inode, start, end, depth),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
),

@@ -2042,26 +2044,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
),

- TP_printk("dev %d,%d ino %lu since %u depth %d",
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth)
);

TRACE_EVENT(ext4_ext_remove_space_done,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
- ext4_lblk_t partial, unsigned short eh_entries),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+ int depth, ext4_lblk_t partial, unsigned short eh_entries),

- TP_ARGS(inode, start, depth, partial, eh_entries),
+ TP_ARGS(inode, start, end, depth, partial, eh_entries),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
__field( ext4_lblk_t, partial )
__field( unsigned short, eh_entries )
@@ -2071,16 +2076,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
__entry->partial = partial;
__entry->eh_entries = eh_entries;
),

- TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth,
(unsigned) __entry->partial,
(unsigned short) __entry->eh_entries)
--
1.7.7.6

2013-04-09 09:19:19

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 14/18] ext4: remove unused discard_partial_page_buffers

The discard_partial_page_buffers is no longer used anywhere so we can
simply remove it including the *_no_lock variant and
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 8 --
fs/ext4/inode.c | 206 -------------------------------------------------------
2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2428244..cc9020e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -593,11 +593,6 @@ enum {
#define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER 0x0020

/*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED 0x0001
-
-/*
* ioctl commands
*/
#define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS
@@ -2111,9 +2106,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6003fd1..0d452c1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -135,9 +135,6 @@ static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length);
static int __ext4_journalled_writepage(struct page *page, unsigned int len);
static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags);

/*
* Test whether an inode is a fast symlink.
@@ -3352,209 +3349,6 @@ void ext4_set_aops(struct inode *inode)
inode->i_mapping->a_ops = &ext4_aops;
}

-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags)
-{
- struct inode *inode = mapping->host;
- struct page *page;
- int err = 0;
-
- page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
- mapping_gfp_mask(mapping) & ~__GFP_FS);
- if (!page)
- return -ENOMEM;
-
- err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
- from, length, flags);
-
- unlock_page(page);
- page_cache_release(page);
- return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped. Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been locked. The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'. This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode: The files inode
- * page: A locked page that contains the offset "from"
- * from: The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len: The length of bytes to discard
- * flags: Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped. This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags)
-{
- ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
- unsigned int offset = from & (PAGE_CACHE_SIZE-1);
- unsigned int blocksize, max, pos;
- ext4_lblk_t iblock;
- struct buffer_head *bh;
- int err = 0;
-
- blocksize = inode->i_sb->s_blocksize;
- max = PAGE_CACHE_SIZE - offset;
-
- if (index != page->index)
- return -EINVAL;
-
- /*
- * correct length if it does not fall between
- * 'from' and the end of the page
- */
- if (length > max || length < 0)
- length = max;
-
- iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
-
- if (!page_has_buffers(page))
- create_empty_buffers(page, blocksize, 0);
-
- /* Find the buffer that contains "offset" */
- bh = page_buffers(page);
- pos = blocksize;
- while (offset >= pos) {
- bh = bh->b_this_page;
- iblock++;
- pos += blocksize;
- }
-
- pos = offset;
- while (pos < offset + length) {
- unsigned int end_of_block, range_to_discard;
-
- err = 0;
-
- /* The length of space left to zero and unmap */
- range_to_discard = offset + length - pos;
-
- /* The length of space until the end of the block */
- end_of_block = blocksize - (pos & (blocksize-1));
-
- /*
- * Do not unmap or zero past end of block
- * for this buffer head
- */
- if (range_to_discard > end_of_block)
- range_to_discard = end_of_block;
-
-
- /*
- * Skip this buffer head if we are only zeroing unampped
- * regions of the page
- */
- if (flags & EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED &&
- buffer_mapped(bh))
- goto next;
-
- /* If the range is block aligned, unmap */
- if (range_to_discard == blocksize) {
- clear_buffer_dirty(bh);
- bh->b_bdev = NULL;
- clear_buffer_mapped(bh);
- clear_buffer_req(bh);
- clear_buffer_new(bh);
- clear_buffer_delay(bh);
- clear_buffer_unwritten(bh);
- clear_buffer_uptodate(bh);
- zero_user(page, pos, range_to_discard);
- BUFFER_TRACE(bh, "Buffer discarded");
- goto next;
- }
-
- /*
- * If this block is not completely contained in the range
- * to be discarded, then it is not going to be released. Because
- * we need to keep this block, we need to make sure this part
- * of the page is uptodate before we modify it by writeing
- * partial zeros on it.
- */
- if (!buffer_mapped(bh)) {
- /*
- * Buffer head must be mapped before we can read
- * from the block
- */
- BUFFER_TRACE(bh, "unmapped");
- ext4_get_block(inode, iblock, bh, 0);
- /* unmapped? It's a hole - nothing to do */
- if (!buffer_mapped(bh)) {
- BUFFER_TRACE(bh, "still unmapped");
- goto next;
- }
- }
-
- /* Ok, it's mapped. Make sure it's up-to-date */
- if (PageUptodate(page))
- set_buffer_uptodate(bh);
-
- if (!buffer_uptodate(bh)) {
- err = -EIO;
- ll_rw_block(READ, 1, &bh);
- wait_on_buffer(bh);
- /* Uhhuh. Read error. Complain and punt.*/
- if (!buffer_uptodate(bh))
- goto next;
- }
-
- if (ext4_should_journal_data(inode)) {
- BUFFER_TRACE(bh, "get write access");
- err = ext4_journal_get_write_access(handle, bh);
- if (err)
- goto next;
- }
-
- zero_user(page, pos, range_to_discard);
-
- err = 0;
- if (ext4_should_journal_data(inode)) {
- err = ext4_handle_dirty_metadata(handle, inode, bh);
- } else
- mark_buffer_dirty(bh);
-
- BUFFER_TRACE(bh, "Partial buffer zeroed");
-next:
- bh = bh->b_this_page;
- iblock++;
- pos += range_to_discard;
- }
-
- return err;
-}
-
/*
* ext4_block_truncate_page() zeroes out a mapping from file offset `from'
* up to the end of the block which corresponds to `from'.
--
1.7.7.6

2013-04-09 09:15:07

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 11/18] Revert "ext4: remove no longer used functions in inode.c"

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces functions ext4_block_truncate_page() and
ext4_block_zero_page_range() which has been previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to reintroduce those function and remove
ext4_discard_partial_page_buffers() since it is duplicating some code
and also partially duplicating work of truncate_pagecache_range(),
moreover the old implementation was much clearer.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 4 ++
fs/ext4/inode.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a0637e5..3aa5943 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2105,6 +2105,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
extern void ext4_set_aops(struct inode *inode);
extern int ext4_writepage_trans_blocks(struct inode *);
extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f80e0c3..5729d21 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3555,6 +3555,126 @@ next:
return err;
}

+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from)
+{
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned length;
+ unsigned blocksize;
+ struct inode *inode = mapping->host;
+
+ blocksize = inode->i_sb->s_blocksize;
+ length = blocksize - (offset & (blocksize - 1));
+
+ return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'. The range to be zero'd must
+ * be contained with in one block. If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length)
+{
+ ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned blocksize, max, pos;
+ ext4_lblk_t iblock;
+ struct inode *inode = mapping->host;
+ struct buffer_head *bh;
+ struct page *page;
+ int err = 0;
+
+ page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+ mapping_gfp_mask(mapping) & ~__GFP_FS);
+ if (!page)
+ return -ENOMEM;
+
+ blocksize = inode->i_sb->s_blocksize;
+ max = blocksize - (offset & (blocksize - 1));
+
+ /*
+ * correct length if it does not fall between
+ * 'from' and the end of the block
+ */
+ if (length > max || length < 0)
+ length = max;
+
+ iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+ if (!page_has_buffers(page))
+ create_empty_buffers(page, blocksize, 0);
+
+ /* Find the buffer that contains "offset" */
+ bh = page_buffers(page);
+ pos = blocksize;
+ while (offset >= pos) {
+ bh = bh->b_this_page;
+ iblock++;
+ pos += blocksize;
+ }
+
+ err = 0;
+ if (buffer_freed(bh)) {
+ BUFFER_TRACE(bh, "freed: skip");
+ goto unlock;
+ }
+
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "unmapped");
+ ext4_get_block(inode, iblock, bh, 0);
+ /* unmapped? It's a hole - nothing to do */
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "still unmapped");
+ goto unlock;
+ }
+ }
+
+ /* Ok, it's mapped. Make sure it's up-to-date */
+ if (PageUptodate(page))
+ set_buffer_uptodate(bh);
+
+ if (!buffer_uptodate(bh)) {
+ err = -EIO;
+ ll_rw_block(READ, 1, &bh);
+ wait_on_buffer(bh);
+ /* Uhhuh. Read error. Complain and punt. */
+ if (!buffer_uptodate(bh))
+ goto unlock;
+ }
+
+ if (ext4_should_journal_data(inode)) {
+ BUFFER_TRACE(bh, "get write access");
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err)
+ goto unlock;
+ }
+
+ zero_user(page, offset, length);
+
+ BUFFER_TRACE(bh, "zeroed end of block");
+
+ err = 0;
+ if (ext4_should_journal_data(inode)) {
+ err = ext4_handle_dirty_metadata(handle, inode, bh);
+ } else
+ mark_buffer_dirty(bh);
+
+unlock:
+ unlock_page(page);
+ page_cache_release(page);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6

2013-04-09 09:22:34

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 12/18] Revert "ext4: fix fsx truncate failure"

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zero block unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidate the unaligned
portion of the page. Then there is no need to zero and unmap it once more
and ext4_block_truncate_page() was doing the right job, although we
still need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() is doing.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on ppc64 machine with block size of 1024 bytes without
any problems.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 11 ++---------
1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5729d21..d58e13c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3920,7 +3920,6 @@ void ext4_truncate(struct inode *inode)
unsigned int credits;
handle_t *handle;
struct address_space *mapping = inode->i_mapping;
- loff_t page_len;

/*
* There is a possibility that we're either freeing the inode
@@ -3964,14 +3963,8 @@ void ext4_truncate(struct inode *inode)
return;
}

- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- if (ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0))
- goto out_stop;
- }
+ if (inode->i_size & (inode->i_sb->s_blocksize - 1))
+ ext4_block_truncate_page(handle, mapping, inode->i_size);

/*
* We add the inode to the orphan list, so that if this
--
1.7.7.6

2013-04-09 09:23:03

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in reiserfs_invalidatepage()

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/reiserfs/inode.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 808e02e..e963164 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+ unsigned int stop = offset + length;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;

BUG_ON(!PageLocked(page));

- if (offset == 0)
+ if (!partial_page)
ClearPageChecked(page);

if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;

+ if (next_off > stop)
+ goto out;
+
/*
* is this block fully invalidated?
*/
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
* The get_block cached value has been unconditionally invalidated,
* so real IO is not possible anymore.
*/
- if (!offset && ret) {
+ if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
--
1.7.7.6

2013-04-09 09:27:27

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ocfs2_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Joel Becker <[email protected]>
---
fs/ocfs2/aops.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7c47755..79736a2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;

- jbd2_journal_invalidatepage(journal, page, offset,
- PAGE_CACHE_SIZE - offset);
+ jbd2_journal_invalidatepage(journal, page, offset, length);
}

static int ocfs2_releasepage(struct page *page, gfp_t wait)
--
1.7.7.6

2013-04-09 09:14:53

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 05/18] xfs: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in xfs_vm_invalidatepage()

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/xfs/xfs_aops.c | 5 +++--
fs/xfs/xfs_trace.h | 41 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..e8018d3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
unsigned int offset,
unsigned int length)
{
- trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+ trace_xfs_invalidatepage(page->mapping->host, page, offset,
+ length);
+ block_invalidatepage(page, offset, length);
}

/*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..91d6434 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -991,7 +991,46 @@ DEFINE_EVENT(xfs_page_class, name, \
TP_ARGS(inode, page, off))
DEFINE_PAGE_EVENT(xfs_writepage);
DEFINE_PAGE_EVENT(xfs_releasepage);
-DEFINE_PAGE_EVENT(xfs_invalidatepage);
+
+TRACE_EVENT(xfs_invalidatepage,
+ TP_PROTO(struct inode *inode, struct page *page, unsigned int off,
+ unsigned int len),
+ TP_ARGS(inode, page, off, len),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(pgoff_t, pgoff)
+ __field(loff_t, size)
+ __field(unsigned int, offset)
+ __field(unsigned int, length)
+ __field(int, delalloc)
+ __field(int, unwritten)
+ ),
+ TP_fast_assign(
+ int delalloc = -1, unwritten = -1;
+
+ if (page_has_buffers(page))
+ xfs_count_page_state(page, &delalloc, &unwritten);
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = XFS_I(inode)->i_ino;
+ __entry->pgoff = page_offset(page);
+ __entry->size = i_size_read(inode);
+ __entry->offset = off;
+ __entry->length = len;
+ __entry->delalloc = delalloc;
+ __entry->unwritten = unwritten;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %x "
+ "length %x delalloc %d unwritten %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pgoff,
+ __entry->size,
+ __entry->offset,
+ __entry->length,
+ __entry->delalloc,
+ __entry->unwritten)
+)

DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
--
1.7.7.6

2013-04-09 09:28:15

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 07/18] ceph: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ceph_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/ceph/addr.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 168a35a..d953afd 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
if (!PageDirty(page))
pr_err("%p invalidatepage %p page not dirty\n", inode, page);

- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);

ci = ceph_inode(inode);
- if (offset == 0) {
- dout("%p invalidatepage %p idx %lu full dirty page %u\n",
- inode, page, page->index, offset);
+ if (offset == 0 && length == PAGE_CACHE_SIZE) {
+ dout("%p invalidatepage %p idx %lu full dirty page\n",
+ inode, page, page->index);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
page->private = 0;
ClearPagePrivate(page);
} else {
- dout("%p invalidatepage %p idx %lu partial dirty page\n",
- inode, page, page->index);
+ dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+ inode, page, page->index, offset, length);
}
}

--
1.7.7.6

2013-04-09 09:29:00

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 04/18] jbd: change journal_invalidatepage() to accept length

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in journal_invalidatepage() and all the users in ext3 file
system. Also update ext3 trace point to print out length argument.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext3/inode.c | 6 +++---
fs/jbd/transaction.c | 19 ++++++++++++++-----
include/linux/jbd.h | 2 +-
include/trace/events/ext3.h | 12 +++++++-----
4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 349d4ce..b12936b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1828,15 +1828,15 @@ static void ext3_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = EXT3_JOURNAL(page->mapping->host);

- trace_ext3_invalidatepage(page, offset);
+ trace_ext3_invalidatepage(page, offset, length);

/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);

- journal_invalidatepage(journal, page, offset);
+ journal_invalidatepage(journal, page, offset, length);
}

static int ext3_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..a1fef89 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
* void journal_invalidatepage() - invalidate a journal page
* @journal: journal to use for flush
* @page: page to flush
- * @offset: length of page to invalidate.
+ * @offset: offset of the range to invalidate
+ * @length: length of the range to invalidate
*
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
*/
void journal_invalidatepage(journal_t *journal,
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
+ unsigned int stop = offset + length;
unsigned int curr_off = 0;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;

if (!PageLocked(page))
@@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;

+ if (next_off > stop)
+ return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
may_free &= journal_unmap_buffer(journal, bh,
- offset > 0);
+ partial_page);
unlock_buffer(bh);
}
curr_off = next_off;
@@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,

} while (bh != head);

- if (!offset) {
+ if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index c8f3297..d02e16c 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -840,7 +840,7 @@ extern void journal_release_buffer (handle_t *, struct buffer_head *);
extern int journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern void journal_invalidatepage(journal_t *,
- struct page *, unsigned long);
+ struct page *, unsigned int, unsigned int);
extern int journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int journal_stop(handle_t *);
extern int journal_flush (journal_t *);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 15d11a3..6797b9d 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
);

TRACE_EVENT(ext3_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),

TP_STRUCT__entry(
__field( pgoff_t, index )
- __field( unsigned long, offset )
+ __field( unsigned int, offset )
+ __field( unsigned int, length )
__field( ino_t, ino )
__field( dev_t, dev )

@@ -305,14 +306,15 @@ TRACE_EVENT(ext3_invalidatepage,
TP_fast_assign(
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
__entry->ino = page->mapping->host->i_ino;
__entry->dev = page->mapping->host->i_sb->s_dev;
),

- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->index, __entry->offset)
+ __entry->index, __entry->offset, __entry->length)
);

TRACE_EVENT(ext3_discard_blocks,
--
1.7.7.6

2013-04-09 09:30:22

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 03/18] ext4: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 30 +++++++++++++++++++-----------
include/trace/events/ext4.h | 22 ++++++++++++----------
2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 69595f5..f80e0c3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1411,21 +1411,28 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
}

static void ext4_da_page_release_reservation(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;

+ if (next_off > stop)
+ break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2825,7 +2832,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
if (!page_has_buffers(page))
goto out;

- ext4_da_page_release_reservation(page, offset);
+ ext4_da_page_release_reservation(page, offset, length);

out:
ext4_invalidatepage(page, offset, length);
@@ -2979,29 +2986,29 @@ ext4_readpages(struct file *file, struct address_space *mapping,
static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
- trace_ext4_invalidatepage(page, offset);
+ trace_ext4_invalidatepage(page, offset, length);

/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));

- block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+ block_invalidatepage(page, offset, length);
}

static int __ext4_journalled_invalidatepage(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
journal_t *journal = EXT4_JOURNAL(page->mapping->host);

- trace_ext4_journalled_invalidatepage(page, offset);
+ trace_ext4_journalled_invalidatepage(page, offset, length);

/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);

- return jbd2_journal_invalidatepage(journal, page, offset,
- PAGE_CACHE_SIZE - offset);
+ return jbd2_journal_invalidatepage(journal, page, offset, length);
}

/* Wrapper for aops... */
@@ -3009,7 +3016,7 @@ static void ext4_journalled_invalidatepage(struct page *page,
unsigned int offset,
unsigned int length)
{
- WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
+ WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
}

static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4607,7 +4614,8 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
inode->i_size >> PAGE_CACHE_SHIFT);
if (!page)
return;
- ret = __ext4_journalled_invalidatepage(page, offset);
+ ret = __ext4_journalled_invalidatepage(page, offset,
+ PAGE_CACHE_SIZE - offset);
unlock_page(page);
page_cache_release(page);
if (ret != -EBUSY)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 58459b7..60b329a 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -444,16 +444,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
);

DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( pgoff_t, index )
- __field( unsigned long, offset )
-
+ __field( unsigned int, offset )
+ __field( unsigned int, length )
),

TP_fast_assign(
@@ -461,24 +461,26 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
__entry->ino = page->mapping->host->i_ino;
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
),

- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- (unsigned long) __entry->index, __entry->offset)
+ (unsigned long) __entry->index,
+ __entry->offset, __entry->length)
);

DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset)
+ TP_ARGS(page, offset, length)
);

DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset)
+ TP_ARGS(page, offset, length)
);

TRACE_EVENT(ext4_discard_blocks,
--
1.7.7.6

2013-04-09 09:31:22

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v3 01/18] mm: change invalidatepage prototype to accept length

Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
Documentation/filesystems/Locking | 6 +++---
Documentation/filesystems/vfs.txt | 20 ++++++++++----------
fs/9p/vfs_addr.c | 5 +++--
fs/afs/file.c | 10 ++++++----
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 ++-
fs/buffer.c | 21 ++++++++++++++++++---
fs/ceph/addr.c | 5 +++--
fs/cifs/file.c | 5 +++--
fs/exofs/inode.c | 6 ++++--
fs/ext3/inode.c | 3 ++-
fs/ext4/inode.c | 18 +++++++++++-------
fs/f2fs/data.c | 3 ++-
fs/f2fs/node.c | 3 ++-
fs/gfs2/aops.c | 8 +++++---
fs/jfs/jfs_metapage.c | 5 +++--
fs/logfs/file.c | 3 ++-
fs/logfs/segment.c | 3 ++-
fs/nfs/file.c | 8 +++++---
fs/ntfs/aops.c | 2 +-
fs/ocfs2/aops.c | 3 ++-
fs/reiserfs/inode.c | 3 ++-
fs/ubifs/file.c | 5 +++--
fs/xfs/xfs_aops.c | 7 ++++---
include/linux/buffer_head.h | 3 ++-
include/linux/fs.h | 2 +-
include/linux/mm.h | 3 ++-
mm/readahead.c | 2 +-
mm/truncate.c | 15 +++++++++------
30 files changed, 116 insertions(+), 69 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0706d32..cbbac3f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -189,7 +189,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -310,8 +310,8 @@ filesystems and by the swapper. The latter will eventually go away. Please,
keep it that way and don't breed new callers.

->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated. It
-returns zero on success. If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
block_invalidatepage() instead.

->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index bc4b06b..e445b95 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
-------------------------------

This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:

struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -685,14 +685,14 @@ struct address_space_operations {
invalidatepage: If a page has PagePrivate set, then invalidatepage
will be called when part or all of the page is to be removed
from the address space. This generally corresponds to either a
- truncation or a complete invalidation of the address space
- (in the latter case 'offset' will always be 0).
- Any private data associated with the page should be updated
- to reflect this truncation. If offset is 0, then
- the private data should be released, because the page
- must be able to be completely discarded. This may be done by
- calling the ->releasepage function, but in this case the
- release MUST succeed.
+ truncation, punch hole or a complete invalidation of the address
+ space (in the latter case 'offset' will always be 0 and 'length'
+ will be PAGE_CACHE_SIZE). Any private data associated with the page
+ should be updated to reflect this truncation. If offset is 0 and
+ length is PAGE_CACHE_SIZE, then the private data should be released,
+ because the page must be able to be completely discarded. This may
+ be done by calling the ->releasepage function, but in this case the
+ release MUST succeed.

releasepage: releasepage is called on PagePrivate pages to indicate
that the page should be freed if possible. ->releasepage
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 0ad61c6..3bf654a 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -147,13 +147,14 @@ static int v9fs_release_page(struct page *page, gfp_t gfp)
* @offset: offset in the page
*/

-static void v9fs_invalidate_page(struct page *page, unsigned long offset)
+static void v9fs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
/*
* If called with zero offset, we should release
* the private state assocated with the page
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
v9fs_fscache_invalidate_page(page);
}

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8f6e923..66d50fe 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -19,7 +19,8 @@
#include "internal.h"

static int afs_readpage(struct file *file, struct page *page);
-static void afs_invalidatepage(struct page *page, unsigned long offset);
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
static int afs_releasepage(struct page *page, gfp_t gfp_flags);
static int afs_launder_page(struct page *page);

@@ -310,16 +311,17 @@ static int afs_launder_page(struct page *page)
* - release a page and clean up its private data if offset is 0 (indicating
* the entire page)
*/
-static void afs_invalidatepage(struct page *page, unsigned long offset)
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct afs_writeback *wb = (struct afs_writeback *) page_private(page);

- _enter("{%lu},%lu", page->index, offset);
+ _enter("{%lu},%u,%u", page->index, offset, length);

BUG_ON(!PageLocked(page));

/* we clean up only if the entire page is being invalidated */
- if (offset == 0) {
+ if (offset == 0 && length == PAGE_CACHE_SIZE) {
#ifdef CONFIG_AFS_FSCACHE
if (PageFsCache(page)) {
struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6d19a0a..fed6e19 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1003,7 +1003,8 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
return try_release_extent_buffer(page, gfp_flags);
}

-static void btree_invalidatepage(struct page *page, unsigned long offset)
+static void btree_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cdee391..a89b561 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2874,7 +2874,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
pg_offset = i_size & (PAGE_CACHE_SIZE - 1);
if (page->index > end_index ||
(page->index == end_index && !pg_offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
return 0;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 09c58a3..49eebed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7467,7 +7467,8 @@ static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
return __btrfs_releasepage(page, gfp_flags & GFP_NOFS);
}

-static void btrfs_invalidatepage(struct page *page, unsigned long offset)
+static void btrfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct extent_io_tree *tree;
diff --git a/fs/buffer.c b/fs/buffer.c
index b4dcb34..f4c1d9d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1456,7 +1456,8 @@ static void discard_buffer(struct buffer_head * bh)
* block_invalidatepage - invalidate part or all of a buffer-backed page
*
* @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
* block_invalidatepage() is called when all or part of the page has become
* invalidated by a truncate operation.
@@ -1467,15 +1468,22 @@ static void discard_buffer(struct buffer_head * bh)
* point. Because the caller is about to free (and possibly reuse) those
* blocks on-disk.
*/
-void block_invalidatepage(struct page *page, unsigned long offset)
+void block_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
unsigned int curr_off = 0;
+ unsigned int stop = length + offset;

BUG_ON(!PageLocked(page));
if (!page_has_buffers(page))
goto out;

+ /*
+ * Check for overflow
+ */
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
@@ -1483,6 +1491,12 @@ void block_invalidatepage(struct page *page, unsigned long offset)
next = bh->b_this_page;

/*
+ * Are we still fully in range ?
+ */
+ if (next_off > stop)
+ goto out;
+
+ /*
* is this block fully invalidated?
*/
if (offset <= curr_off)
@@ -1503,6 +1517,7 @@ out:
}
EXPORT_SYMBOL(block_invalidatepage);

+
/*
* We attach and possibly dirty the buffers atomically wrt
* __set_page_dirty_buffers() via private_lock. try_to_free_buffers
@@ -2843,7 +2858,7 @@ int block_write_full_page_endio(struct page *page, get_block_t *get_block,
* they may have been added in ext3_writepage(). Make them
* freeable here, so the page does not leak.
*/
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
return 0; /* don't care */
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index a60ea97..168a35a 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -143,7 +143,8 @@ static int ceph_set_page_dirty(struct page *page)
* dirty page counters appropriately. Only called if there is private
* data on the page.
*/
-static void ceph_invalidatepage(struct page *page, unsigned long offset)
+static void ceph_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode;
struct ceph_inode_info *ci;
@@ -168,7 +169,7 @@ static void ceph_invalidatepage(struct page *page, unsigned long offset)

ci = ceph_inode(inode);
if (offset == 0) {
- dout("%p invalidatepage %p idx %lu full dirty page %lu\n",
+ dout("%p invalidatepage %p idx %lu full dirty page %u\n",
inode, page, page->index, offset);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 7a0dd99..c0edf2c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3553,11 +3553,12 @@ static int cifs_release_page(struct page *page, gfp_t gfp)
return cifs_fscache_release_page(page, gfp);
}

-static void cifs_invalidate_page(struct page *page, unsigned long offset)
+static void cifs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);

- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
}

diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index d1f80ab..2ec8eb1 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -953,9 +953,11 @@ static int exofs_releasepage(struct page *page, gfp_t gfp)
return 0;
}

-static void exofs_invalidatepage(struct page *page, unsigned long offset)
+static void exofs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- EXOFS_DBGMSG("page 0x%lx offset 0x%lx\n", page->index, offset);
+ EXOFS_DBGMSG("page 0x%lx offset 0x%x length 0x%x\n",
+ page->index, offset, length);
WARN_ON(1);
}

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index d512c4b..349d4ce 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1823,7 +1823,8 @@ ext3_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext3_get_block);
}

-static void ext3_invalidatepage(struct page *page, unsigned long offset)
+static void ext3_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = EXT3_JOURNAL(page->mapping->host);

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 769c656..f5bf189 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -131,7 +131,8 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
new_size);
}

-static void ext4_invalidatepage(struct page *page, unsigned long offset);
+static void ext4_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
static int __ext4_journalled_writepage(struct page *page, unsigned int len);
static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
@@ -1601,7 +1602,7 @@ static void ext4_da_block_invalidatepages(struct mpage_da_data *mpd)
break;
BUG_ON(!PageLocked(page));
BUG_ON(PageWriteback(page));
- block_invalidatepage(page, 0);
+ block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
ClearPageUptodate(page);
unlock_page(page);
}
@@ -2814,7 +2815,8 @@ static int ext4_da_write_end(struct file *file,
return ret ? ret : copied;
}

-static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
/*
* Drop reserved blocks
@@ -2826,7 +2828,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
ext4_da_page_release_reservation(page, offset);

out:
- ext4_invalidatepage(page, offset);
+ ext4_invalidatepage(page, offset, length);

return;
}
@@ -2974,14 +2976,15 @@ ext4_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
}

-static void ext4_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
trace_ext4_invalidatepage(page, offset);

/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));

- block_invalidatepage(page, offset);
+ block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
}

static int __ext4_journalled_invalidatepage(struct page *page,
@@ -3002,7 +3005,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,

/* Wrapper for aops... */
static void ext4_journalled_invalidatepage(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7bd22a2..bae8b6c 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -667,7 +667,8 @@ static ssize_t f2fs_direct_IO(int rw, struct kiocb *iocb,
get_data_block_ro);
}

-static void f2fs_invalidate_data_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_data_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e275218..cd564d5 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1166,7 +1166,8 @@ static int f2fs_set_node_page_dirty(struct page *page)
return 0;
}

-static void f2fs_invalidate_node_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_node_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 24f414f..37093ba 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -109,7 +109,7 @@ static int gfs2_writepage_common(struct page *page,
/* Is the page fully outside i_size? (truncate in progress) */
offset = i_size & (PAGE_CACHE_SIZE-1);
if (page->index > end_index || (page->index == end_index && !offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
goto out;
}
return 1;
@@ -298,7 +298,8 @@ static int gfs2_write_jdata_pagevec(struct address_space *mapping,

/* Is the page fully outside i_size? (truncate in progress) */
if (page->index > end_index || (page->index == end_index && !offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0,
+ PAGE_CACHE_SIZE);
unlock_page(page);
continue;
}
@@ -942,7 +943,8 @@ static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh)
unlock_buffer(bh);
}

-static void gfs2_invalidatepage(struct page *page, unsigned long offset)
+static void gfs2_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
struct buffer_head *bh, *head;
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 6740d34..9e3aaff 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -571,9 +571,10 @@ static int metapage_releasepage(struct page *page, gfp_t gfp_mask)
return ret;
}

-static void metapage_invalidatepage(struct page *page, unsigned long offset)
+static void metapage_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- BUG_ON(offset);
+ BUG_ON(offset || length < PAGE_CACHE_SIZE);

BUG_ON(PageWriteback(page));

diff --git a/fs/logfs/file.c b/fs/logfs/file.c
index c2219a6..57914fc 100644
--- a/fs/logfs/file.c
+++ b/fs/logfs/file.c
@@ -159,7 +159,8 @@ static int logfs_writepage(struct page *page, struct writeback_control *wbc)
return __logfs_writepage(page);
}

-static void logfs_invalidatepage(struct page *page, unsigned long offset)
+static void logfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct logfs_block *block = logfs_block(page);

diff --git a/fs/logfs/segment.c b/fs/logfs/segment.c
index 038da09..d448a77 100644
--- a/fs/logfs/segment.c
+++ b/fs/logfs/segment.c
@@ -884,7 +884,8 @@ static struct logfs_area *alloc_area(struct super_block *sb)
return area;
}

-static void map_invalidatepage(struct page *page, unsigned long l)
+static void map_invalidatepage(struct page *page, unsigned int o,
+ unsigned int l)
{
return;
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 29f4a48..901422a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -451,11 +451,13 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
* - Called if either PG_private or PG_fscache is set on the page
* - Caller holds page lock
*/
-static void nfs_invalidate_page(struct page *page, unsigned long offset)
+static void nfs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
- dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %lu)\n", page, offset);
+ dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %u, %u)\n",
+ page, offset, length);

- if (offset != 0)
+ if (offset != 0 || length < PAGE_CACHE_SIZE)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page_file_mapping(page)->host, page);
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index fa9c05f..d267ea6 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -1372,7 +1372,7 @@ retry_writepage:
* The page may have dirty, unmapped buffers. Make them
* freeable here, so the page does not leak.
*/
- block_invalidatepage(page, 0);
+ block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
ntfs_debug("Write outside i_size - truncated?");
return 0;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 20dfec7..ecb86ca 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -603,7 +603,8 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
* from ext3. PageChecked() bits have been removed as OCFS2 does not
* do journalled data.
*/
-static void ocfs2_invalidatepage(struct page *page, unsigned long offset)
+static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index ea5061f..808e02e 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2969,7 +2969,8 @@ static int invalidatepage_can_drop(struct inode *inode, struct buffer_head *bh)
}

/* clm -- taken from fs/buffer.c:block_invalidate_page */
-static void reiserfs_invalidatepage(struct page *page, unsigned long offset)
+static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index f12189d..97c7a81 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1276,13 +1276,14 @@ int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
return err;
}

-static void ubifs_invalidatepage(struct page *page, unsigned long offset)
+static void ubifs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct ubifs_info *c = inode->i_sb->s_fs_info;

ubifs_assert(PagePrivate(page));
- if (offset)
+ if (offset || length < PAGE_CACHE_SIZE)
/* Partial page remains dirty */
return;

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5f707e5..e426796 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -823,10 +823,11 @@ xfs_cluster_write(
STATIC void
xfs_vm_invalidatepage(
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset);
+ block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
}

/*
@@ -890,7 +891,7 @@ next_buffer:

xfs_iunlock(ip, XFS_ILOCK_EXCL);
out_invalidate:
- xfs_vm_invalidatepage(page, 0);
+ xfs_vm_invalidatepage(page, 0, PAGE_CACHE_SIZE);
return;
}

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 5afc4f9..7c2bdfc 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -193,7 +193,8 @@ extern int buffer_heads_over_limit;
* Generic address_space_operations implementations for buffer_head-backed
* address_spaces.
*/
-void block_invalidatepage(struct page *page, unsigned long offset);
+void block_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
int block_write_full_page(struct page *page, get_block_t *get_block,
struct writeback_control *wbc);
int block_write_full_page_endio(struct page *page, get_block_t *get_block,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2c28271..03269b4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -364,7 +364,7 @@ struct address_space_operations {

/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
- void (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, gfp_t);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e19ff30..1ee3440 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1031,7 +1031,8 @@ int get_kernel_page(unsigned long start, int write, struct page **pages);
struct page *get_dump_page(unsigned long addr);

extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
-extern void do_invalidatepage(struct page *page, unsigned long offset);
+extern void do_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);

int __set_page_dirty_nobuffers(struct page *page);
int __set_page_dirty_no_writeback(struct page *page);
diff --git a/mm/readahead.c b/mm/readahead.c
index 7963f23..3565e53 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -48,7 +48,7 @@ static void read_cache_pages_invalidate_page(struct address_space *mapping,
if (!trylock_page(page))
BUG();
page->mapping = mapping;
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
page->mapping = NULL;
unlock_page(page);
}
diff --git a/mm/truncate.c b/mm/truncate.c
index c75b736..fdba083 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -26,7 +26,8 @@
/**
* do_invalidatepage - invalidate part or all of a page
* @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
* do_invalidatepage() is called when all or part of the page has become
* invalidated by a truncate operation.
@@ -37,16 +38,18 @@
* point. Because the caller is about to free (and possibly reuse) those
* blocks on-disk.
*/
-void do_invalidatepage(struct page *page, unsigned long offset)
+void do_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- void (*invalidatepage)(struct page *, unsigned long);
+ void (*invalidatepage)(struct page *, unsigned int, unsigned int);
+
invalidatepage = page->mapping->a_ops->invalidatepage;
#ifdef CONFIG_BLOCK
if (!invalidatepage)
invalidatepage = block_invalidatepage;
#endif
if (invalidatepage)
- (*invalidatepage)(page, offset);
+ (*invalidatepage)(page, offset, length);
}

static inline void truncate_partial_page(struct page *page, unsigned partial)
@@ -54,7 +57,7 @@ static inline void truncate_partial_page(struct page *page, unsigned partial)
zero_user_segment(page, partial, PAGE_CACHE_SIZE);
cleancache_invalidate_page(page->mapping, page);
if (page_has_private(page))
- do_invalidatepage(page, partial);
+ do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
}

/*
@@ -103,7 +106,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
return -EIO;

if (page_has_private(page))
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);

cancel_dirty_page(page, PAGE_CACHE_SIZE);

--
1.7.7.6

2013-04-09 09:31:51

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

Hi,

On Tue, 2013-04-09 at 11:14 +0200, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in gfs2_invalidatepage().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]
Acked-by: Steven Whitehouse <[email protected]>

Steve.

> ---
> fs/gfs2/aops.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 37093ba..ea920bf 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
> unsigned int length)
> {
> struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
> + unsigned int stop = offset + length;
> + int partial_page = (offset || length < PAGE_CACHE_SIZE);
> struct buffer_head *bh, *head;
> unsigned long pos = 0;
>
> BUG_ON(!PageLocked(page));
> - if (offset == 0)
> + if (!partial_page)
> ClearPageChecked(page);
> if (!page_has_buffers(page))
> goto out;
>
> bh = head = page_buffers(page);
> do {
> + if (pos + bh->b_size > stop)
> + return;
> +
> if (offset <= pos)
> gfs2_discard(sdp, bh);
> pos += bh->b_size;
> bh = bh->b_this_page;
> } while (bh != head);
> out:
> - if (offset == 0)
> + if (!partial_page)
> try_to_release_page(page, 0);
> }
>

2013-04-09 13:09:24

by Bob Peterson

[permalink] [raw]
Subject: Re: [PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

Hi,

----- Original Message -----
| ->invalidatepage() aop now accepts range to invalidate so we can make
| use of it in gfs2_invalidatepage().
|
| Signed-off-by: Lukas Czerner <[email protected]>
| Cc: [email protected]
| ---
| fs/gfs2/aops.c | 9 +++++++--
| 1 files changed, 7 insertions(+), 2 deletions(-)
|
| diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
| index 37093ba..ea920bf 100644
| --- a/fs/gfs2/aops.c
| +++ b/fs/gfs2/aops.c
| @@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page,
| unsigned int offset,
| unsigned int length)
| {
| struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
| + unsigned int stop = offset + length;
| + int partial_page = (offset || length < PAGE_CACHE_SIZE);
| struct buffer_head *bh, *head;
| unsigned long pos = 0;
|
| BUG_ON(!PageLocked(page));
| - if (offset == 0)
| + if (!partial_page)
| ClearPageChecked(page);
| if (!page_has_buffers(page))
| goto out;
|
| bh = head = page_buffers(page);
| do {
| + if (pos + bh->b_size > stop)
| + return;
| +

I always regret it when I review something this early in the morning,
or post before the caffeine has taken full effect. But...
Shouldn't the statement above be (1) ">= stop" and (2) goto out;?

| if (offset <= pos)
| gfs2_discard(sdp, bh);
| pos += bh->b_size;
| bh = bh->b_this_page;
| } while (bh != head);
| out:
| - if (offset == 0)
| + if (!partial_page)
| try_to_release_page(page, 0);
| }

Regards,

Bob Peterson
Red Hat File Systems

2013-04-09 13:20:30

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] jbd: change journal_invalidatepage() to accept length

On Tue 09-04-13 11:14:13, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in journal_invalidatepage() and all the users in ext3 file
> system. Also update ext3 trace point to print out length argument.
>
> Signed-off-by: Lukas Czerner <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext3/inode.c | 6 +++---
> fs/jbd/transaction.c | 19 ++++++++++++++-----
> include/linux/jbd.h | 2 +-
> include/trace/events/ext3.h | 12 +++++++-----
> 4 files changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> index 349d4ce..b12936b 100644
> --- a/fs/ext3/inode.c
> +++ b/fs/ext3/inode.c
> @@ -1828,15 +1828,15 @@ static void ext3_invalidatepage(struct page *page, unsigned int offset,
> {
> journal_t *journal = EXT3_JOURNAL(page->mapping->host);
>
> - trace_ext3_invalidatepage(page, offset);
> + trace_ext3_invalidatepage(page, offset, length);
>
> /*
> * If it's a full truncate we just forget about the pending dirtying
> */
> - if (offset == 0)
> + if (offset == 0 && length == PAGE_CACHE_SIZE)
> ClearPageChecked(page);
>
> - journal_invalidatepage(journal, page, offset);
> + journal_invalidatepage(journal, page, offset, length);
> }
>
> static int ext3_releasepage(struct page *page, gfp_t wait)
> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> index 071d690..a1fef89 100644
> --- a/fs/jbd/transaction.c
> +++ b/fs/jbd/transaction.c
> @@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
> * void journal_invalidatepage() - invalidate a journal page
> * @journal: journal to use for flush
> * @page: page to flush
> - * @offset: length of page to invalidate.
> + * @offset: offset of the range to invalidate
> + * @length: length of the range to invalidate
> *
> - * Reap page buffers containing data after offset in page.
> + * Reap page buffers containing data in specified range in page.
> */
> void journal_invalidatepage(journal_t *journal,
> struct page *page,
> - unsigned long offset)
> + unsigned int offset,
> + unsigned int length)
> {
> struct buffer_head *head, *bh, *next;
> + unsigned int stop = offset + length;
> unsigned int curr_off = 0;
> + int partial_page = (offset || length < PAGE_CACHE_SIZE);
> int may_free = 1;
>
> if (!PageLocked(page))
> @@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
> if (!page_has_buffers(page))
> return;
>
> + BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
> +
> /* We will potentially be playing with lists other than just the
> * data lists (especially for journaled data mode), so be
> * cautious in our locking. */
> @@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
> unsigned int next_off = curr_off + bh->b_size;
> next = bh->b_this_page;
>
> + if (next_off > stop)
> + return;
> +
> if (offset <= curr_off) {
> /* This block is wholly outside the truncation point */
> lock_buffer(bh);
> may_free &= journal_unmap_buffer(journal, bh,
> - offset > 0);
> + partial_page);
> unlock_buffer(bh);
> }
> curr_off = next_off;
> @@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,
>
> } while (bh != head);
>
> - if (!offset) {
> + if (!partial_page) {
> if (may_free && try_to_free_buffers(page))
> J_ASSERT(!page_has_buffers(page));
> }
> diff --git a/include/linux/jbd.h b/include/linux/jbd.h
> index c8f3297..d02e16c 100644
> --- a/include/linux/jbd.h
> +++ b/include/linux/jbd.h
> @@ -840,7 +840,7 @@ extern void journal_release_buffer (handle_t *, struct buffer_head *);
> extern int journal_forget (handle_t *, struct buffer_head *);
> extern void journal_sync_buffer (struct buffer_head *);
> extern void journal_invalidatepage(journal_t *,
> - struct page *, unsigned long);
> + struct page *, unsigned int, unsigned int);
> extern int journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
> extern int journal_stop(handle_t *);
> extern int journal_flush (journal_t *);
> diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
> index 15d11a3..6797b9d 100644
> --- a/include/trace/events/ext3.h
> +++ b/include/trace/events/ext3.h
> @@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
> );
>
> TRACE_EVENT(ext3_invalidatepage,
> - TP_PROTO(struct page *page, unsigned long offset),
> + TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
>
> - TP_ARGS(page, offset),
> + TP_ARGS(page, offset, length),
>
> TP_STRUCT__entry(
> __field( pgoff_t, index )
> - __field( unsigned long, offset )
> + __field( unsigned int, offset )
> + __field( unsigned int, length )
> __field( ino_t, ino )
> __field( dev_t, dev )
>
> @@ -305,14 +306,15 @@ TRACE_EVENT(ext3_invalidatepage,
> TP_fast_assign(
> __entry->index = page->index;
> __entry->offset = offset;
> + __entry->length = length;
> __entry->ino = page->mapping->host->i_ino;
> __entry->dev = page->mapping->host->i_sb->s_dev;
> ),
>
> - TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
> + TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> - __entry->index, __entry->offset)
> + __entry->index, __entry->offset, __entry->length)
> );
>
> TRACE_EVENT(ext3_discard_blocks,
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-09 13:22:28

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 02/18] jbd2: change jbd2_journal_invalidatepage to accept length

On Tue 09-04-13 11:14:11, Lukas Czerner wrote:
> invalidatepage now accepts range to invalidate and there are two file
> system using jbd2 also implementing punch hole feature which can benefit
> from this. We need to implement the same thing for jbd2 layer in order to
> allow those file system take benefit of this functionality.
>
> This commit adds length argument to the jbd2_journal_invalidatepage()
> and updates all instances in ext4 and ocfs2.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza
>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/ext4/inode.c | 3 ++-
> fs/jbd2/transaction.c | 24 +++++++++++++++++-------
> fs/ocfs2/aops.c | 3 ++-
> include/linux/jbd2.h | 2 +-
> 4 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index f5bf189..69595f5 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3000,7 +3000,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,
> if (offset == 0)
> ClearPageChecked(page);
>
> - return jbd2_journal_invalidatepage(journal, page, offset);
> + return jbd2_journal_invalidatepage(journal, page, offset,
> + PAGE_CACHE_SIZE - offset);
> }
>
> /* Wrapper for aops... */
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 325bc01..d334e17 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -2027,18 +2027,23 @@ zap_buffer_unlocked:
> * void jbd2_journal_invalidatepage()
> * @journal: journal to use for flush...
> * @page: page to flush
> - * @offset: length of page to invalidate.
> + * @offset: start of the range to invalidate
> + * @length: length of the range to invalidate
> *
> - * Reap page buffers containing data after offset in page. Can return -EBUSY
> - * if buffers are part of the committing transaction and the page is straddling
> - * i_size. Caller then has to wait for current commit and try again.
> + * Reap page buffers containing data after in the specified range in page.
> + * Can return -EBUSY if buffers are part of the committing transaction and
> + * the page is straddling i_size. Caller then has to wait for current commit
> + * and try again.
> */
> int jbd2_journal_invalidatepage(journal_t *journal,
> struct page *page,
> - unsigned long offset)
> + unsigned int offset,
> + unsigned int length)
> {
> struct buffer_head *head, *bh, *next;
> + unsigned int stop = offset + length;
> unsigned int curr_off = 0;
> + int partial_page = (offset || length < PAGE_CACHE_SIZE);
> int may_free = 1;
> int ret = 0;
>
> @@ -2047,6 +2052,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
> if (!page_has_buffers(page))
> return 0;
>
> + BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
> +
> /* We will potentially be playing with lists other than just the
> * data lists (especially for journaled data mode), so be
> * cautious in our locking. */
> @@ -2056,10 +2063,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
> unsigned int next_off = curr_off + bh->b_size;
> next = bh->b_this_page;
>
> + if (next_off > stop)
> + return 0;
> +
> if (offset <= curr_off) {
> /* This block is wholly outside the truncation point */
> lock_buffer(bh);
> - ret = journal_unmap_buffer(journal, bh, offset > 0);
> + ret = journal_unmap_buffer(journal, bh, partial_page);
> unlock_buffer(bh);
> if (ret < 0)
> return ret;
> @@ -2070,7 +2080,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,
>
> } while (bh != head);
>
> - if (!offset) {
> + if (!partial_page) {
> if (may_free && try_to_free_buffers(page))
> J_ASSERT(!page_has_buffers(page));
> }
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index ecb86ca..7c47755 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -608,7 +608,8 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
> {
> journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
>
> - jbd2_journal_invalidatepage(journal, page, offset);
> + jbd2_journal_invalidatepage(journal, page, offset,
> + PAGE_CACHE_SIZE - offset);
> }
>
> static int ocfs2_releasepage(struct page *page, gfp_t wait)
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index f9fe889..8c34abd 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1090,7 +1090,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
> extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
> extern void journal_sync_buffer (struct buffer_head *);
> extern int jbd2_journal_invalidatepage(journal_t *,
> - struct page *, unsigned long);
> + struct page *, unsigned int, unsigned int);
> extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
> extern int jbd2_journal_stop(handle_t *);
> extern int jbd2_journal_flush (journal_t *);
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-09 13:24:37

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 03/18] ext4: use ->invalidatepage() length argument

On Tue 09-04-13 11:14:12, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in all ext4 invalidatepage routines.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/ext4/inode.c | 30 +++++++++++++++++++-----------
> include/trace/events/ext4.h | 22 ++++++++++++----------
> 2 files changed, 31 insertions(+), 21 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 69595f5..f80e0c3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1411,21 +1411,28 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
> }
>
> static void ext4_da_page_release_reservation(struct page *page,
> - unsigned long offset)
> + unsigned int offset,
> + unsigned int length)
> {
> int to_release = 0;
> struct buffer_head *head, *bh;
> unsigned int curr_off = 0;
> struct inode *inode = page->mapping->host;
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + unsigned int stop = offset + length;
> int num_clusters;
> ext4_fsblk_t lblk;
>
> + BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
> +
> head = page_buffers(page);
> bh = head;
> do {
> unsigned int next_off = curr_off + bh->b_size;
>
> + if (next_off > stop)
> + break;
> +
> if ((offset <= curr_off) && (buffer_delay(bh))) {
> to_release++;
> clear_buffer_delay(bh);
> @@ -2825,7 +2832,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
> if (!page_has_buffers(page))
> goto out;
>
> - ext4_da_page_release_reservation(page, offset);
> + ext4_da_page_release_reservation(page, offset, length);
>
> out:
> ext4_invalidatepage(page, offset, length);
> @@ -2979,29 +2986,29 @@ ext4_readpages(struct file *file, struct address_space *mapping,
> static void ext4_invalidatepage(struct page *page, unsigned int offset,
> unsigned int length)
> {
> - trace_ext4_invalidatepage(page, offset);
> + trace_ext4_invalidatepage(page, offset, length);
>
> /* No journalling happens on data buffers when this function is used */
> WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
>
> - block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
> + block_invalidatepage(page, offset, length);
> }
>
> static int __ext4_journalled_invalidatepage(struct page *page,
> - unsigned long offset)
> + unsigned int offset,
> + unsigned int length)
> {
> journal_t *journal = EXT4_JOURNAL(page->mapping->host);
>
> - trace_ext4_journalled_invalidatepage(page, offset);
> + trace_ext4_journalled_invalidatepage(page, offset, length);
>
> /*
> * If it's a full truncate we just forget about the pending dirtying
> */
> - if (offset == 0)
> + if (offset == 0 && length == PAGE_CACHE_SIZE)
> ClearPageChecked(page);
>
> - return jbd2_journal_invalidatepage(journal, page, offset,
> - PAGE_CACHE_SIZE - offset);
> + return jbd2_journal_invalidatepage(journal, page, offset, length);
> }
>
> /* Wrapper for aops... */
> @@ -3009,7 +3016,7 @@ static void ext4_journalled_invalidatepage(struct page *page,
> unsigned int offset,
> unsigned int length)
> {
> - WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
> + WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
> }
>
> static int ext4_releasepage(struct page *page, gfp_t wait)
> @@ -4607,7 +4614,8 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
> inode->i_size >> PAGE_CACHE_SHIFT);
> if (!page)
> return;
> - ret = __ext4_journalled_invalidatepage(page, offset);
> + ret = __ext4_journalled_invalidatepage(page, offset,
> + PAGE_CACHE_SIZE - offset);
> unlock_page(page);
> page_cache_release(page);
> if (ret != -EBUSY)
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index 58459b7..60b329a 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -444,16 +444,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
> );
>
> DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
> - TP_PROTO(struct page *page, unsigned long offset),
> + TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
>
> - TP_ARGS(page, offset),
> + TP_ARGS(page, offset, length),
>
> TP_STRUCT__entry(
> __field( dev_t, dev )
> __field( ino_t, ino )
> __field( pgoff_t, index )
> - __field( unsigned long, offset )
> -
> + __field( unsigned int, offset )
> + __field( unsigned int, length )
> ),
>
> TP_fast_assign(
> @@ -461,24 +461,26 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
> __entry->ino = page->mapping->host->i_ino;
> __entry->index = page->index;
> __entry->offset = offset;
> + __entry->length = length;
> ),
>
> - TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
> + TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> - (unsigned long) __entry->index, __entry->offset)
> + (unsigned long) __entry->index,
> + __entry->offset, __entry->length)
> );
>
> DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
> - TP_PROTO(struct page *page, unsigned long offset),
> + TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
>
> - TP_ARGS(page, offset)
> + TP_ARGS(page, offset, length)
> );
>
> DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
> - TP_PROTO(struct page *page, unsigned long offset),
> + TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
>
> - TP_ARGS(page, offset)
> + TP_ARGS(page, offset, length)
> );
>
> TRACE_EVENT(ext4_discard_blocks,
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-09 13:26:11

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument

On Tue 09-04-13 11:14:15, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in ocfs2_invalidatepage().
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: Joel Becker <[email protected]>
> ---
> fs/ocfs2/aops.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 7c47755..79736a2 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
> {
> journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
>
> - jbd2_journal_invalidatepage(journal, page, offset,
> - PAGE_CACHE_SIZE - offset);
> + jbd2_journal_invalidatepage(journal, page, offset, length);
> }
>
> static int ocfs2_releasepage(struct page *page, gfp_t wait)
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-09 13:27:12

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

On Tue, 9 Apr 2013, Bob Peterson wrote:

> Date: Tue, 9 Apr 2013 09:09:20 -0400 (EDT)
> From: Bob Peterson <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> [email protected]
> Subject: Re: [PATCH v3 08/18] gfs2: use ->invalidatepage() length argument
>
> Hi,
>
> ----- Original Message -----
> | ->invalidatepage() aop now accepts range to invalidate so we can make
> | use of it in gfs2_invalidatepage().
> |
> | Signed-off-by: Lukas Czerner <[email protected]>
> | Cc: [email protected]
> | ---
> | fs/gfs2/aops.c | 9 +++++++--
> | 1 files changed, 7 insertions(+), 2 deletions(-)
> |
> | diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> | index 37093ba..ea920bf 100644
> | --- a/fs/gfs2/aops.c
> | +++ b/fs/gfs2/aops.c
> | @@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page,
> | unsigned int offset,
> | unsigned int length)
> | {
> | struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
> | + unsigned int stop = offset + length;
> | + int partial_page = (offset || length < PAGE_CACHE_SIZE);
> | struct buffer_head *bh, *head;
> | unsigned long pos = 0;
> |
> | BUG_ON(!PageLocked(page));
> | - if (offset == 0)
> | + if (!partial_page)
> | ClearPageChecked(page);
> | if (!page_has_buffers(page))
> | goto out;
> |
> | bh = head = page_buffers(page);
> | do {
> | + if (pos + bh->b_size > stop)
> | + return;
> | +
>
> I always regret it when I review something this early in the morning,
> or post before the caffeine has taken full effect. But...
> Shouldn't the statement above be (1) ">= stop" and (2) goto out;?

Hi,

1: No, because "stop = offset + length" and we're comparing it with
"pos + bh->b_size". If it would be ">=" then we would always miss
the last buffer. Let's assume pos = 0, b_size = 1024 and we're
invalidating offset = 0 and length = 1024 .. the rest you can try
for yourself :)

2: No, because if this condition is true, we can never have full page
invalidate. With full page invalidation we'll walk all the
buffers within the page hence we would jump our of the cycle when
"bh == head".

Thanks!
-Lukas

>
> | if (offset <= pos)
> | gfs2_discard(sdp, bh);
> | pos += bh->b_size;
> | bh = bh->b_this_page;
> | } while (bh != head);
> | out:
> | - if (offset == 0)
> | + if (!partial_page)
> | try_to_release_page(page, 0);
> | }
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>

2013-04-09 13:27:59

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument

On Tue 09-04-13 11:14:18, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in reiserfs_invalidatepage()
Hum, reiserfs is probably never going to support punch hole. So shouldn't
we rather WARN and return without doing anything if stop !=
PAGE_CACHE_SIZE?

Honza
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]
> ---
> fs/reiserfs/inode.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
> index 808e02e..e963164 100644
> --- a/fs/reiserfs/inode.c
> +++ b/fs/reiserfs/inode.c
> @@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
> struct buffer_head *head, *bh, *next;
> struct inode *inode = page->mapping->host;
> unsigned int curr_off = 0;
> + unsigned int stop = offset + length;
> + int partial_page = (offset || length < PAGE_CACHE_SIZE);
> int ret = 1;
>
> BUG_ON(!PageLocked(page));
>
> - if (offset == 0)
> + if (!partial_page)
> ClearPageChecked(page);
>
> if (!page_has_buffers(page))
> @@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
> unsigned int next_off = curr_off + bh->b_size;
> next = bh->b_this_page;
>
> + if (next_off > stop)
> + goto out;
> +
> /*
> * is this block fully invalidated?
> */
> @@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
> * The get_block cached value has been unconditionally invalidated,
> * so real IO is not possible anymore.
> */
> - if (!offset && ret) {
> + if (!partial_page && ret) {
> ret = try_to_release_page(page, 0);
> /* maybe should BUG_ON(!ret); - neilb */
> }
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-10 09:52:48

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument

On Tue, 9 Apr 2013, Jan Kara wrote:

> Date: Tue, 9 Apr 2013 15:27:56 +0200
> From: Jan Kara <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> [email protected]
> Subject: Re: [PATCH v3 09/18] reiserfs: use ->invalidatepage() length argument
>
> On Tue 09-04-13 11:14:18, Lukas Czerner wrote:
> > ->invalidatepage() aop now accepts range to invalidate so we can make
> > use of it in reiserfs_invalidatepage()
> Hum, reiserfs is probably never going to support punch hole. So shouldn't
> we rather WARN and return without doing anything if stop !=
> PAGE_CACHE_SIZE?
>
> Honza

Hi,

I can not even think of the case when this would happen since it
could not happen before either. However in the case it happens the
code will do what's expected. So I do not have any strong preference
about this one, but I do not think it's necessary to WARN here. If
you still insist on the WARN, let me know and I'll resend the patch.

Thanks for the reviews!
-Lukas

> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > Cc: [email protected]
> > ---
> > fs/reiserfs/inode.c | 9 +++++++--
> > 1 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
> > index 808e02e..e963164 100644
> > --- a/fs/reiserfs/inode.c
> > +++ b/fs/reiserfs/inode.c
> > @@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
> > struct buffer_head *head, *bh, *next;
> > struct inode *inode = page->mapping->host;
> > unsigned int curr_off = 0;
> > + unsigned int stop = offset + length;
> > + int partial_page = (offset || length < PAGE_CACHE_SIZE);
> > int ret = 1;
> >
> > BUG_ON(!PageLocked(page));
> >
> > - if (offset == 0)
> > + if (!partial_page)
> > ClearPageChecked(page);
> >
> > if (!page_has_buffers(page))
> > @@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
> > unsigned int next_off = curr_off + bh->b_size;
> > next = bh->b_this_page;
> >
> > + if (next_off > stop)
> > + goto out;
> > +
> > /*
> > * is this block fully invalidated?
> > */
> > @@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
> > * The get_block cached value has been unconditionally invalidated,
> > * so real IO is not possible anymore.
> > */
> > - if (!offset && ret) {
> > + if (!partial_page && ret) {
> > ret = try_to_release_page(page, 0);
> > /* maybe should BUG_ON(!ret); - neilb */
> > }
> > --
> > 1.7.7.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2013-04-11 21:18:56

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

On Tue 09-04-13 11:14:19, Lukas Czerner wrote:
> This commit changes truncate_inode_pages_range() so it can handle non
> page aligned regions of the truncate. Currently we can hit BUG_ON when
> the end of the range is not page aligned, but we can handle unaligned
> start of the range.
>
> Being able to handle non page aligned regions of the page can help file
> system punch_hole implementations and save some work, because once we're
> holding the page we might as well deal with it right away.
>
> In previous commits we've changed ->invalidatepage() prototype to accept
> 'length' argument to be able to specify range to invalidate. No we can
> use that new ability in truncate_inode_pages_range().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> ---
> mm/truncate.c | 104 ++++++++++++++++++++++++++++++++++++++++-----------------
> 1 files changed, 73 insertions(+), 31 deletions(-)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index fdba083..e2e8a8a 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
> * We pass down the cache-hot hint to the page freeing code. Even if the
> * mapping is large, it is probably the case that the final pages are the most
> * recently touched, and freeing happens in ascending file offset order.
> + *
> + * Note that since ->invalidatepage() accepts range to invalidate
> + * truncate_inode_pages_range is able to handle cases where lend + 1 is not
> + * page aligned properly.
> */
> void truncate_inode_pages_range(struct address_space *mapping,
> loff_t lstart, loff_t lend)
> {
> - const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
> - const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
> - struct pagevec pvec;
> - pgoff_t index;
> - pgoff_t end;
> - int i;
> + pgoff_t start; /* inclusive */
> + pgoff_t end; /* exclusive */
> + unsigned int partial_start; /* inclusive */
> + unsigned int partial_end; /* exclusive */
> + struct pagevec pvec;
> + pgoff_t index;
> + int i;
>
> cleancache_invalidate_inode(mapping);
> if (mapping->nrpages == 0)
> return;
>
> - BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
> - end = (lend >> PAGE_CACHE_SHIFT);
> + /* Offsets within partial pages */
> + partial_start = lstart & (PAGE_CACHE_SIZE - 1);
> + partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
> +
> + /*
> + * 'start' and 'end' always covers the range of pages to be fully
> + * truncated. Partial pages are covered with 'partial_start' at the
> + * start of the range and 'partial_end' at the end of the range.
> + * Note that 'end' is exclusive while 'lend' is inclusive.
> + */
> + start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> + if (lend == -1)
> + /*
> + * lend == -1 indicates end-of-file so we have to set 'end'
> + * to the highest possible pgoff_t and since the type is
> + * unsigned we're using -1.
> + */
> + end = -1;
> + else
> + end = (lend + 1) >> PAGE_CACHE_SHIFT;
>
> pagevec_init(&pvec, 0);
> index = start;
> - while (index <= end && pagevec_lookup(&pvec, mapping, index,
> - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
> + while (index < end && pagevec_lookup(&pvec, mapping, index,
> + min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
So does this really work when end == -1 and file has ULONG_MAX pages?
Previously it did but now you seem of skip the last page... Otherwise the
patch looks good to me.

Honza

> mem_cgroup_uncharge_start();
> for (i = 0; i < pagevec_count(&pvec); i++) {
> struct page *page = pvec.pages[i];
>
> /* We rely upon deletion not changing page->index */
> index = page->index;
> - if (index > end)
> + if (index >= end)
> break;
>
> if (!trylock_page(page))
> @@ -250,27 +265,56 @@ void truncate_inode_pages_range(struct address_space *mapping,
> index++;
> }
>
> - if (partial) {
> + if (partial_start) {
> struct page *page = find_lock_page(mapping, start - 1);
> if (page) {
> + unsigned int top = PAGE_CACHE_SIZE;
> + if (start > end) {
> + /* Truncation within a single page */
> + top = partial_end;
> + partial_end = 0;
> + }
> wait_on_page_writeback(page);
> - truncate_partial_page(page, partial);
> + zero_user_segment(page, partial_start, top);
> + cleancache_invalidate_page(mapping, page);
> + if (page_has_private(page))
> + do_invalidatepage(page, partial_start,
> + top - partial_start);
> unlock_page(page);
> page_cache_release(page);
> }
> }
> + if (partial_end) {
> + struct page *page = find_lock_page(mapping, end);
> + if (page) {
> + wait_on_page_writeback(page);
> + zero_user_segment(page, 0, partial_end);
> + cleancache_invalidate_page(mapping, page);
> + if (page_has_private(page))
> + do_invalidatepage(page, 0,
> + partial_end);
> + unlock_page(page);
> + page_cache_release(page);
> + }
> + }
> + /*
> + * If the truncation happened within a single page no pages
> + * will be released, just zeroed, so we can bail out now.
> + */
> + if (start >= end)
> + return;
>
> index = start;
> for ( ; ; ) {
> cond_resched();
> if (!pagevec_lookup(&pvec, mapping, index,
> - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
> + min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
> if (index == start)
> break;
> index = start;
> continue;
> }
> - if (index == start && pvec.pages[0]->index > end) {
> + if (index == start && pvec.pages[0]->index >= end) {
> pagevec_release(&pvec);
> break;
> }
> @@ -280,7 +324,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>
> /* We rely upon deletion not changing page->index */
> index = page->index;
> - if (index > end)
> + if (index >= end)
> break;
>
> lock_page(page);
> @@ -601,10 +645,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
> * This rounding is currently just for example: unmap_mapping_range
> * expands its hole outwards, whereas we want it to contract the hole
> * inwards. However, existing callers of truncate_pagecache_range are
> - * doing their own page rounding first; and truncate_inode_pages_range
> - * currently BUGs if lend is not pagealigned-1 (it handles partial
> - * page at start of hole, but not partial page at end of hole). Note
> - * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
> + * doing their own page rounding first. Note that unmap_mapping_range
> + * allows holelen 0 for all, and we allow lend -1 for end of file.
> */
>
> /*
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-11 22:40:30

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH v3 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

On Thu, 11 Apr 2013, Jan Kara wrote:
> On Tue 09-04-13 11:14:19, Lukas Czerner wrote:
> > This commit changes truncate_inode_pages_range() so it can handle non
> > page aligned regions of the truncate. Currently we can hit BUG_ON when
> > the end of the range is not page aligned, but we can handle unaligned
> > start of the range.
> >
> > Being able to handle non page aligned regions of the page can help file
> > system punch_hole implementations and save some work, because once we're
> > holding the page we might as well deal with it right away.
> >
> > In previous commits we've changed ->invalidatepage() prototype to accept
> > 'length' argument to be able to specify range to invalidate. No we can
> > use that new ability in truncate_inode_pages_range().
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Hugh Dickins <[email protected]>
> > ---
> > mm/truncate.c | 104 ++++++++++++++++++++++++++++++++++++++++-----------------
> > 1 files changed, 73 insertions(+), 31 deletions(-)
> >
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index fdba083..e2e8a8a 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
...
> >
> > pagevec_init(&pvec, 0);
> > index = start;
> > - while (index <= end && pagevec_lookup(&pvec, mapping, index,
> > - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
> > + while (index < end && pagevec_lookup(&pvec, mapping, index,
> > + min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
> So does this really work when end == -1 and file has ULONG_MAX pages?
> Previously it did but now you seem of skip the last page... Otherwise the
> patch looks good to me.

Recalling earlier discussion of truncate.c (which is indeed schizophrenic
about it), I believe that MAX_LFS_FILESIZE (and in particular that "-1")
prevents a file from having more than LONG_MAX pages on any architecture:

#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE ((loff_t)0x7fffffffffffffffLL)
#endif

(And if we ever extend that 32-bit range, I would recommend avoiding
the final wrap-around page, which could easily cause trouble elsewhere.)

Hugh

2013-04-18 22:09:29

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] Revert "ext4: remove no longer used functions in inode.c"

On Tue 09-04-13 11:14:20, Lukas Czerner wrote:
> This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.
>
> This commit reintroduces functions ext4_block_truncate_page() and
> ext4_block_zero_page_range() which has been previously removed in favour
> of ext4_discard_partial_page_buffers().
>
> In future commits we want to reintroduce those function and remove
> ext4_discard_partial_page_buffers() since it is duplicating some code
> and also partially duplicating work of truncate_pagecache_range(),
> moreover the old implementation was much clearer.
>
> Signed-off-by: Lukas Czerner <[email protected]>
When checking the functions, I've noticed one thing:

> + if (ext4_should_journal_data(inode)) {
> + BUFFER_TRACE(bh, "get write access");
> + err = ext4_journal_get_write_access(handle, bh);
> + if (err)
> + goto unlock;
> + }
> +
> + zero_user(page, offset, length);
> +
> + BUFFER_TRACE(bh, "zeroed end of block");
> +
> + err = 0;
> + if (ext4_should_journal_data(inode)) {
> + err = ext4_handle_dirty_metadata(handle, inode, bh);
> + } else
> + mark_buffer_dirty(bh);
I think we should call also ext4_jbd2_file_inode() in data=ordered mode.
Otherwise a crash after the truncate transaction has committed could still
result in the tail of the block not being zeroed on disk...

Honza
> +
> +unlock:
> + unlock_page(page);
> + page_cache_release(page);
> + return err;
> +}
> +
> int ext4_can_truncate(struct inode *inode)
> {
> if (S_ISREG(inode->i_mode))
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-18 22:23:18

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 12/18] Revert "ext4: fix fsx truncate failure"

On Tue 09-04-13 11:14:21, Lukas Czerner wrote:
> This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.
>
> This commit reintroduces the use of ext4_block_truncate_page() in ext4
> truncate operation instead of ext4_discard_partial_page_buffers().
>
> The statement in the commit description that the truncate operation only
> zero block unaligned portion of the last page is not exactly right,
> since truncate_pagecache_range() also zeroes and invalidate the unaligned
> portion of the page. Then there is no need to zero and unmap it once more
> and ext4_block_truncate_page() was doing the right job, although we
> still need to update the buffer head containing the last block, which is
> exactly what ext4_block_truncate_page() is doing.
>
> Moreover the problem described in the commit is fixed more properly with
> commit
Looks good. You can add
Reviewed-by: Jan Kara <[email protected]>

I'd just add one nit that you might fix. In case of truncate called from
orphan cleanup code, we don't really call mm to zero the tail of the page.
It shouldn't really matter because the page shouldn't be uptodate but it
might be a trap we eventually fall into so calling truncate_inode_pages()
there might be a reasonable precaution.

Honza
>
> 15291164b22a357cb211b618adfef4fa82fc0de3
> jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
>
> This was tested on ppc64 machine with block size of 1024 bytes without
> any problems.
>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/ext4/inode.c | 11 ++---------
> 1 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5729d21..d58e13c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3920,7 +3920,6 @@ void ext4_truncate(struct inode *inode)
> unsigned int credits;
> handle_t *handle;
> struct address_space *mapping = inode->i_mapping;
> - loff_t page_len;
>
> /*
> * There is a possibility that we're either freeing the inode
> @@ -3964,14 +3963,8 @@ void ext4_truncate(struct inode *inode)
> return;
> }
>
> - if (inode->i_size % PAGE_CACHE_SIZE != 0) {
> - page_len = PAGE_CACHE_SIZE -
> - (inode->i_size & (PAGE_CACHE_SIZE - 1));
> -
> - if (ext4_discard_partial_page_buffers(handle,
> - mapping, inode->i_size, page_len, 0))
> - goto out_stop;
> - }
> + if (inode->i_size & (inode->i_sb->s_blocksize - 1))
> + ext4_block_truncate_page(handle, mapping, inode->i_size);
>
> /*
> * We add the inode to the orphan list, so that if this
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-19 05:21:56

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 13/18] ext4: use ext4_zero_partial_blocks in punch_hole

On Tue 09-04-13 11:14:22, Lukas Czerner wrote:
> We're doing to get rid of ext4_discard_partial_page_buffers() since it is
> duplicating some code and also partially duplicating work of
> truncate_pagecache_range(), moreover the old implementation was much
> clearer.
>
> Now when the truncate_inode_pages_range() can handle truncating non page
> aligned regions we can use this to invalidate and zero out block aligned
> region of the punched out range and then use ext4_block_truncate_page()
> to zero the unaligned blocks on the start and end of the range. This
> will greatly simplify the punch hole code. Moreover after this commit we
> can get rid of the ext4_discard_partial_page_buffers() completely.
>
> We also introduce function ext4_prepare_punch_hole() to do come common
> operations before we attempt to do the actual punch hole on
> indirect or extent file which saves us some code duplication.
>
> This has been tested on ppc64 with 1k block size with fsx and xfstests
> without any problems.
>
> Signed-off-by: Lukas Czerner <[email protected]>
Just two nits below, otherwise the patch looks good.

> ---
> fs/ext4/ext4.h | 2 +
> fs/ext4/inode.c | 110 ++++++++++++++++++++-----------------------------------
> 2 files changed, 42 insertions(+), 70 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 3aa5943..2428244 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2109,6 +2109,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
> struct address_space *mapping, loff_t from);
> extern int ext4_block_zero_page_range(handle_t *handle,
> struct address_space *mapping, loff_t from, loff_t length);
> +extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
> + loff_t lstart, loff_t lend);
> extern int ext4_discard_partial_page_buffers(handle_t *handle,
> struct address_space *mapping, loff_t from,
> loff_t length, int flags);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d58e13c..6003fd1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3675,6 +3675,37 @@ unlock:
> return err;
> }
>
> +int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
> + loff_t lstart, loff_t lend)
> +{
It's a bit confusing that ext4_block_zero_page_range() takes start, len
arguments while this function takes start, end arguments. I think it should
better be unified to avoid stupid mistakes...

> + struct super_block *sb = inode->i_sb;
> + struct address_space *mapping = inode->i_mapping;
> + unsigned partial = lstart & (sb->s_blocksize - 1);
> + ext4_fsblk_t start = lstart >> sb->s_blocksize_bits;
> + ext4_fsblk_t end = lend >> sb->s_blocksize_bits;
> + int err = 0;
> +
> + /* Handle partial zero within the single block */
> + if (start == end) {
> + err = ext4_block_zero_page_range(handle, mapping,
> + lstart, lend - lstart + 1);
> + return err;
> + }
> + /* Handle partial zero out on the start of the range */
> + if (partial) {
> + err = ext4_block_zero_page_range(handle, mapping,
> + lstart, sb->s_blocksize);
> + if (err)
> + return err;
> + }
> + /* Handle partial zero out on the end of the range */
> + partial = lend & (sb->s_blocksize - 1);
> + if (partial != sb->s_blocksize - 1)
> + err = ext4_block_zero_page_range(handle, mapping,
> + lend - partial, partial + 1);
> + return err;
> +}
> +
> int ext4_can_truncate(struct inode *inode)
> {
> if (S_ISREG(inode->i_mode))
> @@ -3703,7 +3734,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> struct super_block *sb = inode->i_sb;
> ext4_lblk_t first_block, stop_block;
> struct address_space *mapping = inode->i_mapping;
> - loff_t first_page, last_page, page_len;
> loff_t first_page_offset, last_page_offset;
> handle_t *handle;
> unsigned int credits;
> @@ -3755,17 +3785,13 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> offset;
> }
>
> - first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> - last_page = (offset + length) >> PAGE_CACHE_SHIFT;
> -
> - first_page_offset = first_page << PAGE_CACHE_SHIFT;
> - last_page_offset = last_page << PAGE_CACHE_SHIFT;
> + first_page_offset = round_up(offset, sb->s_blocksize);
> + last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;
Calling these {first,last}_block_offset would be more precise, wouldn't
it?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-19 05:22:20

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 14/18] ext4: remove unused discard_partial_page_buffers

On Tue 09-04-13 11:14:23, Lukas Czerner wrote:
> The discard_partial_page_buffers is no longer used anywhere so we can
> simply remove it including the *_no_lock variant and
> EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.
>
> Signed-off-by: Lukas Czerner <[email protected]>
Looks good, obviously ;) You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/ext4/ext4.h | 8 --
> fs/ext4/inode.c | 206 -------------------------------------------------------
> 2 files changed, 0 insertions(+), 214 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 2428244..cc9020e 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -593,11 +593,6 @@ enum {
> #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER 0x0020
>
> /*
> - * Flags used by ext4_discard_partial_page_buffers
> - */
> -#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED 0x0001
> -
> -/*
> * ioctl commands
> */
> #define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS
> @@ -2111,9 +2106,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
> struct address_space *mapping, loff_t from, loff_t length);
> extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
> loff_t lstart, loff_t lend);
> -extern int ext4_discard_partial_page_buffers(handle_t *handle,
> - struct address_space *mapping, loff_t from,
> - loff_t length, int flags);
> extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
> extern qsize_t *ext4_get_reserved_space(struct inode *inode);
> extern void ext4_da_update_reserve_space(struct inode *inode,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 6003fd1..0d452c1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -135,9 +135,6 @@ static void ext4_invalidatepage(struct page *page, unsigned int offset,
> unsigned int length);
> static int __ext4_journalled_writepage(struct page *page, unsigned int len);
> static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
> -static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
> - struct inode *inode, struct page *page, loff_t from,
> - loff_t length, int flags);
>
> /*
> * Test whether an inode is a fast symlink.
> @@ -3352,209 +3349,6 @@ void ext4_set_aops(struct inode *inode)
> inode->i_mapping->a_ops = &ext4_aops;
> }
>
> -
> -/*
> - * ext4_discard_partial_page_buffers()
> - * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
> - * This function finds and locks the page containing the offset
> - * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
> - * Calling functions that already have the page locked should call
> - * ext4_discard_partial_page_buffers_no_lock directly.
> - */
> -int ext4_discard_partial_page_buffers(handle_t *handle,
> - struct address_space *mapping, loff_t from,
> - loff_t length, int flags)
> -{
> - struct inode *inode = mapping->host;
> - struct page *page;
> - int err = 0;
> -
> - page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
> - mapping_gfp_mask(mapping) & ~__GFP_FS);
> - if (!page)
> - return -ENOMEM;
> -
> - err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
> - from, length, flags);
> -
> - unlock_page(page);
> - page_cache_release(page);
> - return err;
> -}
> -
> -/*
> - * ext4_discard_partial_page_buffers_no_lock()
> - * Zeros a page range of length 'length' starting from offset 'from'.
> - * Buffer heads that correspond to the block aligned regions of the
> - * zeroed range will be unmapped. Unblock aligned regions
> - * will have the corresponding buffer head mapped if needed so that
> - * that region of the page can be updated with the partial zero out.
> - *
> - * This function assumes that the page has already been locked. The
> - * The range to be discarded must be contained with in the given page.
> - * If the specified range exceeds the end of the page it will be shortened
> - * to the end of the page that corresponds to 'from'. This function is
> - * appropriate for updating a page and it buffer heads to be unmapped and
> - * zeroed for blocks that have been either released, or are going to be
> - * released.
> - *
> - * handle: The journal handle
> - * inode: The files inode
> - * page: A locked page that contains the offset "from"
> - * from: The starting byte offset (from the beginning of the file)
> - * to begin discarding
> - * len: The length of bytes to discard
> - * flags: Optional flags that may be used:
> - *
> - * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
> - * Only zero the regions of the page whose buffer heads
> - * have already been unmapped. This flag is appropriate
> - * for updating the contents of a page whose blocks may
> - * have already been released, and we only want to zero
> - * out the regions that correspond to those released blocks.
> - *
> - * Returns zero on success or negative on failure.
> - */
> -static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
> - struct inode *inode, struct page *page, loff_t from,
> - loff_t length, int flags)
> -{
> - ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
> - unsigned int offset = from & (PAGE_CACHE_SIZE-1);
> - unsigned int blocksize, max, pos;
> - ext4_lblk_t iblock;
> - struct buffer_head *bh;
> - int err = 0;
> -
> - blocksize = inode->i_sb->s_blocksize;
> - max = PAGE_CACHE_SIZE - offset;
> -
> - if (index != page->index)
> - return -EINVAL;
> -
> - /*
> - * correct length if it does not fall between
> - * 'from' and the end of the page
> - */
> - if (length > max || length < 0)
> - length = max;
> -
> - iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
> -
> - if (!page_has_buffers(page))
> - create_empty_buffers(page, blocksize, 0);
> -
> - /* Find the buffer that contains "offset" */
> - bh = page_buffers(page);
> - pos = blocksize;
> - while (offset >= pos) {
> - bh = bh->b_this_page;
> - iblock++;
> - pos += blocksize;
> - }
> -
> - pos = offset;
> - while (pos < offset + length) {
> - unsigned int end_of_block, range_to_discard;
> -
> - err = 0;
> -
> - /* The length of space left to zero and unmap */
> - range_to_discard = offset + length - pos;
> -
> - /* The length of space until the end of the block */
> - end_of_block = blocksize - (pos & (blocksize-1));
> -
> - /*
> - * Do not unmap or zero past end of block
> - * for this buffer head
> - */
> - if (range_to_discard > end_of_block)
> - range_to_discard = end_of_block;
> -
> -
> - /*
> - * Skip this buffer head if we are only zeroing unampped
> - * regions of the page
> - */
> - if (flags & EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED &&
> - buffer_mapped(bh))
> - goto next;
> -
> - /* If the range is block aligned, unmap */
> - if (range_to_discard == blocksize) {
> - clear_buffer_dirty(bh);
> - bh->b_bdev = NULL;
> - clear_buffer_mapped(bh);
> - clear_buffer_req(bh);
> - clear_buffer_new(bh);
> - clear_buffer_delay(bh);
> - clear_buffer_unwritten(bh);
> - clear_buffer_uptodate(bh);
> - zero_user(page, pos, range_to_discard);
> - BUFFER_TRACE(bh, "Buffer discarded");
> - goto next;
> - }
> -
> - /*
> - * If this block is not completely contained in the range
> - * to be discarded, then it is not going to be released. Because
> - * we need to keep this block, we need to make sure this part
> - * of the page is uptodate before we modify it by writeing
> - * partial zeros on it.
> - */
> - if (!buffer_mapped(bh)) {
> - /*
> - * Buffer head must be mapped before we can read
> - * from the block
> - */
> - BUFFER_TRACE(bh, "unmapped");
> - ext4_get_block(inode, iblock, bh, 0);
> - /* unmapped? It's a hole - nothing to do */
> - if (!buffer_mapped(bh)) {
> - BUFFER_TRACE(bh, "still unmapped");
> - goto next;
> - }
> - }
> -
> - /* Ok, it's mapped. Make sure it's up-to-date */
> - if (PageUptodate(page))
> - set_buffer_uptodate(bh);
> -
> - if (!buffer_uptodate(bh)) {
> - err = -EIO;
> - ll_rw_block(READ, 1, &bh);
> - wait_on_buffer(bh);
> - /* Uhhuh. Read error. Complain and punt.*/
> - if (!buffer_uptodate(bh))
> - goto next;
> - }
> -
> - if (ext4_should_journal_data(inode)) {
> - BUFFER_TRACE(bh, "get write access");
> - err = ext4_journal_get_write_access(handle, bh);
> - if (err)
> - goto next;
> - }
> -
> - zero_user(page, pos, range_to_discard);
> -
> - err = 0;
> - if (ext4_should_journal_data(inode)) {
> - err = ext4_handle_dirty_metadata(handle, inode, bh);
> - } else
> - mark_buffer_dirty(bh);
> -
> - BUFFER_TRACE(bh, "Partial buffer zeroed");
> -next:
> - bh = bh->b_this_page;
> - iblock++;
> - pos += range_to_discard;
> - }
> -
> - return err;
> -}
> -
> /*
> * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
> * up to the end of the block which corresponds to `from'.
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-19 05:22:17

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 16/18] ext4: update ext4_ext_remove_space trace point

On Tue 09-04-13 11:14:25, Lukas Czerner wrote:
> Add "end" variable.
>
> Signed-off-by: Lukas Czerner <[email protected]>
You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/extents.c | 6 +++---
> include/trace/events/ext4.h | 21 ++++++++++++++-------
> 2 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 4adaa8a..9023b76 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2666,7 +2666,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
> return PTR_ERR(handle);
>
> again:
> - trace_ext4_ext_remove_space(inode, start, depth);
> + trace_ext4_ext_remove_space(inode, start, end, depth);
>
> /*
> * Check if we are removing extents inside the extent tree. If that
> @@ -2832,8 +2832,8 @@ again:
> }
> }
>
> - trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
> - path->p_hdr->eh_entries);
> + trace_ext4_ext_remove_space_done(inode, start, end, depth,
> + partial_cluster, path->p_hdr->eh_entries);
>
> /* If we still have something in the partial cluster and we have removed
> * even the first extent, then we should free the blocks in the partial
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index 60b329a..c92500c 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -2027,14 +2027,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
> );
>
> TRACE_EVENT(ext4_ext_remove_space,
> - TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
> + TP_PROTO(struct inode *inode, ext4_lblk_t start,
> + ext4_lblk_t end, int depth),
>
> - TP_ARGS(inode, start, depth),
> + TP_ARGS(inode, start, end, depth),
>
> TP_STRUCT__entry(
> __field( dev_t, dev )
> __field( ino_t, ino )
> __field( ext4_lblk_t, start )
> + __field( ext4_lblk_t, end )
> __field( int, depth )
> ),
>
> @@ -2042,26 +2044,29 @@ TRACE_EVENT(ext4_ext_remove_space,
> __entry->dev = inode->i_sb->s_dev;
> __entry->ino = inode->i_ino;
> __entry->start = start;
> + __entry->end = end;
> __entry->depth = depth;
> ),
>
> - TP_printk("dev %d,%d ino %lu since %u depth %d",
> + TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> (unsigned) __entry->start,
> + (unsigned) __entry->end,
> __entry->depth)
> );
>
> TRACE_EVENT(ext4_ext_remove_space_done,
> - TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
> - ext4_lblk_t partial, unsigned short eh_entries),
> + TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
> + int depth, ext4_lblk_t partial, unsigned short eh_entries),
>
> - TP_ARGS(inode, start, depth, partial, eh_entries),
> + TP_ARGS(inode, start, end, depth, partial, eh_entries),
>
> TP_STRUCT__entry(
> __field( dev_t, dev )
> __field( ino_t, ino )
> __field( ext4_lblk_t, start )
> + __field( ext4_lblk_t, end )
> __field( int, depth )
> __field( ext4_lblk_t, partial )
> __field( unsigned short, eh_entries )
> @@ -2071,16 +2076,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> __entry->dev = inode->i_sb->s_dev;
> __entry->ino = inode->i_ino;
> __entry->start = start;
> + __entry->end = end;
> __entry->depth = depth;
> __entry->partial = partial;
> __entry->eh_entries = eh_entries;
> ),
>
> - TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
> + TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
> "remaining_entries %u",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> (unsigned) __entry->start,
> + (unsigned) __entry->end,
> __entry->depth,
> (unsigned) __entry->partial,
> (unsigned short) __entry->eh_entries)
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-19 05:22:15

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 15/18] ext4: remove unused code from ext4_remove_blocks()

On Tue 09-04-13 11:14:24, Lukas Czerner wrote:
> The "head removal" branch in the condition is never used in any code
> path in ext4 since the function only caller ext4_ext_rm_leaf() will make
> sure that the extent is properly split before removing blocks. Note that
> there is a bug in this branch anyway.
>
> This commit removes the unused code completely and makes use of
> ext4_error() instead of printk if dubious range is provided.
>
> Signed-off-by: Lukas Czerner <[email protected]>
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/ext4/extents.c | 21 ++++-----------------
> 1 files changed, 4 insertions(+), 17 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 6c5a70a..4adaa8a 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2435,23 +2435,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> *partial_cluster = EXT4_B2C(sbi, pblk);
> else
> *partial_cluster = 0;
> - } else if (from == le32_to_cpu(ex->ee_block)
> - && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
> - /* head removal */
> - ext4_lblk_t num;
> - ext4_fsblk_t start;
> -
> - num = to - from;
> - start = ext4_ext_pblock(ex);
> -
> - ext_debug("free first %u blocks starting %llu\n", num, start);
> - ext4_free_blocks(handle, inode, NULL, start, num, flags);
> -
> - } else {
> - printk(KERN_INFO "strange request: removal(2) "
> - "%u-%u from %u:%u\n",
> - from, to, le32_to_cpu(ex->ee_block), ee_len);
> - }
> + } else
> + ext4_error(sbi->s_sb, "strange request: removal(2) "
> + "%u-%u from %u:%u\n",
> + from, to, le32_to_cpu(ex->ee_block), ee_len);
> return 0;
> }
>
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-23 08:36:51

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 18/18] ext4: Allow punch hole with bigalloc enabled

On Tue 09-04-13 11:14:27, Lukas Czerner wrote:
> In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
> 30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
> implementation and there is noting holding us back from using punch hole
> on file system with bigalloc feature enabled.
>
> This has been tested with fsx and xfstests.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/ext4/inode.c | 5 -----
> 1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0d452c1..87d6171 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3536,11 +3536,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> if (!S_ISREG(inode->i_mode))
> return -EOPNOTSUPP;
>
> - if (EXT4_SB(sb)->s_cluster_ratio > 1) {
> - /* TODO: Add support for bigalloc file systems */
> - return -EOPNOTSUPP;
> - }
> -
> trace_ext4_punch_hole(inode, offset, length);
>
> /*
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-23 08:37:17

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

On Tue 09-04-13 11:14:26, Lukas Czerner wrote:
> Currently punch hole is disabled in file systems with bigalloc
> feature enabled. However the recent changes in punch hole patch should
> make it easier to support punching holes on bigalloc enabled file
> systems.
>
> This commit changes partial_cluster handling in ext4_remove_blocks(),
> ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
> partial_cluster is unsigned long long type and it makes sure that we
> will free the partial cluster if all extents has been released from that
> cluster. However it has been specifically designed only for truncate.
>
> With punch hole we can be freeing just some extents in the cluster
> leaving the rest untouched. So we have to make sure that we will notice
> cluster which still has some extents. To do this I've changed
> partial_cluster to be signed long long type. The only scenario where
> this could be a problem is when cluster_size == block size, however in
> that case there would not be any partial clusters so we're safe. For
> bigger clusters the signed type is enough. Now we use the negative value
> in partial_cluster to mark such cluster used, hence we know that we must
> not free it even if all other extents has been freed from such cluster.
>
> This scenario can be described in simple diagram:
>
> |FFF...FF..FF.UUU|
> ^----------^
> punch hole
>
> . - free space
> | - cluster boundary
> F - freed extent
> U - used extent
>
> Also update respective tracepoints to use signed long long type for
> partial_cluster.
The patch looks OK. You can add:
Reviewed-by: Jan Kara <[email protected]>

Just a minor nit - sometimes you use 'signed long long', sometimes 'long
long int', sometimes just 'long long'. In kernel we tend to always use just
'long long' so it would be good to clean that up.

Honza
>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
> include/trace/events/ext4.h | 25 ++++++++-------
> 2 files changed, 64 insertions(+), 30 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 9023b76..577c4f5 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2362,7 +2362,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
>
> static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> struct ext4_extent *ex,
> - ext4_fsblk_t *partial_cluster,
> + signed long long *partial_cluster,
> ext4_lblk_t from, ext4_lblk_t to)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> @@ -2391,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> * partial cluster here.
> */
> pblk = ext4_ext_pblock(ex) + ee_len - 1;
> - if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> + if ((*partial_cluster > 0) &&
> + (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> ext4_free_blocks(handle, inode, NULL,
> EXT4_C2B(sbi, *partial_cluster),
> sbi->s_cluster_ratio, flags);
> @@ -2417,23 +2418,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> && to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
> /* tail removal */
> ext4_lblk_t num;
> + unsigned int unaligned;
>
> num = le32_to_cpu(ex->ee_block) + ee_len - from;
> pblk = ext4_ext_pblock(ex) + ee_len - num;
> - ext_debug("free last %u blocks starting %llu\n", num, pblk);
> + /*
> + * Usually we want to free partial cluster at the end of the
> + * extent, except for the situation when the cluster is still
> + * used by any other extent (partial_cluster is negative).
> + */
> + if (*partial_cluster < 0 &&
> + -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
> + flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
> +
> + ext_debug("free last %u blocks starting %llu partial %lld\n",
> + num, pblk, *partial_cluster);
> ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
> /*
> * If the block range to be freed didn't start at the
> * beginning of a cluster, and we removed the entire
> - * extent, save the partial cluster here, since we
> - * might need to delete if we determine that the
> - * truncate operation has removed all of the blocks in
> - * the cluster.
> + * extent and the cluster is not used by any other extent,
> + * save the partial cluster here, since we might need to
> + * delete if we determine that the truncate operation has
> + * removed all of the blocks in the cluster.
> + *
> + * On the other hand, if we did not manage to free the whole
> + * extent, we have to mark the cluster as used (store negative
> + * cluster number in partial_cluster).
> */
> - if (pblk & (sbi->s_cluster_ratio - 1) &&
> - (ee_len == num))
> + unaligned = pblk & (sbi->s_cluster_ratio - 1);
> + if (unaligned && (ee_len == num) &&
> + (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
> *partial_cluster = EXT4_B2C(sbi, pblk);
> - else
> + else if (unaligned)
> + *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
> + else if (*partial_cluster > 0)
> *partial_cluster = 0;
> } else
> ext4_error(sbi->s_sb, "strange request: removal(2) "
> @@ -2451,12 +2470,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> * @handle: The journal handle
> * @inode: The files inode
> * @path: The path to the leaf
> + * @partial_cluster: The cluster which we'll have to free if all extents
> + * has been released from it. It gets negative in case
> + * that the cluster is still used.
> * @start: The first block to remove
> * @end: The last block to remove
> */
> static int
> ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> - struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
> + struct ext4_ext_path *path,
> + signed long long *partial_cluster,
> ext4_lblk_t start, ext4_lblk_t end)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> @@ -2469,6 +2492,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> unsigned short ex_ee_len;
> unsigned uninitialized = 0;
> struct ext4_extent *ex;
> + ext4_fsblk_t pblk;
>
> /* the header must be checked already in ext4_ext_remove_space() */
> ext_debug("truncate since %u in leaf to %u\n", start, end);
> @@ -2507,6 +2531,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
>
> /* If this extent is beyond the end of the hole, skip it */
> if (end < ex_ee_block) {
> + /*
> + * We're going to skip this extent and move to another,
> + * so if this extent is not cluster aligned we have
> + * to mark the current cluster as used to avoid
> + * accidentally freeing it later on
> + */
> + pblk = ext4_ext_pblock(ex);
> + if (pblk & (sbi->s_cluster_ratio - 1))
> + *partial_cluster =
> + -((long long)EXT4_B2C(sbi, pblk));
> ex--;
> ex_ee_block = le32_to_cpu(ex->ee_block);
> ex_ee_len = ext4_ext_get_actual_len(ex);
> @@ -2582,7 +2616,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> sizeof(struct ext4_extent));
> }
> le16_add_cpu(&eh->eh_entries, -1);
> - } else
> + } else if (*partial_cluster > 0)
> *partial_cluster = 0;
>
> err = ext4_ext_dirty(handle, inode, path + depth);
> @@ -2600,11 +2634,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> err = ext4_ext_correct_indexes(handle, inode, path);
>
> /*
> - * If there is still a entry in the leaf node, check to see if
> - * it references the partial cluster. This is the only place
> - * where it could; if it doesn't, we can free the cluster.
> + * Free the partial cluster only if the current extent does not
> + * reference it. Otherwise we might free used cluster.
> */
> - if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
> + if (*partial_cluster > 0 &&
> (EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
> *partial_cluster)) {
> int flags = EXT4_FREE_BLOCKS_FORGET;
> @@ -2654,7 +2687,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
> struct super_block *sb = inode->i_sb;
> int depth = ext_depth(inode);
> struct ext4_ext_path *path = NULL;
> - ext4_fsblk_t partial_cluster = 0;
> + signed long long partial_cluster = 0;
> handle_t *handle;
> int i = 0, err = 0;
>
> @@ -2838,7 +2871,7 @@ again:
> /* If we still have something in the partial cluster and we have removed
> * even the first extent, then we should free the blocks in the partial
> * cluster as well. */
> - if (partial_cluster && path->p_hdr->eh_entries == 0) {
> + if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
> int flags = EXT4_FREE_BLOCKS_FORGET;
>
> if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index c92500c..5028d05 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -1928,7 +1928,7 @@ TRACE_EVENT(ext4_ext_show_extent,
> TRACE_EVENT(ext4_remove_blocks,
> TP_PROTO(struct inode *inode, struct ext4_extent *ex,
> ext4_lblk_t from, ext4_fsblk_t to,
> - ext4_fsblk_t partial_cluster),
> + long long int partial_cluster),
>
> TP_ARGS(inode, ex, from, to, partial_cluster),
>
> @@ -1937,7 +1937,7 @@ TRACE_EVENT(ext4_remove_blocks,
> __field( ino_t, ino )
> __field( ext4_lblk_t, from )
> __field( ext4_lblk_t, to )
> - __field( ext4_fsblk_t, partial )
> + __field( long long int, partial )
> __field( ext4_fsblk_t, ee_pblk )
> __field( ext4_lblk_t, ee_lblk )
> __field( unsigned short, ee_len )
> @@ -1955,7 +1955,7 @@ TRACE_EVENT(ext4_remove_blocks,
> ),
>
> TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
> - "from %u to %u partial_cluster %u",
> + "from %u to %u partial_cluster %lld",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> (unsigned) __entry->ee_lblk,
> @@ -1963,19 +1963,20 @@ TRACE_EVENT(ext4_remove_blocks,
> (unsigned short) __entry->ee_len,
> (unsigned) __entry->from,
> (unsigned) __entry->to,
> - (unsigned) __entry->partial)
> + (long long int) __entry->partial)
> );
>
> TRACE_EVENT(ext4_ext_rm_leaf,
> TP_PROTO(struct inode *inode, ext4_lblk_t start,
> - struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
> + struct ext4_extent *ex,
> + long long int partial_cluster),
>
> TP_ARGS(inode, start, ex, partial_cluster),
>
> TP_STRUCT__entry(
> __field( dev_t, dev )
> __field( ino_t, ino )
> - __field( ext4_fsblk_t, partial )
> + __field( long long int, partial )
> __field( ext4_lblk_t, start )
> __field( ext4_lblk_t, ee_lblk )
> __field( ext4_fsblk_t, ee_pblk )
> @@ -1993,14 +1994,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
> ),
>
> TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
> - "partial_cluster %u",
> + "partial_cluster %lld",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> (unsigned) __entry->start,
> (unsigned) __entry->ee_lblk,
> (unsigned long long) __entry->ee_pblk,
> (unsigned short) __entry->ee_len,
> - (unsigned) __entry->partial)
> + (long long int) __entry->partial)
> );
>
> TRACE_EVENT(ext4_ext_rm_idx,
> @@ -2058,7 +2059,7 @@ TRACE_EVENT(ext4_ext_remove_space,
>
> TRACE_EVENT(ext4_ext_remove_space_done,
> TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
> - int depth, ext4_lblk_t partial, unsigned short eh_entries),
> + int depth, long long int partial, unsigned short eh_entries),
>
> TP_ARGS(inode, start, end, depth, partial, eh_entries),
>
> @@ -2068,7 +2069,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> __field( ext4_lblk_t, start )
> __field( ext4_lblk_t, end )
> __field( int, depth )
> - __field( ext4_lblk_t, partial )
> + __field( long long int, partial )
> __field( unsigned short, eh_entries )
> ),
>
> @@ -2082,14 +2083,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> __entry->eh_entries = eh_entries;
> ),
>
> - TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
> + TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
> "remaining_entries %u",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> (unsigned) __entry->start,
> (unsigned) __entry->end,
> __entry->depth,
> - (unsigned) __entry->partial,
> + (long long int) __entry->partial,
> (unsigned short) __entry->eh_entries)
> );
>
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-04-23 09:02:11

by Zheng Liu

[permalink] [raw]
Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

On Sat, Apr 20, 2013 at 03:42:41PM +0200, Jan Kara wrote:
> On Tue 09-04-13 11:14:26, Lukas Czerner wrote:
> > Currently punch hole is disabled in file systems with bigalloc
> > feature enabled. However the recent changes in punch hole patch should
> > make it easier to support punching holes on bigalloc enabled file
> > systems.
> >
> > This commit changes partial_cluster handling in ext4_remove_blocks(),
> > ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
> > partial_cluster is unsigned long long type and it makes sure that we
> > will free the partial cluster if all extents has been released from that
> > cluster. However it has been specifically designed only for truncate.
> >
> > With punch hole we can be freeing just some extents in the cluster
> > leaving the rest untouched. So we have to make sure that we will notice
> > cluster which still has some extents. To do this I've changed
> > partial_cluster to be signed long long type. The only scenario where
> > this could be a problem is when cluster_size == block size, however in
> > that case there would not be any partial clusters so we're safe. For
> > bigger clusters the signed type is enough. Now we use the negative value
> > in partial_cluster to mark such cluster used, hence we know that we must
> > not free it even if all other extents has been freed from such cluster.
> >
> > This scenario can be described in simple diagram:
> >
> > |FFF...FF..FF.UUU|
> > ^----------^
> > punch hole
> >
> > . - free space
> > | - cluster boundary
> > F - freed extent
> > U - used extent
> >
> > Also update respective tracepoints to use signed long long type for
> > partial_cluster.
> The patch looks OK. You can add:
> Reviewed-by: Jan Kara <[email protected]>
>
> Just a minor nit - sometimes you use 'signed long long', sometimes 'long
> long int', sometimes just 'long long'. In kernel we tend to always use just
> 'long long' so it would be good to clean that up.

Another question is that in patch 01/18 invalidatepage signature is
changed from
int (*invalidatepage) (struct page *, unsigned long);
to
void (*invalidatepage) (struct page *, unsigned int, unsigned int);

The argument type is changed from 'unsigned long' to 'unsigned int'. My
question is why we need to change it.

Thanks,
- Zheng

2013-04-23 14:14:32

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v3 07/18] ceph: use ->invalidatepage() length argument

On Tue, Apr 09, 2013 at 11:14:16AM +0200, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in ceph_invalidatepage().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]

To ceph-devel,

Since half of this patch series modifies ext4 extensively, and changes
to the other file systems are relatively small, I plan to carry the
invalidatepage patch set in the ext4 tree for the next development
cycle (i.e., not the upcoming merge window, but the next one). To
that end, it would be great if you take a look at this patch set and
send us an Acked-by signoff.

Thanks!!

- Ted

2013-04-23 14:14:54

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v3 05/18] xfs: use ->invalidatepage() length argument

On Tue, Apr 09, 2013 at 11:14:14AM +0200, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in xfs_vm_invalidatepage()
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]

To the xfs team,

Since half of this patch series modifies ext4 extensively, and changes
to the other file systems are relatively small, I plan to carry the
invalidatepage patch set in the ext4 tree for the next development
cycle (i.e., not the upcoming merge window, but the next one). To
that end, it would be great if you take a look at this patch set and
send us an Acked-by signoff.

Thanks!!

- Ted

2013-04-23 14:16:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument

On Tue, Apr 09, 2013 at 11:14:15AM +0200, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in ocfs2_invalidatepage().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: Joel Becker <[email protected]>

+Mark Fasheh, ocfs2-devel

To the ocfs2 development team,

Since half of this patch series modifies ext4 extensively, and changes
to the other file systems are relatively small, I plan to carry the
invalidatepage patch set in the ext4 tree for the next development
cycle (i.e., not the upcoming merge window, but the next one). To
that end, it would be great if you take a look at this patch set and
send us an Acked-by signoff.

Thanks!!

- Ted

2013-04-23 14:16:40

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

On Tue, Apr 09, 2013 at 11:14:17AM +0200, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in gfs2_invalidatepage().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]

To the gfs2 development team,

Since half of this patch series modifies ext4 extensively, and changes
to the other file systems are relatively small, I plan to carry the
invalidatepage patch set in the ext4 tree for the next development
cycle (i.e., not the upcoming merge window, but the next one). To
that end, it would be great if you take a look at this patch set and
send us an Acked-by signoff.

Thanks!!

- Ted

2013-04-23 14:17:36

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v3 08/18] gfs2: use ->invalidatepage() length argument

On Tue, Apr 23, 2013 at 10:16:31AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 09, 2013 at 11:14:17AM +0200, Lukas Czerner wrote:
> > ->invalidatepage() aop now accepts range to invalidate so we can make
> > use of it in gfs2_invalidatepage().
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > Cc: [email protected]
>
> To the gfs2 development team,

Whoops, I missed Steven Whitehouse's Acked-by. Sorry for the noise,

- Ted

2013-04-23 16:04:40

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCH v3 07/18] ceph: use ->invalidatepage() length argument

On Tue, 9 Apr 2013, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in ceph_invalidatepage().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]

Acked-by: Sage Weil <[email protected]>


> ---
> fs/ceph/addr.c | 12 ++++++------
> 1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 168a35a..d953afd 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
> if (!PageDirty(page))
> pr_err("%p invalidatepage %p page not dirty\n", inode, page);
>
> - if (offset == 0)
> + if (offset == 0 && length == PAGE_CACHE_SIZE)
> ClearPageChecked(page);
>
> ci = ceph_inode(inode);
> - if (offset == 0) {
> - dout("%p invalidatepage %p idx %lu full dirty page %u\n",
> - inode, page, page->index, offset);
> + if (offset == 0 && length == PAGE_CACHE_SIZE) {
> + dout("%p invalidatepage %p idx %lu full dirty page\n",
> + inode, page, page->index);
> ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
> ceph_put_snap_context(snapc);
> page->private = 0;
> ClearPagePrivate(page);
> } else {
> - dout("%p invalidatepage %p idx %lu partial dirty page\n",
> - inode, page, page->index);
> + dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
> + inode, page, page->index, offset, length);
> }
> }
>
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

2013-04-23 21:06:57

by Ben Myers

[permalink] [raw]
Subject: Re: [PATCH v3 05/18] xfs: use ->invalidatepage() length argument

Hey Lukas,

On Tue, Apr 09, 2013 at 11:14:14AM +0200, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in xfs_vm_invalidatepage()
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]
> ---
> fs/xfs/xfs_aops.c | 5 +++--
> fs/xfs/xfs_trace.h | 41 ++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index e426796..e8018d3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
> unsigned int offset,
> unsigned int length)
> {
> - trace_xfs_invalidatepage(page->mapping->host, page, offset);
> - block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
> + trace_xfs_invalidatepage(page->mapping->host, page, offset,
> + length);
> + block_invalidatepage(page, offset, length);
> }
>
> /*
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 16a8129..91d6434 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -991,7 +991,46 @@ DEFINE_EVENT(xfs_page_class, name, \
> TP_ARGS(inode, page, off))
> DEFINE_PAGE_EVENT(xfs_writepage);
> DEFINE_PAGE_EVENT(xfs_releasepage);
> -DEFINE_PAGE_EVENT(xfs_invalidatepage);

I think it might be better if we continue using the xfs_invalidatepage trace
point as part of the xfs_page_class rather than as a separate trace point, like below.

Else this looks great.

Reviewed-by: Ben Myers <[email protected]>

Regards,
Ben



Index: xfs/fs/xfs/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_aops.c
+++ xfs/fs/xfs/xfs_aops.c
@@ -825,7 +825,7 @@ xfs_vm_invalidatepage(
struct page *page,
unsigned long offset)
{
- trace_xfs_invalidatepage(page->mapping->host, page, offset);
+ trace_xfs_invalidatepage(page->mapping->host, page, offset, 0);
block_invalidatepage(page, offset);
}

@@ -920,7 +920,7 @@ xfs_vm_writepage(
int count = 0;
int nonblocking = 0;

- trace_xfs_writepage(inode, page, 0);
+ trace_xfs_writepage(inode, page, 0, 0);

ASSERT(page_has_buffers(page));

@@ -1151,7 +1151,7 @@ xfs_vm_releasepage(
{
int delalloc, unwritten;

- trace_xfs_releasepage(page->mapping->host, page, 0);
+ trace_xfs_releasepage(page->mapping->host, page, 0, 0);

xfs_count_page_state(page, &delalloc, &unwritten);

Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h
+++ xfs/fs/xfs/xfs_trace.h
@@ -974,14 +974,16 @@ DEFINE_RW_EVENT(xfs_file_splice_read);
DEFINE_RW_EVENT(xfs_file_splice_write);

DECLARE_EVENT_CLASS(xfs_page_class,
- TP_PROTO(struct inode *inode, struct page *page, unsigned long off),
- TP_ARGS(inode, page, off),
+ TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
+ unsigned int len),
+ TP_ARGS(inode, page, off, len),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(pgoff_t, pgoff)
__field(loff_t, size)
__field(unsigned long, offset)
+ __field(unsigned int, length)
__field(int, delalloc)
__field(int, unwritten)
),
@@ -995,24 +997,27 @@ DECLARE_EVENT_CLASS(xfs_page_class,
__entry->pgoff = page_offset(page);
__entry->size = i_size_read(inode);
__entry->offset = off;
+ __entry->length = len;
__entry->delalloc = delalloc;
__entry->unwritten = unwritten;
),
TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
- "delalloc %d unwritten %d",
+ "length %x delalloc %d unwritten %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->pgoff,
__entry->size,
__entry->offset,
+ __entry->length,
__entry->delalloc,
__entry->unwritten)
)

#define DEFINE_PAGE_EVENT(name) \
DEFINE_EVENT(xfs_page_class, name, \
- TP_PROTO(struct inode *inode, struct page *page, unsigned long off), \
- TP_ARGS(inode, page, off))
+ TP_PROTO(struct inode *inode, struct page *page, unsigned long off, \
+ unsigned int len), \
+ TP_ARGS(inode, page, off, len))
DEFINE_PAGE_EVENT(xfs_writepage);
DEFINE_PAGE_EVENT(xfs_releasepage);
DEFINE_PAGE_EVENT(xfs_invalidatepage);

2013-04-24 10:58:27

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

On Sat, 20 Apr 2013, Jan Kara wrote:

> Date: Sat, 20 Apr 2013 15:42:41 +0200
> From: Jan Kara <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with
> bigalloc
>
> On Tue 09-04-13 11:14:26, Lukas Czerner wrote:
> > Currently punch hole is disabled in file systems with bigalloc
> > feature enabled. However the recent changes in punch hole patch should
> > make it easier to support punching holes on bigalloc enabled file
> > systems.
> >
> > This commit changes partial_cluster handling in ext4_remove_blocks(),
> > ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
> > partial_cluster is unsigned long long type and it makes sure that we
> > will free the partial cluster if all extents has been released from that
> > cluster. However it has been specifically designed only for truncate.
> >
> > With punch hole we can be freeing just some extents in the cluster
> > leaving the rest untouched. So we have to make sure that we will notice
> > cluster which still has some extents. To do this I've changed
> > partial_cluster to be signed long long type. The only scenario where
> > this could be a problem is when cluster_size == block size, however in
> > that case there would not be any partial clusters so we're safe. For
> > bigger clusters the signed type is enough. Now we use the negative value
> > in partial_cluster to mark such cluster used, hence we know that we must
> > not free it even if all other extents has been freed from such cluster.
> >
> > This scenario can be described in simple diagram:
> >
> > |FFF...FF..FF.UUU|
> > ^----------^
> > punch hole
> >
> > . - free space
> > | - cluster boundary
> > F - freed extent
> > U - used extent
> >
> > Also update respective tracepoints to use signed long long type for
> > partial_cluster.
> The patch looks OK. You can add:
> Reviewed-by: Jan Kara <[email protected]>
>
> Just a minor nit - sometimes you use 'signed long long', sometimes 'long
> long int', sometimes just 'long long'. In kernel we tend to always use just
> 'long long' so it would be good to clean that up.

Sure, I can do that.

Thanks!
-Lukas

>
> Honza
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > ---
> > fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
> > include/trace/events/ext4.h | 25 ++++++++-------
> > 2 files changed, 64 insertions(+), 30 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 9023b76..577c4f5 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -2362,7 +2362,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
> >
> > static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > struct ext4_extent *ex,
> > - ext4_fsblk_t *partial_cluster,
> > + signed long long *partial_cluster,
> > ext4_lblk_t from, ext4_lblk_t to)
> > {
> > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > @@ -2391,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > * partial cluster here.
> > */
> > pblk = ext4_ext_pblock(ex) + ee_len - 1;
> > - if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> > + if ((*partial_cluster > 0) &&
> > + (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> > ext4_free_blocks(handle, inode, NULL,
> > EXT4_C2B(sbi, *partial_cluster),
> > sbi->s_cluster_ratio, flags);
> > @@ -2417,23 +2418,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > && to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
> > /* tail removal */
> > ext4_lblk_t num;
> > + unsigned int unaligned;
> >
> > num = le32_to_cpu(ex->ee_block) + ee_len - from;
> > pblk = ext4_ext_pblock(ex) + ee_len - num;
> > - ext_debug("free last %u blocks starting %llu\n", num, pblk);
> > + /*
> > + * Usually we want to free partial cluster at the end of the
> > + * extent, except for the situation when the cluster is still
> > + * used by any other extent (partial_cluster is negative).
> > + */
> > + if (*partial_cluster < 0 &&
> > + -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
> > + flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
> > +
> > + ext_debug("free last %u blocks starting %llu partial %lld\n",
> > + num, pblk, *partial_cluster);
> > ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
> > /*
> > * If the block range to be freed didn't start at the
> > * beginning of a cluster, and we removed the entire
> > - * extent, save the partial cluster here, since we
> > - * might need to delete if we determine that the
> > - * truncate operation has removed all of the blocks in
> > - * the cluster.
> > + * extent and the cluster is not used by any other extent,
> > + * save the partial cluster here, since we might need to
> > + * delete if we determine that the truncate operation has
> > + * removed all of the blocks in the cluster.
> > + *
> > + * On the other hand, if we did not manage to free the whole
> > + * extent, we have to mark the cluster as used (store negative
> > + * cluster number in partial_cluster).
> > */
> > - if (pblk & (sbi->s_cluster_ratio - 1) &&
> > - (ee_len == num))
> > + unaligned = pblk & (sbi->s_cluster_ratio - 1);
> > + if (unaligned && (ee_len == num) &&
> > + (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
> > *partial_cluster = EXT4_B2C(sbi, pblk);
> > - else
> > + else if (unaligned)
> > + *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
> > + else if (*partial_cluster > 0)
> > *partial_cluster = 0;
> > } else
> > ext4_error(sbi->s_sb, "strange request: removal(2) "
> > @@ -2451,12 +2470,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > * @handle: The journal handle
> > * @inode: The files inode
> > * @path: The path to the leaf
> > + * @partial_cluster: The cluster which we'll have to free if all extents
> > + * has been released from it. It gets negative in case
> > + * that the cluster is still used.
> > * @start: The first block to remove
> > * @end: The last block to remove
> > */
> > static int
> > ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > - struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
> > + struct ext4_ext_path *path,
> > + signed long long *partial_cluster,
> > ext4_lblk_t start, ext4_lblk_t end)
> > {
> > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > @@ -2469,6 +2492,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > unsigned short ex_ee_len;
> > unsigned uninitialized = 0;
> > struct ext4_extent *ex;
> > + ext4_fsblk_t pblk;
> >
> > /* the header must be checked already in ext4_ext_remove_space() */
> > ext_debug("truncate since %u in leaf to %u\n", start, end);
> > @@ -2507,6 +2531,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >
> > /* If this extent is beyond the end of the hole, skip it */
> > if (end < ex_ee_block) {
> > + /*
> > + * We're going to skip this extent and move to another,
> > + * so if this extent is not cluster aligned we have
> > + * to mark the current cluster as used to avoid
> > + * accidentally freeing it later on
> > + */
> > + pblk = ext4_ext_pblock(ex);
> > + if (pblk & (sbi->s_cluster_ratio - 1))
> > + *partial_cluster =
> > + -((long long)EXT4_B2C(sbi, pblk));
> > ex--;
> > ex_ee_block = le32_to_cpu(ex->ee_block);
> > ex_ee_len = ext4_ext_get_actual_len(ex);
> > @@ -2582,7 +2616,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > sizeof(struct ext4_extent));
> > }
> > le16_add_cpu(&eh->eh_entries, -1);
> > - } else
> > + } else if (*partial_cluster > 0)
> > *partial_cluster = 0;
> >
> > err = ext4_ext_dirty(handle, inode, path + depth);
> > @@ -2600,11 +2634,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > err = ext4_ext_correct_indexes(handle, inode, path);
> >
> > /*
> > - * If there is still a entry in the leaf node, check to see if
> > - * it references the partial cluster. This is the only place
> > - * where it could; if it doesn't, we can free the cluster.
> > + * Free the partial cluster only if the current extent does not
> > + * reference it. Otherwise we might free used cluster.
> > */
> > - if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
> > + if (*partial_cluster > 0 &&
> > (EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
> > *partial_cluster)) {
> > int flags = EXT4_FREE_BLOCKS_FORGET;
> > @@ -2654,7 +2687,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
> > struct super_block *sb = inode->i_sb;
> > int depth = ext_depth(inode);
> > struct ext4_ext_path *path = NULL;
> > - ext4_fsblk_t partial_cluster = 0;
> > + signed long long partial_cluster = 0;
> > handle_t *handle;
> > int i = 0, err = 0;
> >
> > @@ -2838,7 +2871,7 @@ again:
> > /* If we still have something in the partial cluster and we have removed
> > * even the first extent, then we should free the blocks in the partial
> > * cluster as well. */
> > - if (partial_cluster && path->p_hdr->eh_entries == 0) {
> > + if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
> > int flags = EXT4_FREE_BLOCKS_FORGET;
> >
> > if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
> > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> > index c92500c..5028d05 100644
> > --- a/include/trace/events/ext4.h
> > +++ b/include/trace/events/ext4.h
> > @@ -1928,7 +1928,7 @@ TRACE_EVENT(ext4_ext_show_extent,
> > TRACE_EVENT(ext4_remove_blocks,
> > TP_PROTO(struct inode *inode, struct ext4_extent *ex,
> > ext4_lblk_t from, ext4_fsblk_t to,
> > - ext4_fsblk_t partial_cluster),
> > + long long int partial_cluster),
> >
> > TP_ARGS(inode, ex, from, to, partial_cluster),
> >
> > @@ -1937,7 +1937,7 @@ TRACE_EVENT(ext4_remove_blocks,
> > __field( ino_t, ino )
> > __field( ext4_lblk_t, from )
> > __field( ext4_lblk_t, to )
> > - __field( ext4_fsblk_t, partial )
> > + __field( long long int, partial )
> > __field( ext4_fsblk_t, ee_pblk )
> > __field( ext4_lblk_t, ee_lblk )
> > __field( unsigned short, ee_len )
> > @@ -1955,7 +1955,7 @@ TRACE_EVENT(ext4_remove_blocks,
> > ),
> >
> > TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
> > - "from %u to %u partial_cluster %u",
> > + "from %u to %u partial_cluster %lld",
> > MAJOR(__entry->dev), MINOR(__entry->dev),
> > (unsigned long) __entry->ino,
> > (unsigned) __entry->ee_lblk,
> > @@ -1963,19 +1963,20 @@ TRACE_EVENT(ext4_remove_blocks,
> > (unsigned short) __entry->ee_len,
> > (unsigned) __entry->from,
> > (unsigned) __entry->to,
> > - (unsigned) __entry->partial)
> > + (long long int) __entry->partial)
> > );
> >
> > TRACE_EVENT(ext4_ext_rm_leaf,
> > TP_PROTO(struct inode *inode, ext4_lblk_t start,
> > - struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
> > + struct ext4_extent *ex,
> > + long long int partial_cluster),
> >
> > TP_ARGS(inode, start, ex, partial_cluster),
> >
> > TP_STRUCT__entry(
> > __field( dev_t, dev )
> > __field( ino_t, ino )
> > - __field( ext4_fsblk_t, partial )
> > + __field( long long int, partial )
> > __field( ext4_lblk_t, start )
> > __field( ext4_lblk_t, ee_lblk )
> > __field( ext4_fsblk_t, ee_pblk )
> > @@ -1993,14 +1994,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
> > ),
> >
> > TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
> > - "partial_cluster %u",
> > + "partial_cluster %lld",
> > MAJOR(__entry->dev), MINOR(__entry->dev),
> > (unsigned long) __entry->ino,
> > (unsigned) __entry->start,
> > (unsigned) __entry->ee_lblk,
> > (unsigned long long) __entry->ee_pblk,
> > (unsigned short) __entry->ee_len,
> > - (unsigned) __entry->partial)
> > + (long long int) __entry->partial)
> > );
> >
> > TRACE_EVENT(ext4_ext_rm_idx,
> > @@ -2058,7 +2059,7 @@ TRACE_EVENT(ext4_ext_remove_space,
> >
> > TRACE_EVENT(ext4_ext_remove_space_done,
> > TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
> > - int depth, ext4_lblk_t partial, unsigned short eh_entries),
> > + int depth, long long int partial, unsigned short eh_entries),
> >
> > TP_ARGS(inode, start, end, depth, partial, eh_entries),
> >
> > @@ -2068,7 +2069,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> > __field( ext4_lblk_t, start )
> > __field( ext4_lblk_t, end )
> > __field( int, depth )
> > - __field( ext4_lblk_t, partial )
> > + __field( long long int, partial )
> > __field( unsigned short, eh_entries )
> > ),
> >
> > @@ -2082,14 +2083,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> > __entry->eh_entries = eh_entries;
> > ),
> >
> > - TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
> > + TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
> > "remaining_entries %u",
> > MAJOR(__entry->dev), MINOR(__entry->dev),
> > (unsigned long) __entry->ino,
> > (unsigned) __entry->start,
> > (unsigned) __entry->end,
> > __entry->depth,
> > - (unsigned) __entry->partial,
> > + (long long int) __entry->partial,
> > (unsigned short) __entry->eh_entries)
> > );
> >
> > --
> > 1.7.7.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2013-04-24 11:08:48

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

On Tue, 23 Apr 2013, Zheng Liu wrote:

> Date: Tue, 23 Apr 2013 17:19:28 +0800
> From: Zheng Liu <[email protected]>
> To: Jan Kara <[email protected]>
> Cc: Lukas Czerner <[email protected]>, [email protected],
> [email protected], [email protected],
> [email protected]
> Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with
> bigalloc
>
> On Sat, Apr 20, 2013 at 03:42:41PM +0200, Jan Kara wrote:
> > On Tue 09-04-13 11:14:26, Lukas Czerner wrote:
> > > Currently punch hole is disabled in file systems with bigalloc
> > > feature enabled. However the recent changes in punch hole patch should
> > > make it easier to support punching holes on bigalloc enabled file
> > > systems.
> > >
> > > This commit changes partial_cluster handling in ext4_remove_blocks(),
> > > ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
> > > partial_cluster is unsigned long long type and it makes sure that we
> > > will free the partial cluster if all extents has been released from that
> > > cluster. However it has been specifically designed only for truncate.
> > >
> > > With punch hole we can be freeing just some extents in the cluster
> > > leaving the rest untouched. So we have to make sure that we will notice
> > > cluster which still has some extents. To do this I've changed
> > > partial_cluster to be signed long long type. The only scenario where
> > > this could be a problem is when cluster_size == block size, however in
> > > that case there would not be any partial clusters so we're safe. For
> > > bigger clusters the signed type is enough. Now we use the negative value
> > > in partial_cluster to mark such cluster used, hence we know that we must
> > > not free it even if all other extents has been freed from such cluster.
> > >
> > > This scenario can be described in simple diagram:
> > >
> > > |FFF...FF..FF.UUU|
> > > ^----------^
> > > punch hole
> > >
> > > . - free space
> > > | - cluster boundary
> > > F - freed extent
> > > U - used extent
> > >
> > > Also update respective tracepoints to use signed long long type for
> > > partial_cluster.
> > The patch looks OK. You can add:
> > Reviewed-by: Jan Kara <[email protected]>
> >
> > Just a minor nit - sometimes you use 'signed long long', sometimes 'long
> > long int', sometimes just 'long long'. In kernel we tend to always use just
> > 'long long' so it would be good to clean that up.
>
> Another question is that in patch 01/18 invalidatepage signature is
> changed from
> int (*invalidatepage) (struct page *, unsigned long);
> to
> void (*invalidatepage) (struct page *, unsigned int, unsigned int);
>
> The argument type is changed from 'unsigned long' to 'unsigned int'. My
> question is why we need to change it.
>
> Thanks,
> - Zheng
>

Hi Zheng,

this was changed on Hugh Dickins request because it makes it clearer
that those args are indeed intended to be offsets within a page
(0..PAGE_CACHE_SIZE).

Even though PAGE_CACHE_SIZE can be defined as unsigned long, this is
only for convenience. Here is quote from Hugh:

"
They would be defined as unsigned long so that they can be used in
masks like ~(PAGE_SIZE - 1), and behave as expected on addresses,
without needing casts to be added all over.

We do not (currently!) expect PAGE_SIZE or PAGE_CACHE_SIZE to grow
beyond an unsigned int - but indeed they can be larger than what's
held in an unsigned short (look no further than ia64 or ppc64).

For more reassurance, see include/linux/highmem.h, which declares
zero_user_segments() and others: unsigned int (well, unsigned with
the int implicit) for offsets within a page.

Hugh
"

I should probably mention that in the description.

Thanks!
-Lukas

2013-04-24 11:11:45

by Zheng Liu

[permalink] [raw]
Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

On Wed, Apr 24, 2013 at 01:08:17PM +0200, Lukáš Czerner wrote:
> On Tue, 23 Apr 2013, Zheng Liu wrote:
[snip]
> > > > Also update respective tracepoints to use signed long long type for
> > > > partial_cluster.
> > > The patch looks OK. You can add:
> > > Reviewed-by: Jan Kara <[email protected]>
> > >
> > > Just a minor nit - sometimes you use 'signed long long', sometimes 'long
> > > long int', sometimes just 'long long'. In kernel we tend to always use just
> > > 'long long' so it would be good to clean that up.
> >
> > Another question is that in patch 01/18 invalidatepage signature is
> > changed from
> > int (*invalidatepage) (struct page *, unsigned long);
> > to
> > void (*invalidatepage) (struct page *, unsigned int, unsigned int);
> >
> > The argument type is changed from 'unsigned long' to 'unsigned int'. My
> > question is why we need to change it.
> >
> > Thanks,
> > - Zheng
> >
>
> Hi Zheng,
>
> this was changed on Hugh Dickins request because it makes it clearer
> that those args are indeed intended to be offsets within a page
> (0..PAGE_CACHE_SIZE).
>
> Even though PAGE_CACHE_SIZE can be defined as unsigned long, this is
> only for convenience. Here is quote from Hugh:
>
> "
> They would be defined as unsigned long so that they can be used in
> masks like ~(PAGE_SIZE - 1), and behave as expected on addresses,
> without needing casts to be added all over.
>
> We do not (currently!) expect PAGE_SIZE or PAGE_CACHE_SIZE to grow
> beyond an unsigned int - but indeed they can be larger than what's
> held in an unsigned short (look no further than ia64 or ppc64).
>
> For more reassurance, see include/linux/highmem.h, which declares
> zero_user_segments() and others: unsigned int (well, unsigned with
> the int implicit) for offsets within a page.
>
> Hugh
> "
>
> I should probably mention that in the description.

Ah, thanks for your explanation. I must miss something. :-(

Regards,
- Zheng

2013-05-02 22:00:47

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH v3 06/18] ocfs2: use ->invalidatepage() length argument

Acked-by: Joel Becker <[email protected]>

On Tue, Apr 23, 2013 at 10:16:04AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 09, 2013 at 11:14:15AM +0200, Lukas Czerner wrote:
> > ->invalidatepage() aop now accepts range to invalidate so we can make
> > use of it in ocfs2_invalidatepage().
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > Cc: Joel Becker <[email protected]>
>
> +Mark Fasheh, ocfs2-devel
>
> To the ocfs2 development team,
>
> Since half of this patch series modifies ext4 extensively, and changes
> to the other file systems are relatively small, I plan to carry the
> invalidatepage patch set in the ext4 tree for the next development
> cycle (i.e., not the upcoming merge window, but the next one). To
> that end, it would be great if you take a look at this patch set and
> send us an Acked-by signoff.
>
> Thanks!!
>
> - Ted

--

"Ninety feet between bases is perhaps as close as man has ever come
to perfection."
- Red Smith

http://www.jlbec.org/
[email protected]