2013-02-05 09:11:53

by Lukas Czerner

Subject: [PATCH v2 00/18] change invalidatepage prototype to accept length

Hi,

This set of patches is aimed at allowing truncate_inode_pages_range() to
handle ranges which are not aligned at the end of the page. Currently it
will hit BUG_ON() when the end of the range is not page aligned. The punch
hole feature can benefit from this ability, saving file systems some work
by not forcing them to implement their own invalidation code to handle
unaligned ranges.

In order for this to work we need to change the ->invalidatepage() address
space operation to accept the range to invalidate, by adding a 'length'
argument in addition to 'offset'. This is different from my previous
attempt to create a new aop ->invalidatepage_range
(http://lwn.net/Articles/514828/), which I have reconsidered to be
unnecessary.
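
As a rough illustration only (this sketch is not part of the series), the
prototype change and the per-page range a punch hole caller would pass look
roughly like this; example_invalidate_partial() is a made-up helper name and
the calculation assumes PAGE_CACHE_SIZE sized pages:

    /* old aop prototype */
    void (*invalidatepage)(struct page *, unsigned long offset);

    /* new aop prototype: invalidate bytes [offset, offset + length) of the page */
    void (*invalidatepage)(struct page *, unsigned int offset, unsigned int length);

    /*
     * Hypothetical helper: invalidate the part of a locked page cache page
     * that overlaps the file byte range [start, end] (both inclusive).
     */
    static void example_invalidate_partial(struct page *page, loff_t start, loff_t end)
    {
        loff_t page_start = (loff_t)page->index << PAGE_CACHE_SHIFT;
        loff_t page_end = page_start + PAGE_CACHE_SIZE - 1;
        unsigned int offset = 0, length;

        if (start > page_start)
            offset = start - page_start;
        if (end < page_end)
            length = end - page_start + 1 - offset;
        else
            length = PAGE_CACHE_SIZE - offset;

        if (page_has_private(page))
            do_invalidatepage(page, offset, length);
    }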

For description purposes this patch set can be divided into following
groups:


patch 0001: Change the ->invalidatepage() prototype, adding a 'length'
argument and changing all the instances. In very simple cases
the file system methods are completely adapted, otherwise only
the prototype is changed and the rest will follow. This patch
also implements the 'range' invalidation in
block_invalidatepage().

patch 0002 - 0009:
Make use of the new 'length' argument in the file systems
themselves. Some file systems can take advantage of it, trying
to invalidate only a portion of the page where possible; some
can't. However, none of the file systems currently attempt
to truncate non page aligned ranges.


patch 0010: Teach truncate_inode_pages_range() to handle non page aligned
ranges.

patch 0011 - 0018:
Ext4 changes that build on top of the previous changes,
simplifying the punch hole code. Not all of the changes are
related specifically to the invalidatepage() change, but all are
related to the punch hole feature.

Even though this patch set would mainly affect the functionality of the
file systems implementing punch hole, I've tested all of the following
file systems using xfstests without noticing any bugs related to this
change:

ext3, ext4, xfs, btrfs, gfs2 and reiserfs

The much smaller changes in the other file systems have not been directly
tested, so please review.


Changes in v2:
patch 10/18 - add more comments. Do not initialize at declaration
to make things more obvious and better to read.


Thanks!
-Lukas

--

Documentation/filesystems/Locking | 6 +-
Documentation/filesystems/vfs.txt | 20 ++--
fs/9p/vfs_addr.c | 5 +-
fs/afs/file.c | 10 +-
fs/btrfs/disk-io.c | 3 +-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 +-
fs/buffer.c | 21 ++-
fs/ceph/addr.c | 15 +-
fs/cifs/file.c | 5 +-
fs/exofs/inode.c | 6 +-
fs/ext3/inode.c | 9 +-
fs/ext4/ext4.h | 14 +-
fs/ext4/extents.c | 188 ++++++++--------------
fs/ext4/indirect.c | 13 +--
fs/ext4/inode.c | 320 ++++++++++++++++--------------------
fs/f2fs/data.c | 3 +-
fs/f2fs/node.c | 3 +-
fs/gfs2/aops.c | 17 ++-
fs/jbd/transaction.c | 19 ++-
fs/jbd2/transaction.c | 24 ++-
fs/jfs/jfs_metapage.c | 5 +-
fs/logfs/file.c | 3 +-
fs/logfs/segment.c | 3 +-
fs/nfs/file.c | 8 +-
fs/ntfs/aops.c | 2 +-
fs/ocfs2/aops.c | 5 +-
fs/reiserfs/inode.c | 12 +-
fs/ubifs/file.c | 5 +-
fs/xfs/xfs_aops.c | 10 +-
fs/xfs/xfs_trace.h | 41 +++++-
include/linux/buffer_head.h | 3 +-
include/linux/fs.h | 2 +-
include/linux/jbd.h | 2 +-
include/linux/jbd2.h | 2 +-
include/linux/mm.h | 3 +-
include/trace/events/ext3.h | 12 +-
include/trace/events/ext4.h | 64 +++++---
mm/readahead.c | 2 +-
mm/truncate.c | 117 ++++++++++----
40 files changed, 538 insertions(+), 469 deletions(-)

[PATCH v2 01/18] mm: change invalidatepage prototype to accept
[PATCH v2 02/18] jbd2: change jbd2_journal_invalidatepage to accept
[PATCH v2 03/18] ext4: use ->invalidatepage() length argument
[PATCH v2 04/18] jbd: change journal_invalidatepage() to accept
[PATCH v2 05/18] xfs: use ->invalidatepage() length argument
[PATCH v2 06/18] ocfs2: use ->invalidatepage() length argument
[PATCH v2 07/18] ceph: use ->invalidatepage() length argument
[PATCH v2 08/18] gfs2: use ->invalidatepage() length argument
[PATCH v2 09/18] reiserfs: use ->invalidatepage() length argument
[PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle
[PATCH v2 11/18] Revert "ext4: remove no longer used functions in
[PATCH v2 12/18] Revert "ext4: fix fsx truncate failure"
[PATCH v2 13/18] ext4: use ext4_zero_partial_blocks in punch_hole
[PATCH v2 14/18] ext4: remove unused discard_partial_page_buffers
[PATCH v2 15/18] ext4: remove unused code from ext4_remove_blocks()
[PATCH v2 16/18] ext4: update ext4_ext_remove_space trace point
[PATCH v2 17/18] ext4: make punch hole code path work with bigalloc
[PATCH v2 18/18] ext4: Allow punch hole with bigalloc enabled



2013-02-05 09:11:55

by Lukas Czerner

Subject: [PATCH v2 02/18] jbd2: change jbd2_journal_invalidatepage to accept length

->invalidatepage() now accepts a range to invalidate, and there are two
file systems using jbd2 that also implement the punch hole feature and can
benefit from this. We need to implement the same thing for the jbd2 layer
in order to allow those file systems to take advantage of this
functionality.

This commit adds a length argument to jbd2_journal_invalidatepage() and
updates all of its instances in ext4 and ocfs2.
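
As an illustration only (not part of the patch), the new calling convention
looks like this; the offsets below are example values and a full-page call
keeps the old behaviour:

    /* whole page - behaves as the old two-argument call did */
    jbd2_journal_invalidatepage(journal, page, 0, PAGE_CACHE_SIZE);

    /* partial page - e.g. a punch hole covering bytes 512..1535 of the page */
    jbd2_journal_invalidatepage(journal, page, 512, 1024);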

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 3 ++-
fs/jbd2/transaction.c | 24 +++++++++++++++++-------
fs/ocfs2/aops.c | 3 ++-
include/linux/jbd2.h | 2 +-
4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cbf6574..a3be92c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2910,7 +2910,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,
if (offset == 0)
ClearPageChecked(page);

- return jbd2_journal_invalidatepage(journal, page, offset);
+ return jbd2_journal_invalidatepage(journal, page, offset,
+ PAGE_CACHE_SIZE - offset);
}

/* Wrapper for aops... */
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index df9f297..05b6d59 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1997,18 +1997,23 @@ zap_buffer_unlocked:
* void jbd2_journal_invalidatepage()
* @journal: journal to use for flush...
* @page: page to flush
- * @offset: length of page to invalidate.
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
- * Reap page buffers containing data after offset in page. Can return -EBUSY
- * if buffers are part of the committing transaction and the page is straddling
- * i_size. Caller then has to wait for current commit and try again.
+ * Reap page buffers containing data in the specified range in page.
+ * Can return -EBUSY if buffers are part of the committing transaction and
+ * the page is straddling i_size. Caller then has to wait for current commit
+ * and try again.
*/
int jbd2_journal_invalidatepage(journal_t *journal,
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
+ unsigned int stop = offset + length;
unsigned int curr_off = 0;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
int ret = 0;

@@ -2017,6 +2022,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return 0;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2026,10 +2033,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;

+ if (next_off > stop)
+ return 0;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
- ret = journal_unmap_buffer(journal, bh, offset > 0);
+ ret = journal_unmap_buffer(journal, bh, partial_page);
unlock_buffer(bh);
if (ret < 0)
return ret;
@@ -2040,7 +2050,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,

} while (bh != head);

- if (!offset) {
+ if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 5d77cf2..1393114 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,7 +608,8 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;

- jbd2_journal_invalidatepage(journal, page, offset);
+ jbd2_journal_invalidatepage(journal, page, offset,
+ PAGE_CACHE_SIZE - offset);
}

static int ocfs2_releasepage(struct page *page, gfp_t wait)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index e30b663..a501f33 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1099,7 +1099,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern int jbd2_journal_invalidatepage(journal_t *,
- struct page *, unsigned long);
+ struct page *, unsigned int, unsigned int);
extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int jbd2_journal_stop(handle_t *);
extern int jbd2_journal_flush (journal_t *);
--
1.7.7.6


2013-02-05 09:11:54

by Lukas Czerner

Subject: [PATCH v2 01/18] mm: change invalidatepage prototype to accept length

Currently there is no way to truncate a partial page where the end
truncation point is not at the end of the page. This is because it was not
needed and the existing functionality was enough for the file system
truncate operation to work properly. However, more file systems now
support the punch hole feature, which can benefit from mm supporting
truncation of a page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating a partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the ->invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all of its
instances.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument, implementing range invalidation.

Actual file system implementations will follow, except for the file
systems where the changes are really simple and should not change the
behaviour in any way. The implementation for truncate_inode_pages_range(),
which will be able to accept page unaligned ranges, will follow as well.
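
As a hedged sketch (not taken from the patch), a file system that only cares
about full page invalidation can simply check both arguments, which is the
pattern several of the converted file systems below use;
example_invalidatepage() and example_release_private() are made-up names:

    static void example_invalidatepage(struct page *page, unsigned int offset,
                                       unsigned int length)
    {
        /* release per-page private state only on a full page invalidation */
        if (offset == 0 && length == PAGE_CACHE_SIZE)
            example_release_private(page);
    }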

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
Documentation/filesystems/Locking | 6 +++---
Documentation/filesystems/vfs.txt | 20 ++++++++++----------
fs/9p/vfs_addr.c | 5 +++--
fs/afs/file.c | 10 ++++++----
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 ++-
fs/buffer.c | 21 ++++++++++++++++++---
fs/ceph/addr.c | 5 +++--
fs/cifs/file.c | 5 +++--
fs/exofs/inode.c | 6 ++++--
fs/ext3/inode.c | 3 ++-
fs/ext4/inode.c | 18 +++++++++++-------
fs/f2fs/data.c | 3 ++-
fs/f2fs/node.c | 3 ++-
fs/gfs2/aops.c | 8 +++++---
fs/jfs/jfs_metapage.c | 5 +++--
fs/logfs/file.c | 3 ++-
fs/logfs/segment.c | 3 ++-
fs/nfs/file.c | 8 +++++---
fs/ntfs/aops.c | 2 +-
fs/ocfs2/aops.c | 3 ++-
fs/reiserfs/inode.c | 3 ++-
fs/ubifs/file.c | 5 +++--
fs/xfs/xfs_aops.c | 7 ++++---
include/linux/buffer_head.h | 3 ++-
include/linux/fs.h | 2 +-
include/linux/mm.h | 3 ++-
mm/readahead.c | 2 +-
mm/truncate.c | 15 +++++++++------
30 files changed, 116 insertions(+), 69 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index f48e0c6..8043a87 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -187,7 +187,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -308,8 +308,8 @@ filesystems and by the swapper. The latter will eventually go away. Please,
keep it that way and don't breed new callers.

->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated. It
-returns zero on success. If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
block_invalidatepage() instead.

->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index e3869098..279fb73 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
-------------------------------

This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:

struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -685,14 +685,14 @@ struct address_space_operations {
invalidatepage: If a page has PagePrivate set, then invalidatepage
will be called when part or all of the page is to be removed
from the address space. This generally corresponds to either a
- truncation or a complete invalidation of the address space
- (in the latter case 'offset' will always be 0).
- Any private data associated with the page should be updated
- to reflect this truncation. If offset is 0, then
- the private data should be released, because the page
- must be able to be completely discarded. This may be done by
- calling the ->releasepage function, but in this case the
- release MUST succeed.
+ truncation, punch hole or a complete invalidation of the address
+ space (in the latter case 'offset' will always be 0 and 'length'
+ will be PAGE_CACHE_SIZE). Any private data associated with the page
+ should be updated to reflect this truncation. If offset is 0 and
+ length is PAGE_CACHE_SIZE, then the private data should be released,
+ because the page must be able to be completely discarded. This may
+ be done by calling the ->releasepage function, but in this case the
+ release MUST succeed.

releasepage: releasepage is called on PagePrivate pages to indicate
that the page should be freed if possible. ->releasepage
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 0ad61c6..3bf654a 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -147,13 +147,14 @@ static int v9fs_release_page(struct page *page, gfp_t gfp)
* @offset: offset in the page
*/

-static void v9fs_invalidate_page(struct page *page, unsigned long offset)
+static void v9fs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
/*
* If called with zero offset, we should release
* the private state assocated with the page
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
v9fs_fscache_invalidate_page(page);
}

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8f6e923..66d50fe 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -19,7 +19,8 @@
#include "internal.h"

static int afs_readpage(struct file *file, struct page *page);
-static void afs_invalidatepage(struct page *page, unsigned long offset);
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
static int afs_releasepage(struct page *page, gfp_t gfp_flags);
static int afs_launder_page(struct page *page);

@@ -310,16 +311,17 @@ static int afs_launder_page(struct page *page)
* - release a page and clean up its private data if offset is 0 (indicating
* the entire page)
*/
-static void afs_invalidatepage(struct page *page, unsigned long offset)
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct afs_writeback *wb = (struct afs_writeback *) page_private(page);

- _enter("{%lu},%lu", page->index, offset);
+ _enter("{%lu},%u,%u", page->index, offset, length);

BUG_ON(!PageLocked(page));

/* we clean up only if the entire page is being invalidated */
- if (offset == 0) {
+ if (offset == 0 && length == PAGE_CACHE_SIZE) {
#ifdef CONFIG_AFS_FSCACHE
if (PageFsCache(page)) {
struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a8f652d..9db8632 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -984,7 +984,8 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
return try_release_extent_buffer(page, gfp_flags);
}

-static void btree_invalidatepage(struct page *page, unsigned long offset)
+static void btree_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1b319df..0aaec4a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2847,7 +2847,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
pg_offset = i_size & (PAGE_CACHE_SIZE - 1);
if (page->index > end_index ||
(page->index == end_index && !pg_offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
return 0;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cc93b23..98cdd03 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6711,7 +6711,8 @@ static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
return __btrfs_releasepage(page, gfp_flags & GFP_NOFS);
}

-static void btrfs_invalidatepage(struct page *page, unsigned long offset)
+static void btrfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct extent_io_tree *tree;
diff --git a/fs/buffer.c b/fs/buffer.c
index 7a75c3e..4259151 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1446,7 +1446,8 @@ static void discard_buffer(struct buffer_head * bh)
* block_invalidatepage - invalidate part or all of a buffer-backed page
*
* @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
* block_invalidatepage() is called when all or part of the page has become
* invalidated by a truncate operation.
@@ -1457,15 +1458,22 @@ static void discard_buffer(struct buffer_head * bh)
* point. Because the caller is about to free (and possibly reuse) those
* blocks on-disk.
*/
-void block_invalidatepage(struct page *page, unsigned long offset)
+void block_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
unsigned int curr_off = 0;
+ unsigned int stop = length + offset;

BUG_ON(!PageLocked(page));
if (!page_has_buffers(page))
goto out;

+ /*
+ * Check for overflow
+ */
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
@@ -1473,6 +1481,12 @@ void block_invalidatepage(struct page *page, unsigned long offset)
next = bh->b_this_page;

/*
+ * Are we still fully in range ?
+ */
+ if (next_off > stop)
+ goto out;
+
+ /*
* is this block fully invalidated?
*/
if (offset <= curr_off)
@@ -1493,6 +1507,7 @@ out:
}
EXPORT_SYMBOL(block_invalidatepage);

+
/*
* We attach and possibly dirty the buffers atomically wrt
* __set_page_dirty_buffers() via private_lock. try_to_free_buffers
@@ -2833,7 +2848,7 @@ int block_write_full_page_endio(struct page *page, get_block_t *get_block,
* they may have been added in ext3_writepage(). Make them
* freeable here, so the page does not leak.
*/
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
return 0; /* don't care */
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 064d1a6..8953532 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -143,7 +143,8 @@ static int ceph_set_page_dirty(struct page *page)
* dirty page counters appropriately. Only called if there is private
* data on the page.
*/
-static void ceph_invalidatepage(struct page *page, unsigned long offset)
+static void ceph_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode;
struct ceph_inode_info *ci;
@@ -168,7 +169,7 @@ static void ceph_invalidatepage(struct page *page, unsigned long offset)

ci = ceph_inode(inode);
if (offset == 0) {
- dout("%p invalidatepage %p idx %lu full dirty page %lu\n",
+ dout("%p invalidatepage %p idx %lu full dirty page %u\n",
inode, page, page->index, offset);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 8ea6ca5..8193f84 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3543,11 +3543,12 @@ static int cifs_release_page(struct page *page, gfp_t gfp)
return cifs_fscache_release_page(page, gfp);
}

-static void cifs_invalidate_page(struct page *page, unsigned long offset)
+static void cifs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);

- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
}

diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index d1f80ab..2ec8eb1 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -953,9 +953,11 @@ static int exofs_releasepage(struct page *page, gfp_t gfp)
return 0;
}

-static void exofs_invalidatepage(struct page *page, unsigned long offset)
+static void exofs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- EXOFS_DBGMSG("page 0x%lx offset 0x%lx\n", page->index, offset);
+ EXOFS_DBGMSG("page 0x%lx offset 0x%x length 0x%x\n",
+ page->index, offset, length);
WARN_ON(1);
}

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index b176d42..0b8d382 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1819,7 +1819,8 @@ ext3_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext3_get_block);
}

-static void ext3_invalidatepage(struct page *page, unsigned long offset)
+static void ext3_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = EXT3_JOURNAL(page->mapping->host);

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cbfe13b..cbf6574 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -131,7 +131,8 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
new_size);
}

-static void ext4_invalidatepage(struct page *page, unsigned long offset);
+static void ext4_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
@@ -1517,7 +1518,7 @@ static void ext4_da_block_invalidatepages(struct mpage_da_data *mpd)
break;
BUG_ON(!PageLocked(page));
BUG_ON(PageWriteback(page));
- block_invalidatepage(page, 0);
+ block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
ClearPageUptodate(page);
unlock_page(page);
}
@@ -2698,7 +2699,8 @@ static int ext4_da_write_end(struct file *file,
return ret ? ret : copied;
}

-static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
/*
* Drop reserved blocks
@@ -2710,7 +2712,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
ext4_da_page_release_reservation(page, offset);

out:
- ext4_invalidatepage(page, offset);
+ ext4_invalidatepage(page, offset, length);

return;
}
@@ -2878,7 +2880,8 @@ static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offs
} while (bh != head);
}

-static void ext4_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
trace_ext4_invalidatepage(page, offset);

@@ -2891,7 +2894,7 @@ static void ext4_invalidatepage(struct page *page, unsigned long offset)
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));

- block_invalidatepage(page, offset);
+ block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
}

static int __ext4_journalled_invalidatepage(struct page *page,
@@ -2912,7 +2915,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,

/* Wrapper for aops... */
static void ext4_journalled_invalidatepage(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7bd22a2..bae8b6c 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -667,7 +667,8 @@ static ssize_t f2fs_direct_IO(int rw, struct kiocb *iocb,
get_data_block_ro);
}

-static void f2fs_invalidate_data_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_data_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 9bda63c..f099a42 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1170,7 +1170,8 @@ static int f2fs_set_node_page_dirty(struct page *page)
return 0;
}

-static void f2fs_invalidate_node_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_node_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 30de4f2..5bd558c 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -109,7 +109,7 @@ static int gfs2_writepage_common(struct page *page,
/* Is the page fully outside i_size? (truncate in progress) */
offset = i_size & (PAGE_CACHE_SIZE-1);
if (page->index > end_index || (page->index == end_index && !offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
goto out;
}
return 1;
@@ -300,7 +300,8 @@ static int gfs2_write_jdata_pagevec(struct address_space *mapping,

/* Is the page fully outside i_size? (truncate in progress) */
if (page->index > end_index || (page->index == end_index && !offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0,
+ PAGE_CACHE_SIZE);
unlock_page(page);
continue;
}
@@ -944,7 +945,8 @@ static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh)
unlock_buffer(bh);
}

-static void gfs2_invalidatepage(struct page *page, unsigned long offset)
+static void gfs2_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
struct buffer_head *bh, *head;
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 6740d34..9e3aaff 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -571,9 +571,10 @@ static int metapage_releasepage(struct page *page, gfp_t gfp_mask)
return ret;
}

-static void metapage_invalidatepage(struct page *page, unsigned long offset)
+static void metapage_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- BUG_ON(offset);
+ BUG_ON(offset || length < PAGE_CACHE_SIZE);

BUG_ON(PageWriteback(page));

diff --git a/fs/logfs/file.c b/fs/logfs/file.c
index 3886cde..c2941df 100644
--- a/fs/logfs/file.c
+++ b/fs/logfs/file.c
@@ -159,7 +159,8 @@ static int logfs_writepage(struct page *page, struct writeback_control *wbc)
return __logfs_writepage(page);
}

-static void logfs_invalidatepage(struct page *page, unsigned long offset)
+static void logfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct logfs_block *block = logfs_block(page);

diff --git a/fs/logfs/segment.c b/fs/logfs/segment.c
index 038da09..d448a77 100644
--- a/fs/logfs/segment.c
+++ b/fs/logfs/segment.c
@@ -884,7 +884,8 @@ static struct logfs_area *alloc_area(struct super_block *sb)
return area;
}

-static void map_invalidatepage(struct page *page, unsigned long l)
+static void map_invalidatepage(struct page *page, unsigned int o,
+ unsigned int l)
{
return;
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 3c2b893..bbe0b72 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -451,11 +451,13 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
* - Called if either PG_private or PG_fscache is set on the page
* - Caller holds page lock
*/
-static void nfs_invalidate_page(struct page *page, unsigned long offset)
+static void nfs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
- dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %lu)\n", page, offset);
+ dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %u, %u)\n",
+ page, offset, length);

- if (offset != 0)
+ if (offset != 0 || length < PAGE_CACHE_SIZE)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page_file_mapping(page)->host, page);
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index fa9c05f..d267ea6 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -1372,7 +1372,7 @@ retry_writepage:
* The page may have dirty, unmapped buffers. Make them
* freeable here, so the page does not leak.
*/
- block_invalidatepage(page, 0);
+ block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
ntfs_debug("Write outside i_size - truncated?");
return 0;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 6577432..5d77cf2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -603,7 +603,8 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
* from ext3. PageChecked() bits have been removed as OCFS2 does not
* do journalled data.
*/
-static void ocfs2_invalidatepage(struct page *page, unsigned long offset)
+static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 95d7680..5ca4fb4 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2969,7 +2969,8 @@ static int invalidatepage_can_drop(struct inode *inode, struct buffer_head *bh)
}

/* clm -- taken from fs/buffer.c:block_invalidate_page */
-static void reiserfs_invalidatepage(struct page *page, unsigned long offset)
+static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 5bc7781..9ec2356 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1276,13 +1276,14 @@ int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
return err;
}

-static void ubifs_invalidatepage(struct page *page, unsigned long offset)
+static void ubifs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct ubifs_info *c = inode->i_sb->s_fs_info;

ubifs_assert(PagePrivate(page));
- if (offset)
+ if (offset || length < PAGE_CACHE_SIZE)
/* Partial page remains dirty */
return;

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5f707e5..e426796 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -823,10 +823,11 @@ xfs_cluster_write(
STATIC void
xfs_vm_invalidatepage(
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset);
+ block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
}

/*
@@ -890,7 +891,7 @@ next_buffer:

xfs_iunlock(ip, XFS_ILOCK_EXCL);
out_invalidate:
- xfs_vm_invalidatepage(page, 0);
+ xfs_vm_invalidatepage(page, 0, PAGE_CACHE_SIZE);
return;
}

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..e80b4e4 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -193,7 +193,8 @@ extern int buffer_heads_over_limit;
* Generic address_space_operations implementations for buffer_head-backed
* address_spaces.
*/
-void block_invalidatepage(struct page *page, unsigned long offset);
+void block_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
int block_write_full_page(struct page *page, get_block_t *get_block,
struct writeback_control *wbc);
int block_write_full_page_endio(struct page *page, get_block_t *get_block,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7617ee0..ad432fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -364,7 +364,7 @@ struct address_space_operations {

/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
- void (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, gfp_t);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 66e2f7c..9f4fb36 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1056,7 +1056,8 @@ int get_kernel_page(unsigned long start, int write, struct page **pages);
struct page *get_dump_page(unsigned long addr);

extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
-extern void do_invalidatepage(struct page *page, unsigned long offset);
+extern void do_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);

int __set_page_dirty_nobuffers(struct page *page);
int __set_page_dirty_no_writeback(struct page *page);
diff --git a/mm/readahead.c b/mm/readahead.c
index 7963f23..3565e53 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -48,7 +48,7 @@ static void read_cache_pages_invalidate_page(struct address_space *mapping,
if (!trylock_page(page))
BUG();
page->mapping = mapping;
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
page->mapping = NULL;
unlock_page(page);
}
diff --git a/mm/truncate.c b/mm/truncate.c
index c75b736..fdba083 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -26,7 +26,8 @@
/**
* do_invalidatepage - invalidate part or all of a page
* @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
* do_invalidatepage() is called when all or part of the page has become
* invalidated by a truncate operation.
@@ -37,16 +38,18 @@
* point. Because the caller is about to free (and possibly reuse) those
* blocks on-disk.
*/
-void do_invalidatepage(struct page *page, unsigned long offset)
+void do_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- void (*invalidatepage)(struct page *, unsigned long);
+ void (*invalidatepage)(struct page *, unsigned int, unsigned int);
+
invalidatepage = page->mapping->a_ops->invalidatepage;
#ifdef CONFIG_BLOCK
if (!invalidatepage)
invalidatepage = block_invalidatepage;
#endif
if (invalidatepage)
- (*invalidatepage)(page, offset);
+ (*invalidatepage)(page, offset, length);
}

static inline void truncate_partial_page(struct page *page, unsigned partial)
@@ -54,7 +57,7 @@ static inline void truncate_partial_page(struct page *page, unsigned partial)
zero_user_segment(page, partial, PAGE_CACHE_SIZE);
cleancache_invalidate_page(page->mapping, page);
if (page_has_private(page))
- do_invalidatepage(page, partial);
+ do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
}

/*
@@ -103,7 +106,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
return -EIO;

if (page_has_private(page))
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);

cancel_dirty_page(page, PAGE_CACHE_SIZE);

--
1.7.7.6


2013-02-05 09:11:56

by Lukas Czerner

Subject: [PATCH v2 03/18] ext4: use ->invalidatepage() length argument

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in all of the ext4 invalidatepage routines.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 47 ++++++++++++++++++++++++++++++------------
include/trace/events/ext4.h | 22 +++++++++++---------
2 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a3be92c..eb61fb7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1289,21 +1289,28 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
}

static void ext4_da_page_release_reservation(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;

+ if (next_off > stop)
+ break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2709,7 +2716,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
if (!page_has_buffers(page))
goto out;

- ext4_da_page_release_reservation(page, offset);
+ ext4_da_page_release_reservation(page, offset, length);

out:
ext4_invalidatepage(page, offset, length);
@@ -2860,22 +2867,33 @@ ext4_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
}

-static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offset)
+static void ext4_invalidatepage_free_endio(struct page *page,
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
+ unsigned int stop = offset + length;

if (!page_has_buffers(page))
return;
+
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = bh = page_buffers(page);
do {
+ unsigned int next_off = curr_off + bh->b_size;
+
+ if (next_off > stop)
+ return;
+
if (offset <= curr_off && test_clear_buffer_uninit(bh)
&& bh->b_private) {
ext4_free_io_end(bh->b_private);
bh->b_private = NULL;
bh->b_end_io = NULL;
}
- curr_off = curr_off + bh->b_size;
+ curr_off = next_off;
bh = bh->b_this_page;
} while (bh != head);
}
@@ -2883,35 +2901,35 @@ static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offs
static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
- trace_ext4_invalidatepage(page, offset);
+ trace_ext4_invalidatepage(page, offset, length);

/*
* free any io_end structure allocated for buffers to be discarded
*/
if (ext4_should_dioread_nolock(page->mapping->host))
- ext4_invalidatepage_free_endio(page, offset);
+ ext4_invalidatepage_free_endio(page, offset, length);

/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));

- block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+ block_invalidatepage(page, offset, length);
}

static int __ext4_journalled_invalidatepage(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
journal_t *journal = EXT4_JOURNAL(page->mapping->host);

- trace_ext4_journalled_invalidatepage(page, offset);
+ trace_ext4_journalled_invalidatepage(page, offset, length);

/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);

- return jbd2_journal_invalidatepage(journal, page, offset,
- PAGE_CACHE_SIZE - offset);
+ return jbd2_journal_invalidatepage(journal, page, offset, length);
}

/* Wrapper for aops... */
@@ -2919,7 +2937,7 @@ static void ext4_journalled_invalidatepage(struct page *page,
unsigned int offset,
unsigned int length)
{
- WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
+ WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
}

static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4351,7 +4369,8 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
inode->i_size >> PAGE_CACHE_SHIFT);
if (!page)
return;
- ret = __ext4_journalled_invalidatepage(page, offset);
+ ret = __ext4_journalled_invalidatepage(page, offset,
+ PAGE_CACHE_SIZE - offset);
unlock_page(page);
page_cache_release(page);
if (ret != -EBUSY)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 7e8c36b..6ab6f8a 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -452,16 +452,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
);

DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( pgoff_t, index )
- __field( unsigned long, offset )
-
+ __field( unsigned int, offset )
+ __field( unsigned int, length )
),

TP_fast_assign(
@@ -469,24 +469,26 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
__entry->ino = page->mapping->host->i_ino;
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
),

- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- (unsigned long) __entry->index, __entry->offset)
+ (unsigned long) __entry->index,
+ __entry->offset, __entry->length)
);

DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset)
+ TP_ARGS(page, offset, length)
);

DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset)
+ TP_ARGS(page, offset, length)
);

TRACE_EVENT(ext4_discard_blocks,
--
1.7.7.6


2013-02-05 09:11:57

by Lukas Czerner

Subject: [PATCH v2 04/18] jbd: change journal_invalidatepage() to accept length

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in journal_invalidatepage() and all of its users in the
ext3 file system. Also update the ext3 trace point to print out the
length argument.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext3/inode.c | 6 +++---
fs/jbd/transaction.c | 19 ++++++++++++++-----
include/linux/jbd.h | 2 +-
include/trace/events/ext3.h | 12 +++++++-----
4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 0b8d382..88404fd 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1824,15 +1824,15 @@ static void ext3_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = EXT3_JOURNAL(page->mapping->host);

- trace_ext3_invalidatepage(page, offset);
+ trace_ext3_invalidatepage(page, offset, length);

/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);

- journal_invalidatepage(journal, page, offset);
+ journal_invalidatepage(journal, page, offset, length);
}

static int ext3_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..a1fef89 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
* void journal_invalidatepage() - invalidate a journal page
* @journal: journal to use for flush
* @page: page to flush
- * @offset: length of page to invalidate.
+ * @offset: offset of the range to invalidate
+ * @length: length of the range to invalidate
*
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
*/
void journal_invalidatepage(journal_t *journal,
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
+ unsigned int stop = offset + length;
unsigned int curr_off = 0;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;

if (!PageLocked(page))
@@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;

+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;

+ if (next_off > stop)
+ return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
may_free &= journal_unmap_buffer(journal, bh,
- offset > 0);
+ partial_page);
unlock_buffer(bh);
}
curr_off = next_off;
@@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,

} while (bh != head);

- if (!offset) {
+ if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index c8f3297..d02e16c 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -840,7 +840,7 @@ extern void journal_release_buffer (handle_t *, struct buffer_head *);
extern int journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern void journal_invalidatepage(journal_t *,
- struct page *, unsigned long);
+ struct page *, unsigned int, unsigned int);
extern int journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int journal_stop(handle_t *);
extern int journal_flush (journal_t *);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 15d11a3..6797b9d 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
);

TRACE_EVENT(ext3_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),

TP_STRUCT__entry(
__field( pgoff_t, index )
- __field( unsigned long, offset )
+ __field( unsigned int, offset )
+ __field( unsigned int, length )
__field( ino_t, ino )
__field( dev_t, dev )

@@ -305,14 +306,15 @@ TRACE_EVENT(ext3_invalidatepage,
TP_fast_assign(
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
__entry->ino = page->mapping->host->i_ino;
__entry->dev = page->mapping->host->i_sb->s_dev;
),

- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->index, __entry->offset)
+ __entry->index, __entry->offset, __entry->length)
);

TRACE_EVENT(ext3_discard_blocks,
--
1.7.7.6


2013-02-05 09:12:00

by Lukas Czerner

Subject: [PATCH v2 07/18] ceph: use ->invalidatepage() length argument

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in ceph_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/ceph/addr.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8953532..1e09410 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
if (!PageDirty(page))
pr_err("%p invalidatepage %p page not dirty\n", inode, page);

- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);

ci = ceph_inode(inode);
- if (offset == 0) {
- dout("%p invalidatepage %p idx %lu full dirty page %u\n",
- inode, page, page->index, offset);
+ if (offset == 0 && length == PAGE_CACHE_SIZE) {
+ dout("%p invalidatepage %p idx %lu full dirty page\n",
+ inode, page, page->index);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
page->private = 0;
ClearPagePrivate(page);
} else {
- dout("%p invalidatepage %p idx %lu partial dirty page\n",
- inode, page, page->index);
+ dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+ inode, page, page->index, offset, length);
}
}

--
1.7.7.6


2013-02-05 09:11:58

by Lukas Czerner

Subject: [PATCH v2 05/18] xfs: use ->invalidatepage() length argument

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in xfs_vm_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/xfs/xfs_aops.c | 5 +++--
fs/xfs/xfs_trace.h | 41 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..e8018d3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
unsigned int offset,
unsigned int length)
{
- trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+ trace_xfs_invalidatepage(page->mapping->host, page, offset,
+ length);
+ block_invalidatepage(page, offset, length);
}

/*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..91d6434 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -991,7 +991,46 @@ DEFINE_EVENT(xfs_page_class, name, \
TP_ARGS(inode, page, off))
DEFINE_PAGE_EVENT(xfs_writepage);
DEFINE_PAGE_EVENT(xfs_releasepage);
-DEFINE_PAGE_EVENT(xfs_invalidatepage);
+
+TRACE_EVENT(xfs_invalidatepage,
+ TP_PROTO(struct inode *inode, struct page *page, unsigned int off,
+ unsigned int len),
+ TP_ARGS(inode, page, off, len),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(pgoff_t, pgoff)
+ __field(loff_t, size)
+ __field(unsigned int, offset)
+ __field(unsigned int, length)
+ __field(int, delalloc)
+ __field(int, unwritten)
+ ),
+ TP_fast_assign(
+ int delalloc = -1, unwritten = -1;
+
+ if (page_has_buffers(page))
+ xfs_count_page_state(page, &delalloc, &unwritten);
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = XFS_I(inode)->i_ino;
+ __entry->pgoff = page_offset(page);
+ __entry->size = i_size_read(inode);
+ __entry->offset = off;
+ __entry->length = len;
+ __entry->delalloc = delalloc;
+ __entry->unwritten = unwritten;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %x "
+ "length %x delalloc %d unwritten %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pgoff,
+ __entry->size,
+ __entry->offset,
+ __entry->length,
+ __entry->delalloc,
+ __entry->unwritten)
+)

DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
--
1.7.7.6


2013-02-05 09:12:01

by Lukas Czerner

Subject: [PATCH v2 08/18] gfs2: use ->invalidatepage() length argument

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in gfs2_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/gfs2/aops.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 5bd558c..3cf3dc8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -949,24 +949,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+ unsigned int stop = offset + length;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;

BUG_ON(!PageLocked(page));
- if (offset == 0)
+ if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;

bh = head = page_buffers(page);
do {
+ if (pos + bh->b_size > stop)
+ return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
out:
- if (offset == 0)
+ if (!partial_page)
try_to_release_page(page, 0);
}

--
1.7.7.6


2013-02-05 09:12:02

by Lukas Czerner

Subject: [PATCH v2 09/18] reiserfs: use ->invalidatepage() length argument

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in reiserfs_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/reiserfs/inode.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 5ca4fb4..cf949b9 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+ unsigned int stop = offset + length;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;

BUG_ON(!PageLocked(page));

- if (offset == 0)
+ if (!partial_page)
ClearPageChecked(page);

if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;

+ if (next_off > stop)
+ goto out;
+
/*
* is this block fully invalidated?
*/
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
* The get_block cached value has been unconditionally invalidated,
* so real IO is not possible anymore.
*/
- if (!offset && ret) {
+ if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
--
1.7.7.6


2013-02-05 09:11:59

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 06/18] ocfs2: use ->invalidatepage() length argument

The ->invalidatepage() aop now accepts a range to invalidate, so we can
make use of it in ocfs2_invalidatepage().

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Joel Becker <[email protected]>
---
fs/ocfs2/aops.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 1393114..d8c0076 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;

- jbd2_journal_invalidatepage(journal, page, offset,
- PAGE_CACHE_SIZE - offset);
+ jbd2_journal_invalidatepage(journal, page, offset, length);
}

static int ocfs2_releasepage(struct page *page, gfp_t wait)
--
1.7.7.6


2013-02-05 09:12:04

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 11/18] Revert "ext4: remove no longer used functions in inode.c"

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces the functions ext4_block_truncate_page() and
ext4_block_zero_page_range() which had previously been removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to make use of those functions and remove
ext4_discard_partial_page_buffers(), since it duplicates some code and
partially duplicates the work of truncate_pagecache_range(); moreover,
the old implementation was much clearer.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 4 ++
fs/ext4/inode.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8462eb3..4246a55 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2086,6 +2086,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
extern void ext4_set_aops(struct inode *inode);
extern int ext4_writepage_trans_blocks(struct inode *);
extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eb61fb7..5ccf556 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3552,6 +3552,126 @@ next:
return err;
}

+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from)
+{
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned length;
+ unsigned blocksize;
+ struct inode *inode = mapping->host;
+
+ blocksize = inode->i_sb->s_blocksize;
+ length = blocksize - (offset & (blocksize - 1));
+
+ return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'. The range to be zero'd must
+ * be contained with in one block. If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length)
+{
+ ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned blocksize, max, pos;
+ ext4_lblk_t iblock;
+ struct inode *inode = mapping->host;
+ struct buffer_head *bh;
+ struct page *page;
+ int err = 0;
+
+ page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+ mapping_gfp_mask(mapping) & ~__GFP_FS);
+ if (!page)
+ return -ENOMEM;
+
+ blocksize = inode->i_sb->s_blocksize;
+ max = blocksize - (offset & (blocksize - 1));
+
+ /*
+ * correct length if it does not fall between
+ * 'from' and the end of the block
+ */
+ if (length > max || length < 0)
+ length = max;
+
+ iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+ if (!page_has_buffers(page))
+ create_empty_buffers(page, blocksize, 0);
+
+ /* Find the buffer that contains "offset" */
+ bh = page_buffers(page);
+ pos = blocksize;
+ while (offset >= pos) {
+ bh = bh->b_this_page;
+ iblock++;
+ pos += blocksize;
+ }
+
+ err = 0;
+ if (buffer_freed(bh)) {
+ BUFFER_TRACE(bh, "freed: skip");
+ goto unlock;
+ }
+
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "unmapped");
+ ext4_get_block(inode, iblock, bh, 0);
+ /* unmapped? It's a hole - nothing to do */
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "still unmapped");
+ goto unlock;
+ }
+ }
+
+ /* Ok, it's mapped. Make sure it's up-to-date */
+ if (PageUptodate(page))
+ set_buffer_uptodate(bh);
+
+ if (!buffer_uptodate(bh)) {
+ err = -EIO;
+ ll_rw_block(READ, 1, &bh);
+ wait_on_buffer(bh);
+ /* Uhhuh. Read error. Complain and punt. */
+ if (!buffer_uptodate(bh))
+ goto unlock;
+ }
+
+ if (ext4_should_journal_data(inode)) {
+ BUFFER_TRACE(bh, "get write access");
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err)
+ goto unlock;
+ }
+
+ zero_user(page, offset, length);
+
+ BUFFER_TRACE(bh, "zeroed end of block");
+
+ err = 0;
+ if (ext4_should_journal_data(inode)) {
+ err = ext4_handle_dirty_metadata(handle, inode, bh);
+ } else
+ mark_buffer_dirty(bh);
+
+unlock:
+ unlock_page(page);
+ page_cache_release(page);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6


2013-02-05 09:12:03

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON() when
the end of the range is not page aligned, but we can handle an unaligned
start of the range.

Being able to handle non page aligned regions of the range can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed the ->invalidatepage() prototype to
accept a 'length' argument so that the range to invalidate can be
specified. Now we can use that new ability in
truncate_inode_pages_range().
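
For illustration only (the numbers are made up), with a PAGE_CACHE_SIZE
of 4096, a call with lstart = 1536 and lend = 10239 would be split up as:

	partial_start = lstart & (PAGE_CACHE_SIZE - 1);		/* 1536 */
	partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);	/* 2048 */
	start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;	/* 1 */
	end = (lend + 1) >> PAGE_CACHE_SHIFT;			/* 2 */

so page 1 is truncated completely, page 0 is zeroed and invalidated from
byte 1536 to the end of the page, and page 2 is zeroed and invalidated
from byte 0 up to (but not including) byte 2048.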

Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
mm/truncate.c | 104 ++++++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int offset,
(*invalidatepage)(page, offset, length);
}

-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
- zero_user_segment(page, partial, PAGE_CACHE_SIZE);
- cleancache_invalidate_page(page->mapping, page);
- if (page_has_private(page))
- do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
/*
* This cancels just the dirty bit on the kernel page itself, it
* does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
* truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
* @mapping: mapping to truncate
* @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
*
* Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
*
* Truncate takes two passes - the first pass is nonblocking. It will not
* block on page locks and it will not block on writeback. The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
* We pass down the cache-hot hint to the page freeing code. Even if the
* mapping is large, it is probably the case that the final pages are the most
* recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
*/
void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
{
- const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
- const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
- struct pagevec pvec;
- pgoff_t index;
- pgoff_t end;
- int i;
+ pgoff_t start; /* inclusive */
+ pgoff_t end; /* exclusive */
+ unsigned int partial_start; /* inclusive */
+ unsigned int partial_end; /* exclusive */
+ struct pagevec pvec;
+ pgoff_t index;
+ int i;

cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;

- BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
- end = (lend >> PAGE_CACHE_SHIFT);
+ /* Offsets within partial pages */
+ partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+ partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+ /*
+ * 'start' and 'end' always covers the range of pages to be fully
+ * truncated. Partial pages are covered with 'partial_start' at the
+ * start of the range and 'partial_end' at the end of the range.
+ * Note that 'end' is exclusive while 'lend' is inclusive.
+ */
+ start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+ if (lend == -1)
+ /*
+ * lend == -1 indicates end-of-file so we have to set 'end'
+ * to the highest possible pgoff_t and since the type is
+ * unsigned we're using -1.
+ */
+ end = -1;
+ else
+ end = (lend + 1) >> PAGE_CACHE_SHIFT;

pagevec_init(&pvec, 0);
index = start;
- while (index <= end && pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ while (index < end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];

/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;

if (!trylock_page(page))
@@ -250,27 +265,56 @@ void truncate_inode_pages_range(struct address_space *mapping,
index++;
}

- if (partial) {
+ if (partial_start) {
struct page *page = find_lock_page(mapping, start - 1);
if (page) {
+ unsigned int top = PAGE_CACHE_SIZE;
+ if (start > end) {
+ /* Truncation within a single page */
+ top = partial_end;
+ partial_end = 0;
+ }
wait_on_page_writeback(page);
- truncate_partial_page(page, partial);
+ zero_user_segment(page, partial_start, top);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage(page, partial_start,
+ top - partial_start);
unlock_page(page);
page_cache_release(page);
}
}
+ if (partial_end) {
+ struct page *page = find_lock_page(mapping, end);
+ if (page) {
+ wait_on_page_writeback(page);
+ zero_user_segment(page, 0, partial_end);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage(page, 0,
+ partial_end);
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ }
+ /*
+ * If the truncation happened within a single page no pages
+ * will be released, just zeroed, so we can bail out now.
+ */
+ if (start >= end)
+ return;

index = start;
for ( ; ; ) {
cond_resched();
if (!pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
if (index == start)
break;
index = start;
continue;
}
- if (index == start && pvec.pages[0]->index > end) {
+ if (index == start && pvec.pages[0]->index >= end) {
pagevec_release(&pvec);
break;
}
@@ -280,7 +324,7 @@ void truncate_inode_pages_range(struct address_space *mapping,

/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;

lock_page(page);
@@ -601,10 +645,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
* This rounding is currently just for example: unmap_mapping_range
* expands its hole outwards, whereas we want it to contract the hole
* inwards. However, existing callers of truncate_pagecache_range are
- * doing their own page rounding first; and truncate_inode_pages_range
- * currently BUGs if lend is not pagealigned-1 (it handles partial
- * page at start of hole, but not partial page at end of hole). Note
- * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
+ * doing their own page rounding first. Note that unmap_mapping_range
+ * allows holelen 0 for all, and we allow lend -1 for end of file.
*/

/*
--
1.7.7.6


2013-02-05 09:12:05

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 12/18] Revert "ext4: fix fsx truncate failure"

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in that commit's description that the truncate operation
only zeroes the block unaligned portion of the last page is not exactly
right, since truncate_pagecache_range() also zeroes and invalidates the
unaligned portion of the page. So there is no need to zero and unmap it
once more; ext4_block_truncate_page() was doing the right job, although
we still need to update the buffer head containing the last block, which
is exactly what ext4_block_truncate_page() does.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on a ppc64 machine with a block size of 1024 bytes
without any problems.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 13 ++-----------
fs/ext4/indirect.c | 13 ++-----------
2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5ae1674..5ce5a14 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4271,7 +4271,6 @@ void ext4_ext_truncate(struct inode *inode)
struct super_block *sb = inode->i_sb;
ext4_lblk_t last_block;
handle_t *handle;
- loff_t page_len;
int err = 0;

/*
@@ -4288,16 +4287,8 @@ void ext4_ext_truncate(struct inode *inode)
if (IS_ERR(handle))
return;

- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
- goto out_stop;
- }
+ if (inode->i_size & (sb->s_blocksize - 1))
+ ext4_block_truncate_page(handle, mapping, inode->i_size);

if (ext4_orphan_add(handle, inode))
goto out_stop;
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 20862f9..48d10b2 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1363,9 +1363,7 @@ void ext4_ind_truncate(struct inode *inode)
__le32 nr = 0;
int n = 0;
ext4_lblk_t last_block, max_block;
- loff_t page_len;
unsigned blocksize = inode->i_sb->s_blocksize;
- int err;

handle = start_transaction(inode);
if (IS_ERR(handle))
@@ -1376,16 +1374,9 @@ void ext4_ind_truncate(struct inode *inode)
max_block = (EXT4_SB(inode->i_sb)->s_bitmap_maxbytes + blocksize-1)
>> EXT4_BLOCK_SIZE_BITS(inode->i_sb);

- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
+ if (inode->i_size & (blocksize - 1))
+ if (ext4_block_truncate_page(handle, mapping, inode->i_size))
goto out_stop;
- }

if (last_block != max_block) {
n = ext4_block_to_path(inode, last_block, offsets, NULL);
--
1.7.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2013-02-05 09:12:06

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 13/18] ext4: use ext4_zero_partial_blocks in punch_hole

We're going to get rid of ext4_discard_partial_page_buffers() since it
duplicates some code and also partially duplicates the work of
truncate_pagecache_range(); moreover, the old implementation was much
clearer.

Now that truncate_inode_pages_range() can handle truncating non page
aligned regions, we can use it to invalidate and zero out the block
aligned region of the punched out range, and then use
ext4_zero_partial_blocks() to zero the unaligned blocks at the start and
end of the range. This will greatly simplify the punch hole code.
Moreover, after this commit we can get rid of
ext4_discard_partial_page_buffers() completely.
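
A rough sketch of the resulting flow (illustration only, assuming a 1k
block size and a punch hole request for bytes 500..3499; handle, inode
and mapping are as in ext4_ext_punch_hole()):

	/* block aligned middle of the hole, blocks 1 and 2 */
	first_page_offset = round_up(500, 1024);		/* 1024 */
	last_page_offset = round_down(500 + 3000, 1024) - 1;	/* 3071 */
	truncate_pagecache_range(inode, 1024, 3071);

	/* unaligned head and tail of the hole */
	ext4_zero_partial_blocks(handle, inode, 500, 3499);
	/* -> ext4_block_zero_page_range() zeroes bytes 500..1023
	 *    and bytes 3072..3499 */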

This has been tested on ppc64 with a 1k block size with fsx and xfstests
without any problems.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 2 +
fs/ext4/extents.c | 79 ++++++-----------------------------------------------
fs/ext4/inode.c | 31 +++++++++++++++++++++
3 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4246a55..665b975 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2090,6 +2090,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5ce5a14..7b44dc1 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4639,7 +4639,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
handle_t *handle;
- loff_t first_page, last_page, page_len;
loff_t first_page_offset, last_page_offset;
int credits, err = 0;

@@ -4680,17 +4679,13 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
offset;
}

- first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- last_page = (offset + length) >> PAGE_CACHE_SHIFT;
+ first_page_offset = round_up(offset, sb->s_blocksize);
+ last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;

- first_page_offset = first_page << PAGE_CACHE_SHIFT;
- last_page_offset = last_page << PAGE_CACHE_SHIFT;
-
- /* Now release the pages */
- if (last_page_offset > first_page_offset) {
+ /* Now release the pages and zero block aligned part of pages*/
+ if (last_page_offset > first_page_offset)
truncate_pagecache_range(inode, first_page_offset,
- last_page_offset - 1);
- }
+ last_page_offset);

/* Wait all existing dio workers, newcomers will block on i_mutex */
ext4_inode_block_unlocked_dio(inode);
@@ -4707,66 +4702,10 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
}


- /*
- * Now we need to zero out the non-page-aligned data in the
- * pages at the start and tail of the hole, and unmap the buffer
- * heads for the block aligned regions of the page that were
- * completely zeroed.
- */
- if (first_page > last_page) {
- /*
- * If the file space being truncated is contained within a page
- * just zero out and unmap the middle of that page
- */
- err = ext4_discard_partial_page_buffers(handle,
- mapping, offset, length, 0);
-
- if (err)
- goto out;
- } else {
- /*
- * zero out and unmap the partial page that contains
- * the start of the hole
- */
- page_len = first_page_offset - offset;
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle, mapping,
- offset, page_len, 0);
- if (err)
- goto out;
- }
-
- /*
- * zero out and unmap the partial page that contains
- * the end of the hole
- */
- page_len = offset + length - last_page_offset;
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle, mapping,
- last_page_offset, page_len, 0);
- if (err)
- goto out;
- }
- }
-
- /*
- * If i_size is contained in the last page, we need to
- * unmap and zero the partial page after i_size
- */
- if (inode->i_size >> PAGE_CACHE_SHIFT == last_page &&
- inode->i_size % PAGE_CACHE_SIZE != 0) {
-
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
- goto out;
- }
- }
+ err = ext4_zero_partial_blocks(handle, inode, offset,
+ offset + length - 1);
+ if (err)
+ goto out;

first_block = (offset + sb->s_blocksize - 1) >>
EXT4_BLOCK_SIZE_BITS(sb);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5ccf556..5df9716 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3672,6 +3672,37 @@ unlock:
return err;
}

+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend)
+{
+ struct super_block *sb = inode->i_sb;
+ struct address_space *mapping = inode->i_mapping;
+ unsigned partial = lstart & (sb->s_blocksize - 1);
+ ext4_fsblk_t start = lstart >> sb->s_blocksize_bits;
+ ext4_fsblk_t end = lend >> sb->s_blocksize_bits;
+ int err = 0;
+
+ /* Handle partial zero within the single block */
+ if (start == end) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, lend - lstart + 1);
+ return err;
+ }
+ /* Handle partial zero out on the start of the range */
+ if (partial) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, sb->s_blocksize);
+ if (err)
+ return err;
+ }
+ /* Handle partial zero out on the end of the range */
+ partial = lend & (sb->s_blocksize - 1);
+ if (partial != sb->s_blocksize - 1)
+ err = ext4_block_zero_page_range(handle, mapping,
+ lend - partial, partial + 1);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6


2013-02-05 09:12:11

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 18/18] ext4: Allow punch hole with bigalloc enabled

In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked the punch_hole
implementation and there is nothing holding us back from using punch hole
on file systems with the bigalloc feature enabled.

This has been tested with fsx and xfstests.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e7bf594..01cf049 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3530,11 +3530,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
return -EOPNOTSUPP;
}

- if (EXT4_SB(inode->i_sb)->s_cluster_ratio > 1) {
- /* TODO: Add support for bigalloc file systems */
- return -EOPNOTSUPP;
- }
-
return ext4_ext_punch_hole(file, offset, length);
}

--
1.7.7.6

2013-02-05 09:12:07

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 14/18] ext4: remove unused discard_partial_page_buffers

ext4_discard_partial_page_buffers() is no longer used anywhere, so we
can simply remove it, including the *_no_lock variant and the
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 8 --
fs/ext4/inode.c | 206 -------------------------------------------------------
2 files changed, 0 insertions(+), 214 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 665b975..c276dcb 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -594,11 +594,6 @@ enum {
#define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER 0x0020

/*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED 0x0001
-
-/*
* ioctl commands
*/
#define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS
@@ -2092,9 +2087,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5df9716..e7bf594 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -139,9 +139,6 @@ static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
static int __ext4_journalled_writepage(struct page *page, unsigned int len);
static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags);

/*
* Test whether an inode is a fast symlink.
@@ -3349,209 +3346,6 @@ void ext4_set_aops(struct inode *inode)
}
}

-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags)
-{
- struct inode *inode = mapping->host;
- struct page *page;
- int err = 0;
-
- page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
- mapping_gfp_mask(mapping) & ~__GFP_FS);
- if (!page)
- return -ENOMEM;
-
- err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
- from, length, flags);
-
- unlock_page(page);
- page_cache_release(page);
- return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped. Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been locked. The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'. This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode: The files inode
- * page: A locked page that contains the offset "from"
- * from: The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len: The length of bytes to discard
- * flags: Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped. This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags)
-{
- ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
- unsigned int offset = from & (PAGE_CACHE_SIZE-1);
- unsigned int blocksize, max, pos;
- ext4_lblk_t iblock;
- struct buffer_head *bh;
- int err = 0;
-
- blocksize = inode->i_sb->s_blocksize;
- max = PAGE_CACHE_SIZE - offset;
-
- if (index != page->index)
- return -EINVAL;
-
- /*
- * correct length if it does not fall between
- * 'from' and the end of the page
- */
- if (length > max || length < 0)
- length = max;
-
- iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
-
- if (!page_has_buffers(page))
- create_empty_buffers(page, blocksize, 0);
-
- /* Find the buffer that contains "offset" */
- bh = page_buffers(page);
- pos = blocksize;
- while (offset >= pos) {
- bh = bh->b_this_page;
- iblock++;
- pos += blocksize;
- }
-
- pos = offset;
- while (pos < offset + length) {
- unsigned int end_of_block, range_to_discard;
-
- err = 0;
-
- /* The length of space left to zero and unmap */
- range_to_discard = offset + length - pos;
-
- /* The length of space until the end of the block */
- end_of_block = blocksize - (pos & (blocksize-1));
-
- /*
- * Do not unmap or zero past end of block
- * for this buffer head
- */
- if (range_to_discard > end_of_block)
- range_to_discard = end_of_block;
-
-
- /*
- * Skip this buffer head if we are only zeroing unampped
- * regions of the page
- */
- if (flags & EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED &&
- buffer_mapped(bh))
- goto next;
-
- /* If the range is block aligned, unmap */
- if (range_to_discard == blocksize) {
- clear_buffer_dirty(bh);
- bh->b_bdev = NULL;
- clear_buffer_mapped(bh);
- clear_buffer_req(bh);
- clear_buffer_new(bh);
- clear_buffer_delay(bh);
- clear_buffer_unwritten(bh);
- clear_buffer_uptodate(bh);
- zero_user(page, pos, range_to_discard);
- BUFFER_TRACE(bh, "Buffer discarded");
- goto next;
- }
-
- /*
- * If this block is not completely contained in the range
- * to be discarded, then it is not going to be released. Because
- * we need to keep this block, we need to make sure this part
- * of the page is uptodate before we modify it by writeing
- * partial zeros on it.
- */
- if (!buffer_mapped(bh)) {
- /*
- * Buffer head must be mapped before we can read
- * from the block
- */
- BUFFER_TRACE(bh, "unmapped");
- ext4_get_block(inode, iblock, bh, 0);
- /* unmapped? It's a hole - nothing to do */
- if (!buffer_mapped(bh)) {
- BUFFER_TRACE(bh, "still unmapped");
- goto next;
- }
- }
-
- /* Ok, it's mapped. Make sure it's up-to-date */
- if (PageUptodate(page))
- set_buffer_uptodate(bh);
-
- if (!buffer_uptodate(bh)) {
- err = -EIO;
- ll_rw_block(READ, 1, &bh);
- wait_on_buffer(bh);
- /* Uhhuh. Read error. Complain and punt.*/
- if (!buffer_uptodate(bh))
- goto next;
- }
-
- if (ext4_should_journal_data(inode)) {
- BUFFER_TRACE(bh, "get write access");
- err = ext4_journal_get_write_access(handle, bh);
- if (err)
- goto next;
- }
-
- zero_user(page, pos, range_to_discard);
-
- err = 0;
- if (ext4_should_journal_data(inode)) {
- err = ext4_handle_dirty_metadata(handle, inode, bh);
- } else
- mark_buffer_dirty(bh);
-
- BUFFER_TRACE(bh, "Partial buffer zeroed");
-next:
- bh = bh->b_this_page;
- iblock++;
- pos += range_to_discard;
- }
-
- return err;
-}
-
/*
* ext4_block_truncate_page() zeroes out a mapping from file offset `from'
* up to the end of the block which corresponds to `from'.
--
1.7.7.6


2013-02-05 09:12:08

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 15/18] ext4: remove unused code from ext4_remove_blocks()

The "head removal" branch in the condition is never used in any code
path in ext4 since the function only caller ext4_ext_rm_leaf() will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if dubious range is provided.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 21 ++++-----------------
1 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7b44dc1..966a09e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2414,23 +2414,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
- } else if (from == le32_to_cpu(ex->ee_block)
- && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
- /* head removal */
- ext4_lblk_t num;
- ext4_fsblk_t start;
-
- num = to - from;
- start = ext4_ext_pblock(ex);
-
- ext_debug("free first %u blocks starting %llu\n", num, start);
- ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
- } else {
- printk(KERN_INFO "strange request: removal(2) "
- "%u-%u from %u:%u\n",
- from, to, le32_to_cpu(ex->ee_block), ee_len);
- }
+ } else
+ ext4_error(sbi->s_sb, "strange request: removal(2) "
+ "%u-%u from %u:%u\n",
+ from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
}

--
1.7.7.6

2013-02-05 09:12:09

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 16/18] ext4: update ext4_ext_remove_space trace point

Add "end" variable.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 6 +++---
include/trace/events/ext4.h | 21 ++++++++++++++-------
2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 966a09e..2b7e521 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2647,7 +2647,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
again:
ext4_ext_invalidate_cache(inode);

- trace_ext4_ext_remove_space(inode, start, depth);
+ trace_ext4_ext_remove_space(inode, start, end, depth);

/*
* Check if we are removing extents inside the extent tree. If that
@@ -2813,8 +2813,8 @@ again:
}
}

- trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
- path->p_hdr->eh_entries);
+ trace_ext4_ext_remove_space_done(inode, start, end, depth,
+ partial_cluster, path->p_hdr->eh_entries);

/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 6ab6f8a..5fd98f9 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2010,14 +2010,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
);

TRACE_EVENT(ext4_ext_remove_space,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t end, int depth),

- TP_ARGS(inode, start, depth),
+ TP_ARGS(inode, start, end, depth),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
),

@@ -2025,26 +2027,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
),

- TP_printk("dev %d,%d ino %lu since %u depth %d",
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth)
);

TRACE_EVENT(ext4_ext_remove_space_done,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
- ext4_lblk_t partial, unsigned short eh_entries),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+ int depth, ext4_lblk_t partial, unsigned short eh_entries),

- TP_ARGS(inode, start, depth, partial, eh_entries),
+ TP_ARGS(inode, start, end, depth, partial, eh_entries),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
__field( ext4_lblk_t, partial )
__field( unsigned short, eh_entries )
@@ -2054,16 +2059,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
__entry->partial = partial;
__entry->eh_entries = eh_entries;
),

- TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth,
(unsigned) __entry->partial,
(unsigned short) __entry->eh_entries)
--
1.7.7.6


2013-02-05 09:12:10

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v2 17/18] ext4: make punch hole code path work with bigalloc

Currently punch hole is disabled on file systems with the bigalloc
feature enabled. However, the recent changes in the punch hole patches
should make it easier to support punching holes on bigalloc enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is of unsigned long long type and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However, it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster,
leaving the rest untouched. So we have to make sure that we notice a
cluster which still has some extents left. To do this I've changed
partial_cluster to be of signed long long type. The only scenario where
this could be a problem is when cluster size == block size; however, in
that case there would not be any partial clusters, so we're safe. For
bigger clusters the signed type is enough. Now we use a negative value
in partial_cluster to mark such a cluster as used, hence we know that we
must not free it even if all other extents have been freed from that
cluster.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
^----------^
punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update the respective tracepoints to use the signed long long type
for partial_cluster.
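
A minimal sketch of the sign convention (illustration only; C stands for
some cluster number, and the check at the end mirrors the one used in
ext4_ext_rm_leaf() below, with sbi, ex and ex_ee_len as in that
function):

	/*
	 * partial_cluster ==  0    no partial cluster is pending
	 * partial_cluster ==  C    cluster C may be freed once we know
	 *                          no remaining extent still uses it
	 * partial_cluster == -C    cluster C is known to be in use and
	 *                          must not be freed
	 */
	if (*partial_cluster > 0 &&
	    EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
							*partial_cluster)
		/* safe to free the blocks of cluster *partial_cluster */;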

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
include/trace/events/ext4.h | 25 ++++++++-------
2 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2b7e521..fbb2940 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2341,7 +2341,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)

static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ signed long long *partial_cluster,
ext4_lblk_t from, ext4_lblk_t to)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2370,7 +2370,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* partial cluster here.
*/
pblk = ext4_ext_pblock(ex) + ee_len - 1;
- if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+ if ((*partial_cluster > 0) &&
+ (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
EXT4_C2B(sbi, *partial_cluster),
sbi->s_cluster_ratio, flags);
@@ -2396,23 +2397,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+ unsigned int unaligned;

num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
- ext_debug("free last %u blocks starting %llu\n", num, pblk);
+ /*
+ * Usually we want to free partial cluster at the end of the
+ * extent, except for the situation when the cluster is still
+ * used by any other extent (partial_cluster is negative).
+ */
+ if (*partial_cluster < 0 &&
+ -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+ flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+ ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
* If the block range to be freed didn't start at the
* beginning of a cluster, and we removed the entire
- * extent, save the partial cluster here, since we
- * might need to delete if we determine that the
- * truncate operation has removed all of the blocks in
- * the cluster.
+ * extent and the cluster is not used by any other extent,
+ * save the partial cluster here, since we might need to
+ * delete if we determine that the truncate operation has
+ * removed all of the blocks in the cluster.
+ *
+ * On the other hand, if we did not manage to free the whole
+ * extent, we have to mark the cluster as used (store negative
+ * cluster number in partial_cluster).
*/
- if (pblk & (sbi->s_cluster_ratio - 1) &&
- (ee_len == num))
+ unaligned = pblk & (sbi->s_cluster_ratio - 1);
+ if (unaligned && (ee_len == num) &&
+ (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
*partial_cluster = EXT4_B2C(sbi, pblk);
- else
+ else if (unaligned)
+ *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
+ else if (*partial_cluster > 0)
*partial_cluster = 0;
} else
ext4_error(sbi->s_sb, "strange request: removal(2) "
@@ -2430,12 +2449,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* @handle: The journal handle
* @inode: The files inode
* @path: The path to the leaf
+ * @partial_cluster: The cluster which we'll have to free if all extents
+ * has been released from it. It gets negative in case
+ * that the cluster is still used.
* @start: The first block to remove
* @end: The last block to remove
*/
static int
ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
- struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
+ struct ext4_ext_path *path,
+ signed long long *partial_cluster,
ext4_lblk_t start, ext4_lblk_t end)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2448,6 +2471,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
unsigned short ex_ee_len;
unsigned uninitialized = 0;
struct ext4_extent *ex;
+ ext4_fsblk_t pblk;

/* the header must be checked already in ext4_ext_remove_space() */
ext_debug("truncate since %u in leaf to %u\n", start, end);
@@ -2486,6 +2510,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,

/* If this extent is beyond the end of the hole, skip it */
if (end < ex_ee_block) {
+ /*
+ * We're going to skip this extent and move to another,
+ * so if this extent is not cluster aligned we have
+ * to mark the current cluster as used to avoid
+ * accidentally freeing it later on
+ */
+ pblk = ext4_ext_pblock(ex);
+ if (pblk & (sbi->s_cluster_ratio - 1))
+ *partial_cluster =
+ -((long long)EXT4_B2C(sbi, pblk));
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
ex_ee_len = ext4_ext_get_actual_len(ex);
@@ -2561,7 +2595,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
sizeof(struct ext4_extent));
}
le16_add_cpu(&eh->eh_entries, -1);
- } else
+ } else if (*partial_cluster > 0)
*partial_cluster = 0;

err = ext4_ext_dirty(handle, inode, path + depth);
@@ -2579,11 +2613,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
err = ext4_ext_correct_indexes(handle, inode, path);

/*
- * If there is still a entry in the leaf node, check to see if
- * it references the partial cluster. This is the only place
- * where it could; if it doesn't, we can free the cluster.
+ * Free the partial cluster only if the current extent does not
+ * reference it. Otherwise we might free used cluster.
*/
- if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
+ if (*partial_cluster > 0 &&
(EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
*partial_cluster)) {
int flags = EXT4_FREE_BLOCKS_FORGET;
@@ -2633,7 +2666,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
struct super_block *sb = inode->i_sb;
int depth = ext_depth(inode);
struct ext4_ext_path *path = NULL;
- ext4_fsblk_t partial_cluster = 0;
+ signed long long partial_cluster = 0;
handle_t *handle;
int i = 0, err = 0;

@@ -2819,7 +2852,7 @@ again:
/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
* cluster as well. */
- if (partial_cluster && path->p_hdr->eh_entries == 0) {
+ if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
int flags = EXT4_FREE_BLOCKS_FORGET;

if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 5fd98f9..2bfdf67 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1911,7 +1911,7 @@ TRACE_EVENT(ext4_ext_show_extent,
TRACE_EVENT(ext4_remove_blocks,
TP_PROTO(struct inode *inode, struct ext4_extent *ex,
ext4_lblk_t from, ext4_fsblk_t to,
- ext4_fsblk_t partial_cluster),
+ long long int partial_cluster),

TP_ARGS(inode, ex, from, to, partial_cluster),

@@ -1920,7 +1920,7 @@ TRACE_EVENT(ext4_remove_blocks,
__field( ino_t, ino )
__field( ext4_lblk_t, from )
__field( ext4_lblk_t, to )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
__field( ext4_fsblk_t, ee_pblk )
__field( ext4_lblk_t, ee_lblk )
__field( unsigned short, ee_len )
@@ -1938,7 +1938,7 @@ TRACE_EVENT(ext4_remove_blocks,
),

TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
- "from %u to %u partial_cluster %u",
+ "from %u to %u partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->ee_lblk,
@@ -1946,19 +1946,20 @@ TRACE_EVENT(ext4_remove_blocks,
(unsigned short) __entry->ee_len,
(unsigned) __entry->from,
(unsigned) __entry->to,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);

TRACE_EVENT(ext4_ext_rm_leaf,
TP_PROTO(struct inode *inode, ext4_lblk_t start,
- struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
+ struct ext4_extent *ex,
+ long long int partial_cluster),

TP_ARGS(inode, start, ex, partial_cluster),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, ee_lblk )
__field( ext4_fsblk_t, ee_pblk )
@@ -1976,14 +1977,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
),

TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
- "partial_cluster %u",
+ "partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->ee_lblk,
(unsigned long long) __entry->ee_pblk,
(unsigned short) __entry->ee_len,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);

TRACE_EVENT(ext4_ext_rm_idx,
@@ -2041,7 +2042,7 @@ TRACE_EVENT(ext4_ext_remove_space,

TRACE_EVENT(ext4_ext_remove_space_done,
TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
- int depth, ext4_lblk_t partial, unsigned short eh_entries),
+ int depth, long long int partial, unsigned short eh_entries),

TP_ARGS(inode, start, end, depth, partial, eh_entries),

@@ -2051,7 +2052,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, end )
__field( int, depth )
- __field( ext4_lblk_t, partial )
+ __field( long long int, partial )
__field( unsigned short, eh_entries )
),

@@ -2065,14 +2066,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->eh_entries = eh_entries;
),

- TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->end,
__entry->depth,
- (unsigned) __entry->partial,
+ (long long int) __entry->partial,
(unsigned short) __entry->eh_entries)
);

--
1.7.7.6


2013-02-05 09:47:39

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH v2 08/18] gfs2: use ->invalidatepage() length argument

Hi,

Acked-by: Steven Whitehouse <[email protected]>

Steve.

On Tue, 2013-02-05 at 10:12 +0100, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in gfs2_invalidatepage().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]
> ---
> fs/gfs2/aops.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 5bd558c..3cf3dc8 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -949,24 +949,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
> unsigned int length)
> {
> struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
> + unsigned int stop = offset + length;
> + int partial_page = (offset || length < PAGE_CACHE_SIZE);
> struct buffer_head *bh, *head;
> unsigned long pos = 0;
>
> BUG_ON(!PageLocked(page));
> - if (offset == 0)
> + if (!partial_page)
> ClearPageChecked(page);
> if (!page_has_buffers(page))
> goto out;
>
> bh = head = page_buffers(page);
> do {
> + if (pos + bh->b_size > stop)
> + return;
> +
> if (offset <= pos)
> gfs2_discard(sdp, bh);
> pos += bh->b_size;
> bh = bh->b_this_page;
> } while (bh != head);
> out:
> - if (offset == 0)
> + if (!partial_page)
> try_to_release_page(page, 0);
> }
>



2013-02-05 16:39:50

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCH v2 07/18] ceph: use ->invalidatepage() length argument

On Tue, 5 Feb 2013, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in ceph_invalidatepage().
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Cc: [email protected]

Reviewed-by: Sage Weil <[email protected]>

> ---
> fs/ceph/addr.c | 12 ++++++------
> 1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 8953532..1e09410 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
> if (!PageDirty(page))
> pr_err("%p invalidatepage %p page not dirty\n", inode, page);
>
> - if (offset == 0)
> + if (offset == 0 && length == PAGE_CACHE_SIZE)
> ClearPageChecked(page);
>
> ci = ceph_inode(inode);
> - if (offset == 0) {
> - dout("%p invalidatepage %p idx %lu full dirty page %u\n",
> - inode, page, page->index, offset);
> + if (offset == 0 && length == PAGE_CACHE_SIZE) {
> + dout("%p invalidatepage %p idx %lu full dirty page\n",
> + inode, page, page->index);
> ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
> ceph_put_snap_context(snapc);
> page->private = 0;
> ClearPagePrivate(page);
> } else {
> - dout("%p invalidatepage %p idx %lu partial dirty page\n",
> - inode, page, page->index);
> + dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
> + inode, page, page->index, offset, length);
> }
> }
>
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>


2013-02-07 23:40:54

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

On Tue, 5 Feb 2013 10:12:03 +0100
Lukas Czerner <[email protected]> wrote:

> This commit changes truncate_inode_pages_range() so it can handle non
> page aligned regions of the truncate. Currently we can hit BUG_ON when
> the end of the range is not page aligned, but we can handle unaligned
> start of the range.
>
> Being able to handle non page aligned regions of the page can help file
> system punch_hole implementations and save some work, because once we're
> holding the page we might as well deal with it right away.
>
> In previous commits we've changed ->invalidatepage() prototype to accept
> 'length' argument to be able to specify range to invalidate. No we can
> use that new ability in truncate_inode_pages_range().
>
> ...
>
> + /*
> + * 'start' and 'end' always covers the range of pages to be fully
> + * truncated. Partial pages are covered with 'partial_start' at the
> + * start of the range and 'partial_end' at the end of the range.
> + * Note that 'end' is exclusive while 'lend' is inclusive.
> + */

That helped ;) So the bytes to be truncated are

(start*PAGE_SIZE + partial_start) -> (end*PAGE_SIZE + partial_end) - 1

yes?



2013-02-08 09:08:05

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

On Thu, 7 Feb 2013, Andrew Morton wrote:

> Date: Thu, 7 Feb 2013 15:40:42 -0800
> From: Andrew Morton <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> Hugh Dickins <[email protected]>
> Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle
> non page aligned ranges
>
> On Tue, 5 Feb 2013 10:12:03 +0100
> Lukas Czerner <[email protected]> wrote:
>
> > This commit changes truncate_inode_pages_range() so it can handle non
> > page aligned regions of the truncate. Currently we can hit BUG_ON when
> > the end of the range is not page aligned, but we can handle unaligned
> > start of the range.
> >
> > Being able to handle non page aligned regions of the page can help file
> > system punch_hole implementations and save some work, because once we're
> > holding the page we might as well deal with it right away.
> >
> > In previous commits we've changed ->invalidatepage() prototype to accept
> > 'length' argument to be able to specify range to invalidate. No we can
> > use that new ability in truncate_inode_pages_range().
> >
> > ...
> >
> > + /*
> > + * 'start' and 'end' always covers the range of pages to be fully
> > + * truncated. Partial pages are covered with 'partial_start' at the
> > + * start of the range and 'partial_end' at the end of the range.
> > + * Note that 'end' is exclusive while 'lend' is inclusive.
> > + */
>
> That helped ;) So the bytes to be truncated are
>
> (start*PAGE_SIZE + partial_start) -> (end*PAGE_SIZE + partial_end) - 1
>
> yes?

The start of the range is not right, because 'start' and 'end'
cover pages to be _fully_ truncated. See the while loop and
then the 'if (partial_start)' condition where we look up the
page (start - 1) and do_invalidatepage() within that page.

So it should be like this:


(start*PAGE_SIZE - (PAGE_SIZE - partial_start)) ->
(end*PAGE_SIZE + partial_end) - 1


assuming that you want the range to be inclusive on both sides.

-Lukas


2013-02-21 08:33:56

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

On Fri, 8 Feb 2013, Lukáš Czerner wrote:

> Date: Fri, 8 Feb 2013 10:08:05 +0100 (CET)
> From: Lukáš Czerner <[email protected]>
> To: Andrew Morton <[email protected]>
> Cc: Lukas Czerner <[email protected]>, [email protected],
> [email protected], [email protected],
> [email protected], Hugh Dickins <[email protected]>
> Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle
> non page aligned ranges

..snip..

> > > + /*
> > > + * 'start' and 'end' always covers the range of pages to be fully
> > > + * truncated. Partial pages are covered with 'partial_start' at the
> > > + * start of the range and 'partial_end' at the end of the range.
> > > + * Note that 'end' is exclusive while 'lend' is inclusive.
> > > + */
> >
> > That helped ;) So the bytes to be truncated are
> >
> > (start*PAGE_SIZE + partial_start) -> (end*PAGE_SIZE + partial_end) - 1
> >
> > yes?
>
> The start of the range is not right, because 'start' and 'end'
> cover pages to be _fully_ truncated. See the while loop and
> then the 'if (partial_start)' condition where we look up the
> page (start - 1) and do_invalidatepage() within that page.
>
> So it should be like this:
>
>
> (start*PAGE_SIZE - (PAGE_SIZE - partial_start)) ->
> (end*PAGE_SIZE + partial_end) - 1
>
>
> assuming that you want the range to be inclusive on both sides.
>
> -Lukas
>

Hi Andrew,

what's the status of the patch set ? Do you have any more comments
or questions ? Can we get this in in this merge window ?

Thanks!
-Lukas

2013-02-21 21:49:04

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

On Thu, 21 Feb 2013 09:33:56 +0100 (CET)
Lukáš Czerner <[email protected]> wrote:

> what's the status of the patch set ?

Forgotten about :(

> Can we get this in in this merge window ?

Please do a full resend after 3.9-rc1 and let's take it up again.


2013-02-22 08:06:00

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle non page aligned ranges

On Thu, 21 Feb 2013, Andrew Morton wrote:

> Date: Thu, 21 Feb 2013 13:49:04 -0800
> From: Andrew Morton <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> Hugh Dickins <[email protected]>
> Subject: Re: [PATCH v2 10/18] mm: teach truncate_inode_pages_range() to handle
> non page aligned ranges
>
> On Thu, 21 Feb 2013 09:33:56 +0100 (CET)
> Lukáš Czerner <[email protected]> wrote:
>
> > what's the status of the patch set ?
>
> Forgotten about :(
>
> > Can we get this in in this merge window ?
>
> Please do a full resend after 3.9-rc1 and let's take it up again.
>

I'll do that. Thanks.

-Lukas