Hi,
This set of patches are aimed to allow truncate_inode_pages_range() handle
ranges which are not aligned at the end of the page. Currently it will
hit BUG_ON() when the end of the range is not aligned. Punch hole feature
however can benefit from this ability saving file systems some work not
forcing them to implement their own invalidate code to handle unaligned
ranges.
In order for this to woke we need change ->invalidatepage() address space
operation to to accept range to invalidate by adding 'length' argument in
addition to 'offset'. This is different from my previous attempt to create
new aop ->invalidatepage_range (http://lwn.net/Articles/514828/) which I
reconsidered to be unnecessary.
It would be for the best if this series could go through ext4 branch since
there are a lot of ext4 changes which are based on dev branch of ext4
(git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git)
For description purposes this patch set can be divided into following
groups:
patch 0001: Change ->invalidatepage() prototype adding 'length' argument
and changing all the instances. In very simple cases file
system methods are completely adapted, otherwise only
prototype is changed and the rest will follow. This patch
also implement the 'range' invalidation in
block_invalidatepage().
patch 0002 - 0009:
Make the use of new 'length' argument in the file system
itself. Some file systems can take advantage of it trying
to invalidate only portion of the page if possible, some
can't, however none of the file systems currently attempt
to truncate non page aligned ranges.
patch 0010: Teach truncate_inode_pages_range() to handle non page aligned
ranges.
patch 0011 - 0020:
Ext4 changes build on top of previous changes, simplifying
punch hole code. Not all changes are realated specifically
to invalidatepage() change, but all are related to the punch
hole feature.
Even though this patch set would mainly affect functionality of the file
file systems implementing punch hole I've tested all the following file
system using xfstests without noticing any bugs related to this change.
ext3, ext4, xfs, btrfs, gfs2 and reiserfs
I've also tested block size < page size on ext4 with xfstests and fsx.
v3 -> v4: Some minor changes based on the reviews. Added two ext4 patches
as suggested by Jan Kara.
Thanks!
-Lukas
--
Documentation/filesystems/Locking | 6 +-
Documentation/filesystems/vfs.txt | 20 +-
fs/9p/vfs_addr.c | 5 +-
fs/afs/file.c | 10 +-
fs/btrfs/disk-io.c | 3 +-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 +-
fs/buffer.c | 21 ++-
fs/ceph/addr.c | 15 +-
fs/cifs/file.c | 5 +-
fs/exofs/inode.c | 6 +-
fs/ext3/inode.c | 9 +-
fs/ext4/ext4.h | 14 +-
fs/ext4/extents.c | 96 ++++++----
fs/ext4/inode.c | 402 ++++++++++++++-----------------------
fs/ext4/super.c | 1 +
fs/f2fs/data.c | 3 +-
fs/f2fs/node.c | 3 +-
fs/gfs2/aops.c | 17 +-
fs/jbd/transaction.c | 19 ++-
fs/jbd2/transaction.c | 24 ++-
fs/jfs/jfs_metapage.c | 5 +-
fs/logfs/file.c | 3 +-
fs/logfs/segment.c | 3 +-
fs/nfs/file.c | 8 +-
fs/ntfs/aops.c | 2 +-
fs/ocfs2/aops.c | 5 +-
fs/reiserfs/inode.c | 12 +-
fs/ubifs/file.c | 5 +-
fs/xfs/xfs_aops.c | 14 +-
fs/xfs/xfs_trace.h | 15 +-
include/linux/buffer_head.h | 3 +-
include/linux/fs.h | 2 +-
include/linux/jbd.h | 2 +-
include/linux/jbd2.h | 2 +-
include/linux/mm.h | 3 +-
include/trace/events/ext3.h | 12 +-
include/trace/events/ext4.h | 64 ++++---
mm/readahead.c | 2 +-
mm/truncate.c | 117 ++++++++----
40 files changed, 503 insertions(+), 460 deletions(-)
[PATCH v4 01/20] mm: change invalidatepage prototype to accept
[PATCH v4 02/20] jbd2: change jbd2_journal_invalidatepage to accept
[PATCH v4 03/20] ext4: use ->invalidatepage() length argument
[PATCH v4 04/20] jbd: change journal_invalidatepage() to accept
[PATCH v4 05/20] xfs: use ->invalidatepage() length argument
[PATCH v4 06/20] ocfs2: use ->invalidatepage() length argument
[PATCH v4 07/20] ceph: use ->invalidatepage() length argument
[PATCH v4 08/20] gfs2: use ->invalidatepage() length argument
[PATCH v4 09/20] reiserfs: use ->invalidatepage() length argument
[PATCH v4 10/20] mm: teach truncate_inode_pages_range() to handle
[PATCH v4 11/20] Revert "ext4: remove no longer used functions in
[PATCH v4 12/20] ext4: Call ext4_jbd2_file_inode() after zeroing
[PATCH v4 13/20] Revert "ext4: fix fsx truncate failure"
[PATCH v4 14/20] ext4: truncate_inode_pages() in orphan cleanup path
[PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
[PATCH v4 16/20] ext4: remove unused discard_partial_page_buffers
[PATCH v4 17/20] ext4: remove unused code from ext4_remove_blocks()
[PATCH v4 18/20] ext4: update ext4_ext_remove_space trace point
[PATCH v4 19/20] ext4: make punch hole code path work with bigalloc
[PATCH v4 20/20] ext4: Allow punch hole with bigalloc enabled
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.
Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).
This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.
We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.
Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.
Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
Documentation/filesystems/Locking | 6 +++---
Documentation/filesystems/vfs.txt | 20 ++++++++++----------
fs/9p/vfs_addr.c | 5 +++--
fs/afs/file.c | 10 ++++++----
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 ++-
fs/buffer.c | 21 ++++++++++++++++++---
fs/ceph/addr.c | 5 +++--
fs/cifs/file.c | 5 +++--
fs/exofs/inode.c | 6 ++++--
fs/ext3/inode.c | 3 ++-
fs/ext4/inode.c | 18 +++++++++++-------
fs/f2fs/data.c | 3 ++-
fs/f2fs/node.c | 3 ++-
fs/gfs2/aops.c | 8 +++++---
fs/jfs/jfs_metapage.c | 5 +++--
fs/logfs/file.c | 3 ++-
fs/logfs/segment.c | 3 ++-
fs/nfs/file.c | 8 +++++---
fs/ntfs/aops.c | 2 +-
fs/ocfs2/aops.c | 3 ++-
fs/reiserfs/inode.c | 3 ++-
fs/ubifs/file.c | 5 +++--
fs/xfs/xfs_aops.c | 7 ++++---
include/linux/buffer_head.h | 3 ++-
include/linux/fs.h | 2 +-
include/linux/mm.h | 3 ++-
mm/readahead.c | 2 +-
mm/truncate.c | 15 +++++++++------
30 files changed, 116 insertions(+), 69 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0706d32..cbbac3f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -189,7 +189,7 @@ prototypes:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -310,8 +310,8 @@ filesystems and by the swapper. The latter will eventually go away. Please,
keep it that way and don't breed new callers.
->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated. It
-returns zero on success. If ->invalidatepage is zero, the kernel uses
+some or all of the buffers from the page when it is being truncated. It
+returns zero on success. If ->invalidatepage is zero, the kernel uses
block_invalidatepage() instead.
->releasepage() is called when the kernel is about to try to drop the
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index bc4b06b..e445b95 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
-------------------------------
This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:
struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- int (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -685,14 +685,14 @@ struct address_space_operations {
invalidatepage: If a page has PagePrivate set, then invalidatepage
will be called when part or all of the page is to be removed
from the address space. This generally corresponds to either a
- truncation or a complete invalidation of the address space
- (in the latter case 'offset' will always be 0).
- Any private data associated with the page should be updated
- to reflect this truncation. If offset is 0, then
- the private data should be released, because the page
- must be able to be completely discarded. This may be done by
- calling the ->releasepage function, but in this case the
- release MUST succeed.
+ truncation, punch hole or a complete invalidation of the address
+ space (in the latter case 'offset' will always be 0 and 'length'
+ will be PAGE_CACHE_SIZE). Any private data associated with the page
+ should be updated to reflect this truncation. If offset is 0 and
+ length is PAGE_CACHE_SIZE, then the private data should be released,
+ because the page must be able to be completely discarded. This may
+ be done by calling the ->releasepage function, but in this case the
+ release MUST succeed.
releasepage: releasepage is called on PagePrivate pages to indicate
that the page should be freed if possible. ->releasepage
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 0ad61c6..3bf654a 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -147,13 +147,14 @@ static int v9fs_release_page(struct page *page, gfp_t gfp)
* @offset: offset in the page
*/
-static void v9fs_invalidate_page(struct page *page, unsigned long offset)
+static void v9fs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
/*
* If called with zero offset, we should release
* the private state assocated with the page
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
v9fs_fscache_invalidate_page(page);
}
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8f6e923..66d50fe 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -19,7 +19,8 @@
#include "internal.h"
static int afs_readpage(struct file *file, struct page *page);
-static void afs_invalidatepage(struct page *page, unsigned long offset);
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
static int afs_releasepage(struct page *page, gfp_t gfp_flags);
static int afs_launder_page(struct page *page);
@@ -310,16 +311,17 @@ static int afs_launder_page(struct page *page)
* - release a page and clean up its private data if offset is 0 (indicating
* the entire page)
*/
-static void afs_invalidatepage(struct page *page, unsigned long offset)
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct afs_writeback *wb = (struct afs_writeback *) page_private(page);
- _enter("{%lu},%lu", page->index, offset);
+ _enter("{%lu},%u,%u", page->index, offset, length);
BUG_ON(!PageLocked(page));
/* we clean up only if the entire page is being invalidated */
- if (offset == 0) {
+ if (offset == 0 && length == PAGE_CACHE_SIZE) {
#ifdef CONFIG_AFS_FSCACHE
if (PageFsCache(page)) {
struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6d19a0a..fed6e19 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1003,7 +1003,8 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
return try_release_extent_buffer(page, gfp_flags);
}
-static void btree_invalidatepage(struct page *page, unsigned long offset)
+static void btree_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct extent_io_tree *tree;
tree = &BTRFS_I(page->mapping->host)->io_tree;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cdee391..a89b561 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2874,7 +2874,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
pg_offset = i_size & (PAGE_CACHE_SIZE - 1);
if (page->index > end_index ||
(page->index == end_index && !pg_offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
return 0;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 09c58a3..49eebed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7467,7 +7467,8 @@ static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
return __btrfs_releasepage(page, gfp_flags & GFP_NOFS);
}
-static void btrfs_invalidatepage(struct page *page, unsigned long offset)
+static void btrfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct extent_io_tree *tree;
diff --git a/fs/buffer.c b/fs/buffer.c
index a15575c..53762f5 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1456,7 +1456,8 @@ static void discard_buffer(struct buffer_head * bh)
* block_invalidatepage - invalidate part or all of a buffer-backed page
*
* @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
* block_invalidatepage() is called when all or part of the page has become
* invalidated by a truncate operation.
@@ -1467,15 +1468,22 @@ static void discard_buffer(struct buffer_head * bh)
* point. Because the caller is about to free (and possibly reuse) those
* blocks on-disk.
*/
-void block_invalidatepage(struct page *page, unsigned long offset)
+void block_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
unsigned int curr_off = 0;
+ unsigned int stop = length + offset;
BUG_ON(!PageLocked(page));
if (!page_has_buffers(page))
goto out;
+ /*
+ * Check for overflow
+ */
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
@@ -1483,6 +1491,12 @@ void block_invalidatepage(struct page *page, unsigned long offset)
next = bh->b_this_page;
/*
+ * Are we still fully in range ?
+ */
+ if (next_off > stop)
+ goto out;
+
+ /*
* is this block fully invalidated?
*/
if (offset <= curr_off)
@@ -1503,6 +1517,7 @@ out:
}
EXPORT_SYMBOL(block_invalidatepage);
+
/*
* We attach and possibly dirty the buffers atomically wrt
* __set_page_dirty_buffers() via private_lock. try_to_free_buffers
@@ -2843,7 +2858,7 @@ int block_write_full_page_endio(struct page *page, get_block_t *get_block,
* they may have been added in ext3_writepage(). Make them
* freeable here, so the page does not leak.
*/
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
return 0; /* don't care */
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index a60ea97..168a35a 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -143,7 +143,8 @@ static int ceph_set_page_dirty(struct page *page)
* dirty page counters appropriately. Only called if there is private
* data on the page.
*/
-static void ceph_invalidatepage(struct page *page, unsigned long offset)
+static void ceph_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode;
struct ceph_inode_info *ci;
@@ -168,7 +169,7 @@ static void ceph_invalidatepage(struct page *page, unsigned long offset)
ci = ceph_inode(inode);
if (offset == 0) {
- dout("%p invalidatepage %p idx %lu full dirty page %lu\n",
+ dout("%p invalidatepage %p idx %lu full dirty page %u\n",
inode, page, page->index, offset);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 7a0dd99..c0edf2c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3553,11 +3553,12 @@ static int cifs_release_page(struct page *page, gfp_t gfp)
return cifs_fscache_release_page(page, gfp);
}
-static void cifs_invalidate_page(struct page *page, unsigned long offset)
+static void cifs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
}
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index d1f80ab..2ec8eb1 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -953,9 +953,11 @@ static int exofs_releasepage(struct page *page, gfp_t gfp)
return 0;
}
-static void exofs_invalidatepage(struct page *page, unsigned long offset)
+static void exofs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- EXOFS_DBGMSG("page 0x%lx offset 0x%lx\n", page->index, offset);
+ EXOFS_DBGMSG("page 0x%lx offset 0x%x length 0x%x\n",
+ page->index, offset, length);
WARN_ON(1);
}
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index d512c4b..349d4ce 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1823,7 +1823,8 @@ ext3_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext3_get_block);
}
-static void ext3_invalidatepage(struct page *page, unsigned long offset)
+static void ext3_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = EXT3_JOURNAL(page->mapping->host);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d666569..2930eb7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -131,7 +131,8 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
new_size);
}
-static void ext4_invalidatepage(struct page *page, unsigned long offset);
+static void ext4_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
static int __ext4_journalled_writepage(struct page *page, unsigned int len);
static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
@@ -1605,7 +1606,7 @@ static void ext4_da_block_invalidatepages(struct mpage_da_data *mpd)
break;
BUG_ON(!PageLocked(page));
BUG_ON(PageWriteback(page));
- block_invalidatepage(page, 0);
+ block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
ClearPageUptodate(page);
unlock_page(page);
}
@@ -2828,7 +2829,8 @@ static int ext4_da_write_end(struct file *file,
return ret ? ret : copied;
}
-static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
/*
* Drop reserved blocks
@@ -2840,7 +2842,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
ext4_da_page_release_reservation(page, offset);
out:
- ext4_invalidatepage(page, offset);
+ ext4_invalidatepage(page, offset, length);
return;
}
@@ -2988,14 +2990,15 @@ ext4_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
}
-static void ext4_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
trace_ext4_invalidatepage(page, offset);
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
- block_invalidatepage(page, offset);
+ block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
}
static int __ext4_journalled_invalidatepage(struct page *page,
@@ -3016,7 +3019,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,
/* Wrapper for aops... */
static void ext4_journalled_invalidatepage(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7bd22a2..bae8b6c 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -667,7 +667,8 @@ static ssize_t f2fs_direct_IO(int rw, struct kiocb *iocb,
get_data_block_ro);
}
-static void f2fs_invalidate_data_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_data_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e275218..cd564d5 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1166,7 +1166,8 @@ static int f2fs_set_node_page_dirty(struct page *page)
return 0;
}
-static void f2fs_invalidate_node_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_node_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 24f414f..37093ba 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -109,7 +109,7 @@ static int gfs2_writepage_common(struct page *page,
/* Is the page fully outside i_size? (truncate in progress) */
offset = i_size & (PAGE_CACHE_SIZE-1);
if (page->index > end_index || (page->index == end_index && !offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
goto out;
}
return 1;
@@ -298,7 +298,8 @@ static int gfs2_write_jdata_pagevec(struct address_space *mapping,
/* Is the page fully outside i_size? (truncate in progress) */
if (page->index > end_index || (page->index == end_index && !offset)) {
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0,
+ PAGE_CACHE_SIZE);
unlock_page(page);
continue;
}
@@ -942,7 +943,8 @@ static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh)
unlock_buffer(bh);
}
-static void gfs2_invalidatepage(struct page *page, unsigned long offset)
+static void gfs2_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
struct buffer_head *bh, *head;
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 6740d34..9e3aaff 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -571,9 +571,10 @@ static int metapage_releasepage(struct page *page, gfp_t gfp_mask)
return ret;
}
-static void metapage_invalidatepage(struct page *page, unsigned long offset)
+static void metapage_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- BUG_ON(offset);
+ BUG_ON(offset || length < PAGE_CACHE_SIZE);
BUG_ON(PageWriteback(page));
diff --git a/fs/logfs/file.c b/fs/logfs/file.c
index c2219a6..57914fc 100644
--- a/fs/logfs/file.c
+++ b/fs/logfs/file.c
@@ -159,7 +159,8 @@ static int logfs_writepage(struct page *page, struct writeback_control *wbc)
return __logfs_writepage(page);
}
-static void logfs_invalidatepage(struct page *page, unsigned long offset)
+static void logfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct logfs_block *block = logfs_block(page);
diff --git a/fs/logfs/segment.c b/fs/logfs/segment.c
index 038da09..d448a77 100644
--- a/fs/logfs/segment.c
+++ b/fs/logfs/segment.c
@@ -884,7 +884,8 @@ static struct logfs_area *alloc_area(struct super_block *sb)
return area;
}
-static void map_invalidatepage(struct page *page, unsigned long l)
+static void map_invalidatepage(struct page *page, unsigned int o,
+ unsigned int l)
{
return;
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 29f4a48..901422a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -451,11 +451,13 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
* - Called if either PG_private or PG_fscache is set on the page
* - Caller holds page lock
*/
-static void nfs_invalidate_page(struct page *page, unsigned long offset)
+static void nfs_invalidate_page(struct page *page, unsigned int offset,
+ unsigned int length)
{
- dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %lu)\n", page, offset);
+ dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %u, %u)\n",
+ page, offset, length);
- if (offset != 0)
+ if (offset != 0 || length < PAGE_CACHE_SIZE)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page_file_mapping(page)->host, page);
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index fa9c05f..d267ea6 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -1372,7 +1372,7 @@ retry_writepage:
* The page may have dirty, unmapped buffers. Make them
* freeable here, so the page does not leak.
*/
- block_invalidatepage(page, 0);
+ block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
unlock_page(page);
ntfs_debug("Write outside i_size - truncated?");
return 0;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 20dfec7..ecb86ca 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -603,7 +603,8 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
* from ext3. PageChecked() bits have been removed as OCFS2 does not
* do journalled data.
*/
-static void ocfs2_invalidatepage(struct page *page, unsigned long offset)
+static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index ea5061f..808e02e 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2969,7 +2969,8 @@ static int invalidatepage_can_drop(struct inode *inode, struct buffer_head *bh)
}
/* clm -- taken from fs/buffer.c:block_invalidate_page */
-static void reiserfs_invalidatepage(struct page *page, unsigned long offset)
+static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index f12189d..97c7a81 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1276,13 +1276,14 @@ int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
return err;
}
-static void ubifs_invalidatepage(struct page *page, unsigned long offset)
+static void ubifs_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
struct inode *inode = page->mapping->host;
struct ubifs_info *c = inode->i_sb->s_fs_info;
ubifs_assert(PagePrivate(page));
- if (offset)
+ if (offset || length < PAGE_CACHE_SIZE)
/* Partial page remains dirty */
return;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5f707e5..e426796 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -823,10 +823,11 @@ xfs_cluster_write(
STATIC void
xfs_vm_invalidatepage(
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset);
+ block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
}
/*
@@ -890,7 +891,7 @@ next_buffer:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
out_invalidate:
- xfs_vm_invalidatepage(page, 0);
+ xfs_vm_invalidatepage(page, 0, PAGE_CACHE_SIZE);
return;
}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 33c0f81..22ff048 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -197,7 +197,8 @@ extern int buffer_heads_over_limit;
* Generic address_space_operations implementations for buffer_head-backed
* address_spaces.
*/
-void block_invalidatepage(struct page *page, unsigned long offset);
+void block_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
int block_write_full_page(struct page *page, get_block_t *get_block,
struct writeback_control *wbc);
int block_write_full_page_endio(struct page *page, get_block_t *get_block,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2c28271..03269b4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -364,7 +364,7 @@ struct address_space_operations {
/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
- void (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, gfp_t);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e19ff30..1ee3440 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1031,7 +1031,8 @@ int get_kernel_page(unsigned long start, int write, struct page **pages);
struct page *get_dump_page(unsigned long addr);
extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
-extern void do_invalidatepage(struct page *page, unsigned long offset);
+extern void do_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length);
int __set_page_dirty_nobuffers(struct page *page);
int __set_page_dirty_no_writeback(struct page *page);
diff --git a/mm/readahead.c b/mm/readahead.c
index 7963f23..3565e53 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -48,7 +48,7 @@ static void read_cache_pages_invalidate_page(struct address_space *mapping,
if (!trylock_page(page))
BUG();
page->mapping = mapping;
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
page->mapping = NULL;
unlock_page(page);
}
diff --git a/mm/truncate.c b/mm/truncate.c
index c75b736..fdba083 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -26,7 +26,8 @@
/**
* do_invalidatepage - invalidate part or all of a page
* @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
* do_invalidatepage() is called when all or part of the page has become
* invalidated by a truncate operation.
@@ -37,16 +38,18 @@
* point. Because the caller is about to free (and possibly reuse) those
* blocks on-disk.
*/
-void do_invalidatepage(struct page *page, unsigned long offset)
+void do_invalidatepage(struct page *page, unsigned int offset,
+ unsigned int length)
{
- void (*invalidatepage)(struct page *, unsigned long);
+ void (*invalidatepage)(struct page *, unsigned int, unsigned int);
+
invalidatepage = page->mapping->a_ops->invalidatepage;
#ifdef CONFIG_BLOCK
if (!invalidatepage)
invalidatepage = block_invalidatepage;
#endif
if (invalidatepage)
- (*invalidatepage)(page, offset);
+ (*invalidatepage)(page, offset, length);
}
static inline void truncate_partial_page(struct page *page, unsigned partial)
@@ -54,7 +57,7 @@ static inline void truncate_partial_page(struct page *page, unsigned partial)
zero_user_segment(page, partial, PAGE_CACHE_SIZE);
cleancache_invalidate_page(page->mapping, page);
if (page_has_private(page))
- do_invalidatepage(page, partial);
+ do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
}
/*
@@ -103,7 +106,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
return -EIO;
if (page_has_private(page))
- do_invalidatepage(page, 0);
+ do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
cancel_dirty_page(page, PAGE_CACHE_SIZE);
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 30 +++++++++++++++++++-----------
include/trace/events/ext4.h | 22 ++++++++++++----------
2 files changed, 31 insertions(+), 21 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 96d5927..ae58749 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1415,21 +1415,28 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
}
static void ext4_da_page_release_reservation(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
+ if (next_off > stop)
+ break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2839,7 +2846,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned int offset,
if (!page_has_buffers(page))
goto out;
- ext4_da_page_release_reservation(page, offset);
+ ext4_da_page_release_reservation(page, offset, length);
out:
ext4_invalidatepage(page, offset, length);
@@ -2993,29 +3000,29 @@ ext4_readpages(struct file *file, struct address_space *mapping,
static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
- trace_ext4_invalidatepage(page, offset);
+ trace_ext4_invalidatepage(page, offset, length);
/* No journalling happens on data buffers when this function is used */
WARN_ON(page_has_buffers(page) && buffer_jbd(page_buffers(page)));
- block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+ block_invalidatepage(page, offset, length);
}
static int __ext4_journalled_invalidatepage(struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
- trace_ext4_journalled_invalidatepage(page, offset);
+ trace_ext4_journalled_invalidatepage(page, offset, length);
/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
- return jbd2_journal_invalidatepage(journal, page, offset,
- PAGE_CACHE_SIZE - offset);
+ return jbd2_journal_invalidatepage(journal, page, offset, length);
}
/* Wrapper for aops... */
@@ -3023,7 +3030,7 @@ static void ext4_journalled_invalidatepage(struct page *page,
unsigned int offset,
unsigned int length)
{
- WARN_ON(__ext4_journalled_invalidatepage(page, offset) < 0);
+ WARN_ON(__ext4_journalled_invalidatepage(page, offset, length) < 0);
}
static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -4627,7 +4634,8 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
inode->i_size >> PAGE_CACHE_SHIFT);
if (!page)
return;
- ret = __ext4_journalled_invalidatepage(page, offset);
+ ret = __ext4_journalled_invalidatepage(page, offset,
+ PAGE_CACHE_SIZE - offset);
unlock_page(page);
page_cache_release(page);
if (ret != -EBUSY)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 8ee15b9..dcfce96 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -444,16 +444,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
);
DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( pgoff_t, index )
- __field( unsigned long, offset )
-
+ __field( unsigned int, offset )
+ __field( unsigned int, length )
),
TP_fast_assign(
@@ -461,24 +461,26 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
__entry->ino = page->mapping->host->i_ino;
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
),
- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- (unsigned long) __entry->index, __entry->offset)
+ (unsigned long) __entry->index,
+ __entry->offset, __entry->length)
);
DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
- TP_ARGS(page, offset)
+ TP_ARGS(page, offset, length)
);
DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
- TP_ARGS(page, offset)
+ TP_ARGS(page, offset, length)
);
TRACE_EVENT(ext4_discard_blocks,
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
invalidatepage now accepts range to invalidate and there are two file
system using jbd2 also implementing punch hole feature which can benefit
from this. We need to implement the same thing for jbd2 layer in order to
allow those file system take benefit of this functionality.
This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 3 ++-
fs/jbd2/transaction.c | 24 +++++++++++++++++-------
fs/ocfs2/aops.c | 3 ++-
include/linux/jbd2.h | 2 +-
4 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2930eb7..96d5927 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3014,7 +3014,8 @@ static int __ext4_journalled_invalidatepage(struct page *page,
if (offset == 0)
ClearPageChecked(page);
- return jbd2_journal_invalidatepage(journal, page, offset);
+ return jbd2_journal_invalidatepage(journal, page, offset,
+ PAGE_CACHE_SIZE - offset);
}
/* Wrapper for aops... */
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 10f524c..5d8268a 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2034,18 +2034,23 @@ zap_buffer_unlocked:
* void jbd2_journal_invalidatepage()
* @journal: journal to use for flush...
* @page: page to flush
- * @offset: length of page to invalidate.
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
*
- * Reap page buffers containing data after offset in page. Can return -EBUSY
- * if buffers are part of the committing transaction and the page is straddling
- * i_size. Caller then has to wait for current commit and try again.
+ * Reap page buffers containing data after in the specified range in page.
+ * Can return -EBUSY if buffers are part of the committing transaction and
+ * the page is straddling i_size. Caller then has to wait for current commit
+ * and try again.
*/
int jbd2_journal_invalidatepage(journal_t *journal,
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
+ unsigned int stop = offset + length;
unsigned int curr_off = 0;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
int ret = 0;
@@ -2054,6 +2059,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return 0;
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2063,10 +2070,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
+ if (next_off > stop)
+ return 0;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
- ret = journal_unmap_buffer(journal, bh, offset > 0);
+ ret = journal_unmap_buffer(journal, bh, partial_page);
unlock_buffer(bh);
if (ret < 0)
return ret;
@@ -2077,7 +2087,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,
} while (bh != head);
- if (!offset) {
+ if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index ecb86ca..7c47755 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,7 +608,8 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
- jbd2_journal_invalidatepage(journal, page, offset);
+ jbd2_journal_invalidatepage(journal, page, offset,
+ PAGE_CACHE_SIZE - offset);
}
static int ocfs2_releasepage(struct page *page, gfp_t wait)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 6e051f4..682a63c 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1090,7 +1090,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern int jbd2_journal_invalidatepage(journal_t *,
- struct page *, unsigned long);
+ struct page *, unsigned int, unsigned int);
extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int jbd2_journal_stop(handle_t *);
extern int jbd2_journal_flush (journal_t *);
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in journal_invalidatepage() and all the users in ext3 file
system. Also update ext3 trace point to print out length argument.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext3/inode.c | 6 +++---
fs/jbd/transaction.c | 19 ++++++++++++++-----
include/linux/jbd.h | 2 +-
include/trace/events/ext3.h | 12 +++++++-----
4 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 349d4ce..b12936b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1828,15 +1828,15 @@ static void ext3_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = EXT3_JOURNAL(page->mapping->host);
- trace_ext3_invalidatepage(page, offset);
+ trace_ext3_invalidatepage(page, offset, length);
/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
- journal_invalidatepage(journal, page, offset);
+ journal_invalidatepage(journal, page, offset, length);
}
static int ext3_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 071d690..a1fef89 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -2020,16 +2020,20 @@ zap_buffer_unlocked:
* void journal_invalidatepage() - invalidate a journal page
* @journal: journal to use for flush
* @page: page to flush
- * @offset: length of page to invalidate.
+ * @offset: offset of the range to invalidate
+ * @length: length of the range to invalidate
*
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
*/
void journal_invalidatepage(journal_t *journal,
struct page *page,
- unsigned long offset)
+ unsigned int offset,
+ unsigned int length)
{
struct buffer_head *head, *bh, *next;
+ unsigned int stop = offset + length;
unsigned int curr_off = 0;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int may_free = 1;
if (!PageLocked(page))
@@ -2037,6 +2041,8 @@ void journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;
+ BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2046,11 +2052,14 @@ void journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
+ if (next_off > stop)
+ return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
may_free &= journal_unmap_buffer(journal, bh,
- offset > 0);
+ partial_page);
unlock_buffer(bh);
}
curr_off = next_off;
@@ -2058,7 +2067,7 @@ void journal_invalidatepage(journal_t *journal,
} while (bh != head);
- if (!offset) {
+ if (!partial_page) {
if (may_free && try_to_free_buffers(page))
J_ASSERT(!page_has_buffers(page));
}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index c8f3297..d02e16c 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -840,7 +840,7 @@ extern void journal_release_buffer (handle_t *, struct buffer_head *);
extern int journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern void journal_invalidatepage(journal_t *,
- struct page *, unsigned long);
+ struct page *, unsigned int, unsigned int);
extern int journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int journal_stop(handle_t *);
extern int journal_flush (journal_t *);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 15d11a3..6797b9d 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
);
TRACE_EVENT(ext3_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+ TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),
TP_STRUCT__entry(
__field( pgoff_t, index )
- __field( unsigned long, offset )
+ __field( unsigned int, offset )
+ __field( unsigned int, length )
__field( ino_t, ino )
__field( dev_t, dev )
@@ -305,14 +306,15 @@ TRACE_EVENT(ext3_invalidatepage,
TP_fast_assign(
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
__entry->ino = page->mapping->host->i_ino;
__entry->dev = page->mapping->host->i_sb->s_dev;
),
- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->index, __entry->offset)
+ __entry->index, __entry->offset, __entry->length)
);
TRACE_EVENT(ext3_discard_blocks,
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in xfs_vm_invalidatepage()
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Ben Myers <[email protected]>
Cc: [email protected]
---
v4: use xfs_page_class instead of separate tracepoint
fs/xfs/xfs_aops.c | 9 +++++----
fs/xfs/xfs_trace.h | 15 ++++++++++-----
2 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e426796..55c85ec 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -826,8 +826,9 @@ xfs_vm_invalidatepage(
unsigned int offset,
unsigned int length)
{
- trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset, PAGE_CACHE_SIZE - offset);
+ trace_xfs_invalidatepage(page->mapping->host, page, offset,
+ length);
+ block_invalidatepage(page, offset, length);
}
/*
@@ -921,7 +922,7 @@ xfs_vm_writepage(
int count = 0;
int nonblocking = 0;
- trace_xfs_writepage(inode, page, 0);
+ trace_xfs_writepage(inode, page, 0, 0);
ASSERT(page_has_buffers(page));
@@ -1152,7 +1153,7 @@ xfs_vm_releasepage(
{
int delalloc, unwritten;
- trace_xfs_releasepage(page->mapping->host, page, 0);
+ trace_xfs_releasepage(page->mapping->host, page, 0, 0);
xfs_count_page_state(page, &delalloc, &unwritten);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a8129..7f075ed 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -950,14 +950,16 @@ DEFINE_RW_EVENT(xfs_file_splice_read);
DEFINE_RW_EVENT(xfs_file_splice_write);
DECLARE_EVENT_CLASS(xfs_page_class,
- TP_PROTO(struct inode *inode, struct page *page, unsigned long off),
- TP_ARGS(inode, page, off),
+ TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
+ unsigned int len),
+ TP_ARGS(inode, page, off, len),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
__field(pgoff_t, pgoff)
__field(loff_t, size)
__field(unsigned long, offset)
+ __field(unsigned int, length)
__field(int, delalloc)
__field(int, unwritten)
),
@@ -971,24 +973,27 @@ DECLARE_EVENT_CLASS(xfs_page_class,
__entry->pgoff = page_offset(page);
__entry->size = i_size_read(inode);
__entry->offset = off;
+ __entry->length = len;
__entry->delalloc = delalloc;
__entry->unwritten = unwritten;
),
TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
- "delalloc %d unwritten %d",
+ "length %x delalloc %d unwritten %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->pgoff,
__entry->size,
__entry->offset,
+ __entry->length,
__entry->delalloc,
__entry->unwritten)
)
#define DEFINE_PAGE_EVENT(name) \
DEFINE_EVENT(xfs_page_class, name, \
- TP_PROTO(struct inode *inode, struct page *page, unsigned long off), \
- TP_ARGS(inode, page, off))
+ TP_PROTO(struct inode *inode, struct page *page, unsigned long off, \
+ unsigned int len), \
+ TP_ARGS(inode, page, off, len))
DEFINE_PAGE_EVENT(xfs_writepage);
DEFINE_PAGE_EVENT(xfs_releasepage);
DEFINE_PAGE_EVENT(xfs_invalidatepage);
--
1.7.7.6
_______________________________________________
xfs mailing list
[email protected]
http://oss.sgi.com/mailman/listinfo/xfs
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ocfs2_invalidatepage().
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Joel Becker <[email protected]>
---
fs/ocfs2/aops.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7c47755..79736a2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -608,8 +608,7 @@ static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
- jbd2_journal_invalidatepage(journal, page, offset,
- PAGE_CACHE_SIZE - offset);
+ jbd2_journal_invalidatepage(journal, page, offset, length);
}
static int ocfs2_releasepage(struct page *page, gfp_t wait)
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ceph_invalidatepage().
Signed-off-by: Lukas Czerner <[email protected]>
Acked-by: Sage Weil <[email protected]>
Cc: [email protected]
---
fs/ceph/addr.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 168a35a..d953afd 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -164,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
if (!PageDirty(page))
pr_err("%p invalidatepage %p page not dirty\n", inode, page);
- if (offset == 0)
+ if (offset == 0 && length == PAGE_CACHE_SIZE)
ClearPageChecked(page);
ci = ceph_inode(inode);
- if (offset == 0) {
- dout("%p invalidatepage %p idx %lu full dirty page %u\n",
- inode, page, page->index, offset);
+ if (offset == 0 && length == PAGE_CACHE_SIZE) {
+ dout("%p invalidatepage %p idx %lu full dirty page\n",
+ inode, page, page->index);
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
ceph_put_snap_context(snapc);
page->private = 0;
ClearPagePrivate(page);
} else {
- dout("%p invalidatepage %p idx %lu partial dirty page\n",
- inode, page, page->index);
+ dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+ inode, page, page->index, offset, length);
}
}
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in reiserfs_invalidatepage()
Signed-off-by: Lukas Czerner <[email protected]>
Cc: [email protected]
---
fs/reiserfs/inode.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 808e02e..e963164 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2975,11 +2975,13 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
struct buffer_head *head, *bh, *next;
struct inode *inode = page->mapping->host;
unsigned int curr_off = 0;
+ unsigned int stop = offset + length;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
int ret = 1;
BUG_ON(!PageLocked(page));
- if (offset == 0)
+ if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
@@ -2991,6 +2993,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
+ if (next_off > stop)
+ goto out;
+
/*
* is this block fully invalidated?
*/
@@ -3009,7 +3014,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
* The get_block cached value has been unconditionally invalidated,
* so real IO is not possible anymore.
*/
- if (!offset && ret) {
+ if (!partial_page && ret) {
ret = try_to_release_page(page, 0);
/* maybe should BUG_ON(!ret); - neilb */
}
--
1.7.7.6
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in gfs2_invalidatepage().
Signed-off-by: Lukas Czerner <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
Cc: [email protected]
---
fs/gfs2/aops.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 37093ba..ea920bf 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -947,24 +947,29 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+ unsigned int stop = offset + length;
+ int partial_page = (offset || length < PAGE_CACHE_SIZE);
struct buffer_head *bh, *head;
unsigned long pos = 0;
BUG_ON(!PageLocked(page));
- if (offset == 0)
+ if (!partial_page)
ClearPageChecked(page);
if (!page_has_buffers(page))
goto out;
bh = head = page_buffers(page);
do {
+ if (pos + bh->b_size > stop)
+ return;
+
if (offset <= pos)
gfs2_discard(sdp, bh);
pos += bh->b_size;
bh = bh->b_this_page;
} while (bh != head);
out:
- if (offset == 0)
+ if (!partial_page)
try_to_release_page(page, 0);
}
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle unaligned
start of the range.
Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.
In previous commits we've changed ->invalidatepage() prototype to accept
'length' argument to be able to specify range to invalidate. No we can
use that new ability in truncate_inode_pages_range().
Signed-off-by: Lukas Czerner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
mm/truncate.c | 104 ++++++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 73 insertions(+), 31 deletions(-)
diff --git a/mm/truncate.c b/mm/truncate.c
index fdba083..e2e8a8a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -52,14 +52,6 @@ void do_invalidatepage(struct page *page, unsigned int offset,
(*invalidatepage)(page, offset, length);
}
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
- zero_user_segment(page, partial, PAGE_CACHE_SIZE);
- cleancache_invalidate_page(page->mapping, page);
- if (page_has_private(page))
- do_invalidatepage(page, partial, PAGE_CACHE_SIZE - partial);
-}
-
/*
* This cancels just the dirty bit on the kernel page itself, it
* does NOT actually remove dirty bits on any mmap's that may be
@@ -188,11 +180,11 @@ int invalidate_inode_page(struct page *page)
* truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
* @mapping: mapping to truncate
* @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
*
* Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
*
* Truncate takes two passes - the first pass is nonblocking. It will not
* block on page locks and it will not block on writeback. The second pass
@@ -203,35 +195,58 @@ int invalidate_inode_page(struct page *page)
* We pass down the cache-hot hint to the page freeing code. Even if the
* mapping is large, it is probably the case that the final pages are the most
* recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
*/
void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
{
- const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
- const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
- struct pagevec pvec;
- pgoff_t index;
- pgoff_t end;
- int i;
+ pgoff_t start; /* inclusive */
+ pgoff_t end; /* exclusive */
+ unsigned int partial_start; /* inclusive */
+ unsigned int partial_end; /* exclusive */
+ struct pagevec pvec;
+ pgoff_t index;
+ int i;
cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;
- BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
- end = (lend >> PAGE_CACHE_SHIFT);
+ /* Offsets within partial pages */
+ partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+ partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+ /*
+ * 'start' and 'end' always covers the range of pages to be fully
+ * truncated. Partial pages are covered with 'partial_start' at the
+ * start of the range and 'partial_end' at the end of the range.
+ * Note that 'end' is exclusive while 'lend' is inclusive.
+ */
+ start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+ if (lend == -1)
+ /*
+ * lend == -1 indicates end-of-file so we have to set 'end'
+ * to the highest possible pgoff_t and since the type is
+ * unsigned we're using -1.
+ */
+ end = -1;
+ else
+ end = (lend + 1) >> PAGE_CACHE_SHIFT;
pagevec_init(&pvec, 0);
index = start;
- while (index <= end && pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ while (index < end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;
if (!trylock_page(page))
@@ -250,27 +265,56 @@ void truncate_inode_pages_range(struct address_space *mapping,
index++;
}
- if (partial) {
+ if (partial_start) {
struct page *page = find_lock_page(mapping, start - 1);
if (page) {
+ unsigned int top = PAGE_CACHE_SIZE;
+ if (start > end) {
+ /* Truncation within a single page */
+ top = partial_end;
+ partial_end = 0;
+ }
wait_on_page_writeback(page);
- truncate_partial_page(page, partial);
+ zero_user_segment(page, partial_start, top);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage(page, partial_start,
+ top - partial_start);
unlock_page(page);
page_cache_release(page);
}
}
+ if (partial_end) {
+ struct page *page = find_lock_page(mapping, end);
+ if (page) {
+ wait_on_page_writeback(page);
+ zero_user_segment(page, 0, partial_end);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage(page, 0,
+ partial_end);
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ }
+ /*
+ * If the truncation happened within a single page no pages
+ * will be released, just zeroed, so we can bail out now.
+ */
+ if (start >= end)
+ return;
index = start;
for ( ; ; ) {
cond_resched();
if (!pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
if (index == start)
break;
index = start;
continue;
}
- if (index == start && pvec.pages[0]->index > end) {
+ if (index == start && pvec.pages[0]->index >= end) {
pagevec_release(&pvec);
break;
}
@@ -280,7 +324,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;
lock_page(page);
@@ -601,10 +645,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
* This rounding is currently just for example: unmap_mapping_range
* expands its hole outwards, whereas we want it to contract the hole
* inwards. However, existing callers of truncate_pagecache_range are
- * doing their own page rounding first; and truncate_inode_pages_range
- * currently BUGs if lend is not pagealigned-1 (it handles partial
- * page at start of hole, but not partial page at end of hole). Note
- * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
+ * doing their own page rounding first. Note that unmap_mapping_range
+ * allows holelen 0 for all, and we allow lend -1 for end of file.
*/
/*
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.
This commit reintroduces functions ext4_block_truncate_page() and
ext4_block_zero_page_range() which has been previously removed in favour
of ext4_discard_partial_page_buffers().
In future commits we want to reintroduce those function and remove
ext4_discard_partial_page_buffers() since it is duplicating some code
and also partially duplicating work of truncate_pagecache_range(),
moreover the old implementation was much clearer.
Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 4 ++
fs/ext4/inode.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..9f9719f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2096,6 +2096,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
extern void ext4_set_aops(struct inode *inode);
extern int ext4_writepage_trans_blocks(struct inode *);
extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ae58749..11c07e1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3569,6 +3569,126 @@ next:
return err;
}
+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from)
+{
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned length;
+ unsigned blocksize;
+ struct inode *inode = mapping->host;
+
+ blocksize = inode->i_sb->s_blocksize;
+ length = blocksize - (offset & (blocksize - 1));
+
+ return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'. The range to be zero'd must
+ * be contained with in one block. If the specified range exceeds
+ * the end of the block it will be shortened to end of the block
+ * that cooresponds to 'from'
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length)
+{
+ ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned blocksize, max, pos;
+ ext4_lblk_t iblock;
+ struct inode *inode = mapping->host;
+ struct buffer_head *bh;
+ struct page *page;
+ int err = 0;
+
+ page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+ mapping_gfp_mask(mapping) & ~__GFP_FS);
+ if (!page)
+ return -ENOMEM;
+
+ blocksize = inode->i_sb->s_blocksize;
+ max = blocksize - (offset & (blocksize - 1));
+
+ /*
+ * correct length if it does not fall between
+ * 'from' and the end of the block
+ */
+ if (length > max || length < 0)
+ length = max;
+
+ iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+ if (!page_has_buffers(page))
+ create_empty_buffers(page, blocksize, 0);
+
+ /* Find the buffer that contains "offset" */
+ bh = page_buffers(page);
+ pos = blocksize;
+ while (offset >= pos) {
+ bh = bh->b_this_page;
+ iblock++;
+ pos += blocksize;
+ }
+
+ err = 0;
+ if (buffer_freed(bh)) {
+ BUFFER_TRACE(bh, "freed: skip");
+ goto unlock;
+ }
+
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "unmapped");
+ ext4_get_block(inode, iblock, bh, 0);
+ /* unmapped? It's a hole - nothing to do */
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "still unmapped");
+ goto unlock;
+ }
+ }
+
+ /* Ok, it's mapped. Make sure it's up-to-date */
+ if (PageUptodate(page))
+ set_buffer_uptodate(bh);
+
+ if (!buffer_uptodate(bh)) {
+ err = -EIO;
+ ll_rw_block(READ, 1, &bh);
+ wait_on_buffer(bh);
+ /* Uhhuh. Read error. Complain and punt. */
+ if (!buffer_uptodate(bh))
+ goto unlock;
+ }
+
+ if (ext4_should_journal_data(inode)) {
+ BUFFER_TRACE(bh, "get write access");
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err)
+ goto unlock;
+ }
+
+ zero_user(page, offset, length);
+
+ BUFFER_TRACE(bh, "zeroed end of block");
+
+ err = 0;
+ if (ext4_should_journal_data(inode)) {
+ err = ext4_handle_dirty_metadata(handle, inode, bh);
+ } else
+ mark_buffer_dirty(bh);
+
+unlock:
+ unlock_page(page);
+ page_cache_release(page);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
In data=ordered mode we should call ext4_jbd2_file_inode() so that crash
after the truncate transaction has committed does not expose stall data
in the tail of the block.
Thanks Jan Kara for pointing that out.
Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 11c07e1..8187c3e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3680,8 +3680,11 @@ int ext4_block_zero_page_range(handle_t *handle,
err = 0;
if (ext4_should_journal_data(inode)) {
err = ext4_handle_dirty_metadata(handle, inode, bh);
- } else
+ } else {
mark_buffer_dirty(bh);
+ if (ext4_test_inode_state(inode, EXT4_STATE_ORDERED_MODE))
+ err = ext4_jbd2_file_inode(handle, inode);
+ }
unlock:
unlock_page(page);
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
The "head removal" branch in the condition is never used in any code
path in ext4 since the function only caller ext4_ext_rm_leaf() will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.
This commit removes the unused code completely and makes use of
ext4_error() instead of printk if dubious range is provided.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 21 ++++-----------------
1 files changed, 4 insertions(+), 17 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bc0f191..18c3f1a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2432,23 +2432,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
- } else if (from == le32_to_cpu(ex->ee_block)
- && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
- /* head removal */
- ext4_lblk_t num;
- ext4_fsblk_t start;
-
- num = to - from;
- start = ext4_ext_pblock(ex);
-
- ext_debug("free first %u blocks starting %llu\n", num, start);
- ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
- } else {
- printk(KERN_INFO "strange request: removal(2) "
- "%u-%u from %u:%u\n",
- from, to, le32_to_cpu(ex->ee_block), ee_len);
- }
+ } else
+ ext4_error(sbi->s_sb, "strange request: removal(2) "
+ "%u-%u from %u:%u\n",
+ from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
}
--
1.7.7.6
Currently punch hole is disabled in file systems with bigalloc
feature enabled. However the recent changes in punch hole patch should
make it easier to support punching holes on bigalloc enabled file
systems.
This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is unsigned long long type and it makes sure that we
will free the partial cluster if all extents has been released from that
cluster. However it has been specifically designed only for truncate.
With punch hole we can be freeing just some extents in the cluster
leaving the rest untouched. So we have to make sure that we will notice
cluster which still has some extents. To do this I've changed
partial_cluster to be signed long long type. The only scenario where
this could be a problem is when cluster_size == block size, however in
that case there would not be any partial clusters so we're safe. For
bigger clusters the signed type is enough. Now we use the negative value
in partial_cluster to mark such cluster used, hence we know that we must
not free it even if all other extents has been freed from such cluster.
This scenario can be described in simple diagram:
|FFF...FF..FF.UUU|
^----------^
punch hole
. - free space
| - cluster boundary
F - freed extent
U - used extent
Also update respective tracepoints to use signed long long type for
partial_cluster.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
include/trace/events/ext4.h | 25 ++++++++-------
2 files changed, 64 insertions(+), 30 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index fb9b414..214e68a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2359,7 +2359,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ long long *partial_cluster,
ext4_lblk_t from, ext4_lblk_t to)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2388,7 +2388,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* partial cluster here.
*/
pblk = ext4_ext_pblock(ex) + ee_len - 1;
- if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+ if ((*partial_cluster > 0) &&
+ (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
EXT4_C2B(sbi, *partial_cluster),
sbi->s_cluster_ratio, flags);
@@ -2414,23 +2415,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+ unsigned int unaligned;
num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
- ext_debug("free last %u blocks starting %llu\n", num, pblk);
+ /*
+ * Usually we want to free partial cluster at the end of the
+ * extent, except for the situation when the cluster is still
+ * used by any other extent (partial_cluster is negative).
+ */
+ if (*partial_cluster < 0 &&
+ -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+ flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+ ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
* If the block range to be freed didn't start at the
* beginning of a cluster, and we removed the entire
- * extent, save the partial cluster here, since we
- * might need to delete if we determine that the
- * truncate operation has removed all of the blocks in
- * the cluster.
+ * extent and the cluster is not used by any other extent,
+ * save the partial cluster here, since we might need to
+ * delete if we determine that the truncate operation has
+ * removed all of the blocks in the cluster.
+ *
+ * On the other hand, if we did not manage to free the whole
+ * extent, we have to mark the cluster as used (store negative
+ * cluster number in partial_cluster).
*/
- if (pblk & (sbi->s_cluster_ratio - 1) &&
- (ee_len == num))
+ unaligned = pblk & (sbi->s_cluster_ratio - 1);
+ if (unaligned && (ee_len == num) &&
+ (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
*partial_cluster = EXT4_B2C(sbi, pblk);
- else
+ else if (unaligned)
+ *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
+ else if (*partial_cluster > 0)
*partial_cluster = 0;
} else
ext4_error(sbi->s_sb, "strange request: removal(2) "
@@ -2448,12 +2467,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* @handle: The journal handle
* @inode: The files inode
* @path: The path to the leaf
+ * @partial_cluster: The cluster which we'll have to free if all extents
+ * has been released from it. It gets negative in case
+ * that the cluster is still used.
* @start: The first block to remove
* @end: The last block to remove
*/
static int
ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
- struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
+ struct ext4_ext_path *path,
+ long long *partial_cluster,
ext4_lblk_t start, ext4_lblk_t end)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2466,6 +2489,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
unsigned short ex_ee_len;
unsigned uninitialized = 0;
struct ext4_extent *ex;
+ ext4_fsblk_t pblk;
/* the header must be checked already in ext4_ext_remove_space() */
ext_debug("truncate since %u in leaf to %u\n", start, end);
@@ -2504,6 +2528,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
/* If this extent is beyond the end of the hole, skip it */
if (end < ex_ee_block) {
+ /*
+ * We're going to skip this extent and move to another,
+ * so if this extent is not cluster aligned we have
+ * to mark the current cluster as used to avoid
+ * accidentally freeing it later on
+ */
+ pblk = ext4_ext_pblock(ex);
+ if (pblk & (sbi->s_cluster_ratio - 1))
+ *partial_cluster =
+ -((long long)EXT4_B2C(sbi, pblk));
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
ex_ee_len = ext4_ext_get_actual_len(ex);
@@ -2579,7 +2613,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
sizeof(struct ext4_extent));
}
le16_add_cpu(&eh->eh_entries, -1);
- } else
+ } else if (*partial_cluster > 0)
*partial_cluster = 0;
err = ext4_ext_dirty(handle, inode, path + depth);
@@ -2597,11 +2631,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
err = ext4_ext_correct_indexes(handle, inode, path);
/*
- * If there is still a entry in the leaf node, check to see if
- * it references the partial cluster. This is the only place
- * where it could; if it doesn't, we can free the cluster.
+ * Free the partial cluster only if the current extent does not
+ * reference it. Otherwise we might free used cluster.
*/
- if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
+ if (*partial_cluster > 0 &&
(EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
*partial_cluster)) {
int flags = EXT4_FREE_BLOCKS_FORGET;
@@ -2651,7 +2684,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
struct super_block *sb = inode->i_sb;
int depth = ext_depth(inode);
struct ext4_ext_path *path = NULL;
- ext4_fsblk_t partial_cluster = 0;
+ long long partial_cluster = 0;
handle_t *handle;
int i = 0, err = 0;
@@ -2837,7 +2870,7 @@ again:
/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
* cluster as well. */
- if (partial_cluster && path->p_hdr->eh_entries == 0) {
+ if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
int flags = EXT4_FREE_BLOCKS_FORGET;
if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index bcb5a02..e23b218 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1928,7 +1928,7 @@ TRACE_EVENT(ext4_ext_show_extent,
TRACE_EVENT(ext4_remove_blocks,
TP_PROTO(struct inode *inode, struct ext4_extent *ex,
ext4_lblk_t from, ext4_fsblk_t to,
- ext4_fsblk_t partial_cluster),
+ long long partial_cluster),
TP_ARGS(inode, ex, from, to, partial_cluster),
@@ -1937,7 +1937,7 @@ TRACE_EVENT(ext4_remove_blocks,
__field( ino_t, ino )
__field( ext4_lblk_t, from )
__field( ext4_lblk_t, to )
- __field( ext4_fsblk_t, partial )
+ __field( long long, partial )
__field( ext4_fsblk_t, ee_pblk )
__field( ext4_lblk_t, ee_lblk )
__field( unsigned short, ee_len )
@@ -1955,7 +1955,7 @@ TRACE_EVENT(ext4_remove_blocks,
),
TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
- "from %u to %u partial_cluster %u",
+ "from %u to %u partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->ee_lblk,
@@ -1963,19 +1963,20 @@ TRACE_EVENT(ext4_remove_blocks,
(unsigned short) __entry->ee_len,
(unsigned) __entry->from,
(unsigned) __entry->to,
- (unsigned) __entry->partial)
+ (long long) __entry->partial)
);
TRACE_EVENT(ext4_ext_rm_leaf,
TP_PROTO(struct inode *inode, ext4_lblk_t start,
- struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
+ struct ext4_extent *ex,
+ long long partial_cluster),
TP_ARGS(inode, start, ex, partial_cluster),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
- __field( ext4_fsblk_t, partial )
+ __field( long long, partial )
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, ee_lblk )
__field( ext4_fsblk_t, ee_pblk )
@@ -1993,14 +1994,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
),
TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
- "partial_cluster %u",
+ "partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->ee_lblk,
(unsigned long long) __entry->ee_pblk,
(unsigned short) __entry->ee_len,
- (unsigned) __entry->partial)
+ (long long) __entry->partial)
);
TRACE_EVENT(ext4_ext_rm_idx,
@@ -2058,7 +2059,7 @@ TRACE_EVENT(ext4_ext_remove_space,
TRACE_EVENT(ext4_ext_remove_space_done,
TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
- int depth, ext4_lblk_t partial, __le16 eh_entries),
+ int depth, long long partial, __le16 eh_entries),
TP_ARGS(inode, start, end, depth, partial, eh_entries),
@@ -2068,7 +2069,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, end )
__field( int, depth )
- __field( ext4_lblk_t, partial )
+ __field( long long, partial )
__field( unsigned short, eh_entries )
),
@@ -2082,14 +2083,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->eh_entries = le16_to_cpu(eh_entries);
),
- TP_printk("dev %d,%d ino %lu since %u end %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu since %u end %u depth %d partial %lld "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->end,
__entry->depth,
- (unsigned) __entry->partial,
+ (long long) __entry->partial,
(unsigned short) __entry->eh_entries)
);
--
1.7.7.6
This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.
This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().
The statement in the commit description that the truncate operation only
zero block unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidate the unaligned
portion of the page. Then there is no need to zero and unmap it once more
and ext4_block_truncate_page() was doing the right job, although we
still need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() is doing.
Moreover the problem described in the commit is fixed more properly with
commit
15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
This was tested on ppc64 machine with block size of 1024 bytes without
any problems.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 11 ++---------
1 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8187c3e..34ebb62 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3937,7 +3937,6 @@ void ext4_truncate(struct inode *inode)
unsigned int credits;
handle_t *handle;
struct address_space *mapping = inode->i_mapping;
- loff_t page_len;
/*
* There is a possibility that we're either freeing the inode
@@ -3981,14 +3980,8 @@ void ext4_truncate(struct inode *inode)
return;
}
- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- if (ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0))
- goto out_stop;
- }
+ if (inode->i_size & (inode->i_sb->s_blocksize - 1))
+ ext4_block_truncate_page(handle, mapping, inode->i_size);
/*
* We add the inode to the orphan list, so that if this
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
Currently we do not tell mm to zero out tail of the page before truncate
in orphan_cleanup(). This is ok, because the page should not be
uptodate, however this may eventually change and I might cause problems.
Call truncate_inode_pages() as precautionary measure. Thanks Jan Kara
for pointing this out.
Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/super.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dbc7c09..b971066 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2173,6 +2173,7 @@ static void ext4_orphan_cleanup(struct super_block *sb,
jbd_debug(2, "truncating inode %lu to %lld bytes\n",
inode->i_ino, inode->i_size);
mutex_lock(&inode->i_mutex);
+ truncate_inode_pages(inode->i_mapping, inode->i_size);
ext4_truncate(inode);
mutex_unlock(&inode->i_mutex);
nr_truncates++;
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
We're doing to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating work of
truncate_pagecache_range(), moreover the old implementation was much
clearer.
Now when the truncate_inode_pages_range() can handle truncating non page
aligned regions we can use this to invalidate and zero out block aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks on the start and end of the range. This
will greatly simplify the punch hole code. Moreover after this commit we
can get rid of the ext4_discard_partial_page_buffers() completely.
We also introduce function ext4_prepare_punch_hole() to do come common
operations before we attempt to do the actual punch hole on
indirect or extent file which saves us some code duplication.
This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.
Signed-off-by: Lukas Czerner <[email protected]>
---
v4: Use start-len arguments in ext4_zero_partial_blocks()
fs/ext4/ext4.h | 2 +
fs/ext4/inode.c | 118 +++++++++++++++++++++---------------------------------
2 files changed, 48 insertions(+), 72 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9f9719f..2d4b0aa 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2100,6 +2100,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 34ebb62..44333a5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3692,6 +3692,41 @@ unlock:
return err;
}
+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t length)
+{
+ struct super_block *sb = inode->i_sb;
+ struct address_space *mapping = inode->i_mapping;
+ unsigned partial = lstart & (sb->s_blocksize - 1);
+ ext4_fsblk_t start, end;
+ loff_t byte_end = (lstart + length - 1);
+ int err = 0;
+
+ start = lstart >> sb->s_blocksize_bits;
+ end = byte_end >> sb->s_blocksize_bits;
+
+ /* Handle partial zero within the single block */
+ if (start == end) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, length);
+ return err;
+ }
+ /* Handle partial zero out on the start of the range */
+ if (partial) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, sb->s_blocksize);
+ if (err)
+ return err;
+ }
+ /* Handle partial zero out on the end of the range */
+ partial = byte_end & (sb->s_blocksize - 1);
+ if (partial != sb->s_blocksize - 1)
+ err = ext4_block_zero_page_range(handle, mapping,
+ byte_end - partial,
+ partial + 1);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
@@ -3720,8 +3755,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
struct address_space *mapping = inode->i_mapping;
- loff_t first_page, last_page, page_len;
- loff_t first_page_offset, last_page_offset;
+ loff_t first_block_offset, last_block_offset;
handle_t *handle;
unsigned int credits;
int ret = 0;
@@ -3772,17 +3806,13 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
offset;
}
- first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- last_page = (offset + length) >> PAGE_CACHE_SHIFT;
+ first_block_offset = round_up(offset, sb->s_blocksize);
+ last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
- first_page_offset = first_page << PAGE_CACHE_SHIFT;
- last_page_offset = last_page << PAGE_CACHE_SHIFT;
-
- /* Now release the pages */
- if (last_page_offset > first_page_offset) {
- truncate_pagecache_range(inode, first_page_offset,
- last_page_offset - 1);
- }
+ /* Now release the pages and zero block aligned part of pages*/
+ if (last_block_offset > first_block_offset)
+ truncate_pagecache_range(inode, first_block_offset,
+ last_block_offset);
/* Wait all existing dio workers, newcomers will block on i_mutex */
ext4_inode_block_unlocked_dio(inode);
@@ -3802,66 +3832,10 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
goto out_dio;
}
- /*
- * Now we need to zero out the non-page-aligned data in the
- * pages at the start and tail of the hole, and unmap the
- * buffer heads for the block aligned regions of the page that
- * were completely zeroed.
- */
- if (first_page > last_page) {
- /*
- * If the file space being truncated is contained
- * within a page just zero out and unmap the middle of
- * that page
- */
- ret = ext4_discard_partial_page_buffers(handle,
- mapping, offset, length, 0);
-
- if (ret)
- goto out_stop;
- } else {
- /*
- * zero out and unmap the partial page that contains
- * the start of the hole
- */
- page_len = first_page_offset - offset;
- if (page_len > 0) {
- ret = ext4_discard_partial_page_buffers(handle, mapping,
- offset, page_len, 0);
- if (ret)
- goto out_stop;
- }
-
- /*
- * zero out and unmap the partial page that contains
- * the end of the hole
- */
- page_len = offset + length - last_page_offset;
- if (page_len > 0) {
- ret = ext4_discard_partial_page_buffers(handle, mapping,
- last_page_offset, page_len, 0);
- if (ret)
- goto out_stop;
- }
- }
-
- /*
- * If i_size is contained in the last page, we need to
- * unmap and zero the partial page after i_size
- */
- if (inode->i_size >> PAGE_CACHE_SHIFT == last_page &&
- inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- if (page_len > 0) {
- ret = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (ret)
- goto out_stop;
- }
- }
+ ret = ext4_zero_partial_blocks(handle, inode, offset,
+ length);
+ if (ret)
+ goto out_stop;
first_block = (offset + sb->s_blocksize - 1) >>
EXT4_BLOCK_SIZE_BITS(sb);
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
The discard_partial_page_buffers is no longer used anywhere so we can
simply remove it including the *_no_lock variant and
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 8 --
fs/ext4/inode.c | 206 -------------------------------------------------------
2 files changed, 0 insertions(+), 214 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2d4b0aa..019db3c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -581,11 +581,6 @@ enum {
#define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER 0x0020
/*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED 0x0001
-
-/*
* ioctl commands
*/
#define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS
@@ -2102,9 +2097,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 44333a5..f504efa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -135,9 +135,6 @@ static void ext4_invalidatepage(struct page *page, unsigned int offset,
unsigned int length);
static int __ext4_journalled_writepage(struct page *page, unsigned int len);
static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags);
/*
* Test whether an inode is a fast symlink.
@@ -3366,209 +3363,6 @@ void ext4_set_aops(struct inode *inode)
inode->i_mapping->a_ops = &ext4_aops;
}
-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags)
-{
- struct inode *inode = mapping->host;
- struct page *page;
- int err = 0;
-
- page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
- mapping_gfp_mask(mapping) & ~__GFP_FS);
- if (!page)
- return -ENOMEM;
-
- err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
- from, length, flags);
-
- unlock_page(page);
- page_cache_release(page);
- return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped. Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been locked. The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'. This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode: The files inode
- * page: A locked page that contains the offset "from"
- * from: The starting byte offset (from the beginning of the file)
- * to begin discarding
- * len: The length of bytes to discard
- * flags: Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped. This flag is appropriate
- * for updating the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on success or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags)
-{
- ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
- unsigned int offset = from & (PAGE_CACHE_SIZE-1);
- unsigned int blocksize, max, pos;
- ext4_lblk_t iblock;
- struct buffer_head *bh;
- int err = 0;
-
- blocksize = inode->i_sb->s_blocksize;
- max = PAGE_CACHE_SIZE - offset;
-
- if (index != page->index)
- return -EINVAL;
-
- /*
- * correct length if it does not fall between
- * 'from' and the end of the page
- */
- if (length > max || length < 0)
- length = max;
-
- iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
-
- if (!page_has_buffers(page))
- create_empty_buffers(page, blocksize, 0);
-
- /* Find the buffer that contains "offset" */
- bh = page_buffers(page);
- pos = blocksize;
- while (offset >= pos) {
- bh = bh->b_this_page;
- iblock++;
- pos += blocksize;
- }
-
- pos = offset;
- while (pos < offset + length) {
- unsigned int end_of_block, range_to_discard;
-
- err = 0;
-
- /* The length of space left to zero and unmap */
- range_to_discard = offset + length - pos;
-
- /* The length of space until the end of the block */
- end_of_block = blocksize - (pos & (blocksize-1));
-
- /*
- * Do not unmap or zero past end of block
- * for this buffer head
- */
- if (range_to_discard > end_of_block)
- range_to_discard = end_of_block;
-
-
- /*
- * Skip this buffer head if we are only zeroing unampped
- * regions of the page
- */
- if (flags & EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED &&
- buffer_mapped(bh))
- goto next;
-
- /* If the range is block aligned, unmap */
- if (range_to_discard == blocksize) {
- clear_buffer_dirty(bh);
- bh->b_bdev = NULL;
- clear_buffer_mapped(bh);
- clear_buffer_req(bh);
- clear_buffer_new(bh);
- clear_buffer_delay(bh);
- clear_buffer_unwritten(bh);
- clear_buffer_uptodate(bh);
- zero_user(page, pos, range_to_discard);
- BUFFER_TRACE(bh, "Buffer discarded");
- goto next;
- }
-
- /*
- * If this block is not completely contained in the range
- * to be discarded, then it is not going to be released. Because
- * we need to keep this block, we need to make sure this part
- * of the page is uptodate before we modify it by writeing
- * partial zeros on it.
- */
- if (!buffer_mapped(bh)) {
- /*
- * Buffer head must be mapped before we can read
- * from the block
- */
- BUFFER_TRACE(bh, "unmapped");
- ext4_get_block(inode, iblock, bh, 0);
- /* unmapped? It's a hole - nothing to do */
- if (!buffer_mapped(bh)) {
- BUFFER_TRACE(bh, "still unmapped");
- goto next;
- }
- }
-
- /* Ok, it's mapped. Make sure it's up-to-date */
- if (PageUptodate(page))
- set_buffer_uptodate(bh);
-
- if (!buffer_uptodate(bh)) {
- err = -EIO;
- ll_rw_block(READ, 1, &bh);
- wait_on_buffer(bh);
- /* Uhhuh. Read error. Complain and punt.*/
- if (!buffer_uptodate(bh))
- goto next;
- }
-
- if (ext4_should_journal_data(inode)) {
- BUFFER_TRACE(bh, "get write access");
- err = ext4_journal_get_write_access(handle, bh);
- if (err)
- goto next;
- }
-
- zero_user(page, pos, range_to_discard);
-
- err = 0;
- if (ext4_should_journal_data(inode)) {
- err = ext4_handle_dirty_metadata(handle, inode, bh);
- } else
- mark_buffer_dirty(bh);
-
- BUFFER_TRACE(bh, "Partial buffer zeroed");
-next:
- bh = bh->b_this_page;
- iblock++;
- pos += range_to_discard;
- }
-
- return err;
-}
-
/*
* ext4_block_truncate_page() zeroes out a mapping from file offset `from'
* up to the end of the block which corresponds to `from'.
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
Add "end" variable.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 6 +++---
include/trace/events/ext4.h | 21 ++++++++++++++-------
2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 18c3f1a..fb9b414 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2663,7 +2663,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
return PTR_ERR(handle);
again:
- trace_ext4_ext_remove_space(inode, start, depth);
+ trace_ext4_ext_remove_space(inode, start, end, depth);
/*
* Check if we are removing extents inside the extent tree. If that
@@ -2831,8 +2831,8 @@ again:
}
}
- trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
- path->p_hdr->eh_entries);
+ trace_ext4_ext_remove_space_done(inode, start, end, depth,
+ partial_cluster, path->p_hdr->eh_entries);
/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index dcfce96..bcb5a02 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2027,14 +2027,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
);
TRACE_EVENT(ext4_ext_remove_space,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t end, int depth),
- TP_ARGS(inode, start, depth),
+ TP_ARGS(inode, start, end, depth),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
),
@@ -2042,26 +2044,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
),
- TP_printk("dev %d,%d ino %lu since %u depth %d",
+ TP_printk("dev %d,%d ino %lu since %u end %u depth %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth)
);
TRACE_EVENT(ext4_ext_remove_space_done,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
- ext4_lblk_t partial, __le16 eh_entries),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+ int depth, ext4_lblk_t partial, __le16 eh_entries),
- TP_ARGS(inode, start, depth, partial, eh_entries),
+ TP_ARGS(inode, start, end, depth, partial, eh_entries),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
__field( ext4_lblk_t, partial )
__field( unsigned short, eh_entries )
@@ -2071,16 +2076,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
__entry->partial = partial;
__entry->eh_entries = le16_to_cpu(eh_entries);
),
- TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu since %u end %u depth %d partial %u "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth,
(unsigned) __entry->partial,
(unsigned short) __entry->eh_entries)
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
implementation and there is noting holding us back from using punch hole
on file system with bigalloc feature enabled.
This has been tested with fsx and xfstests.
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f504efa..daffbb8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3557,11 +3557,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
if (!S_ISREG(inode->i_mode))
return -EOPNOTSUPP;
- if (EXT4_SB(sb)->s_cluster_ratio > 1) {
- /* TODO: Add support for bigalloc file systems */
- return -EOPNOTSUPP;
- }
-
trace_ext4_punch_hole(inode, offset, length);
/*
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Tue, May 14, 2013 at 06:37:19PM +0200, Lukas Czerner wrote:
> ->invalidatepage() aop now accepts range to invalidate so we can make
> use of it in xfs_vm_invalidatepage()
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Reviewed-by: Ben Myers <[email protected]>
> Cc: [email protected]
Acked-by: Dave Chinner <[email protected]>
--
Dave Chinner
[email protected]
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Tue, 14 May 2013, Lukas Czerner wrote:
> Date: Tue, 14 May 2013 18:37:14 +0200
> From: Lukas Czerner <[email protected]>
> To: [email protected]
> Cc: [email protected], [email protected],
> [email protected], [email protected], [email protected],
> [email protected]
> Subject: [PATCH v4 00/20] change invalidatepage prototype to accept length
Hi Ted,
you've mentioned that you'll carry the changed in the ext4 tree. Are
you going to take it for the next merge window ?
However I still need some review on the mm part of the series,
Andrew, Hugh, anyone ?
Thanks!
-Lukas
>
> Hi,
>
> This set of patches are aimed to allow truncate_inode_pages_range() handle
> ranges which are not aligned at the end of the page. Currently it will
> hit BUG_ON() when the end of the range is not aligned. Punch hole feature
> however can benefit from this ability saving file systems some work not
> forcing them to implement their own invalidate code to handle unaligned
> ranges.
>
> In order for this to woke we need change ->invalidatepage() address space
> operation to to accept range to invalidate by adding 'length' argument in
> addition to 'offset'. This is different from my previous attempt to create
> new aop ->invalidatepage_range (http://lwn.net/Articles/514828/) which I
> reconsidered to be unnecessary.
>
> It would be for the best if this series could go through ext4 branch since
> there are a lot of ext4 changes which are based on dev branch of ext4
> (git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git)
>
> For description purposes this patch set can be divided into following
> groups:
>
> patch 0001: Change ->invalidatepage() prototype adding 'length' argument
> and changing all the instances. In very simple cases file
> system methods are completely adapted, otherwise only
> prototype is changed and the rest will follow. This patch
> also implement the 'range' invalidation in
> block_invalidatepage().
>
> patch 0002 - 0009:
> Make the use of new 'length' argument in the file system
> itself. Some file systems can take advantage of it trying
> to invalidate only portion of the page if possible, some
> can't, however none of the file systems currently attempt
> to truncate non page aligned ranges.
>
>
> patch 0010: Teach truncate_inode_pages_range() to handle non page aligned
> ranges.
>
> patch 0011 - 0020:
> Ext4 changes build on top of previous changes, simplifying
> punch hole code. Not all changes are realated specifically
> to invalidatepage() change, but all are related to the punch
> hole feature.
>
> Even though this patch set would mainly affect functionality of the file
> file systems implementing punch hole I've tested all the following file
> system using xfstests without noticing any bugs related to this change.
>
> ext3, ext4, xfs, btrfs, gfs2 and reiserfs
>
> I've also tested block size < page size on ext4 with xfstests and fsx.
>
>
> v3 -> v4: Some minor changes based on the reviews. Added two ext4 patches
> as suggested by Jan Kara.
>
> Thanks!
> -Lukas
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Tue, May 21, 2013 at 04:34:25PM +0200, Lukáš Czerner wrote:
> On Tue, 14 May 2013, Lukas Czerner wrote:
>
> > Date: Tue, 14 May 2013 18:37:14 +0200
> > From: Lukas Czerner <[email protected]>
> > To: [email protected]
> > Cc: [email protected], [email protected],
> > [email protected], [email protected], [email protected],
> > [email protected]
> > Subject: [PATCH v4 00/20] change invalidatepage prototype to accept length
>
> Hi Ted,
>
> you've mentioned that you'll carry the changed in the ext4 tree. Are
> you going to take it for the next merge window ?
>
> However I still need some review on the mm part of the series,
> Andrew, Hugh, anyone ?
The ext4 tree now has the the v4 version of hte invalidatepage range
patches in it. So it will be showing up in the next linux-next tree,
for the convenience of the mm folks. :-)
- Ted
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Tue, May 14, 2013 at 06:37:34PM +0200, Lukas Czerner wrote:
> In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
> 30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
> implementation and there is noting holding us back from using punch hole
> on file system with bigalloc feature enabled.
>
> This has been tested with fsx and xfstests.
>
> Signed-off-by: Lukas Czerner <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
This patch is causing a test failure with bigalloc enabled with the
xfstests shared/298.
Since it's at the end of the invalidate page range tests, I'm going to
drop this patch for now. Could you take a look at this?
Thanks!!
- Ted
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Fri, 31 May 2013, Theodore Ts'o wrote:
> Date: Fri, 31 May 2013 11:14:54 -0400
> From: Theodore Ts'o <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH v4 20/20] ext4: Allow punch hole with bigalloc enabled
>
> On Tue, May 14, 2013 at 06:37:34PM +0200, Lukas Czerner wrote:
> > In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
> > 30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
> > implementation and there is noting holding us back from using punch hole
> > on file system with bigalloc feature enabled.
> >
> > This has been tested with fsx and xfstests.
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > Reviewed-by: Jan Kara <[email protected]>
>
> This patch is causing a test failure with bigalloc enabled with the
> xfstests shared/298.
>
> Since it's at the end of the invalidate page range tests, I'm going to
> drop this patch for now. Could you take a look at this?
Hi Ted,
sorry for the delay, I've been on vacation last week so I am trying
to catch on the recent development :) I'll take a look at it
hopefully this week. Thanks for letting me know.
-Lukas
>
> Thanks!!
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Fri, 31 May 2013, Theodore Ts'o wrote:
> Date: Fri, 31 May 2013 11:14:54 -0400
> From: Theodore Ts'o <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH v4 20/20] ext4: Allow punch hole with bigalloc enabled
>
> On Tue, May 14, 2013 at 06:37:34PM +0200, Lukas Czerner wrote:
> > In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
> > 30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
> > implementation and there is noting holding us back from using punch hole
> > on file system with bigalloc feature enabled.
> >
> > This has been tested with fsx and xfstests.
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
> > Reviewed-by: Jan Kara <[email protected]>
>
> This patch is causing a test failure with bigalloc enabled with the
> xfstests shared/298.
>
> Since it's at the end of the invalidate page range tests, I'm going to
> drop this patch for now. Could you take a look at this?
>
> Thanks!!
>
> - Ted
Hi Ted,
I should have really noticed this earlier. This test (shared/298)
have nothing to do with bigalloc, nor punch hole. It tests file
system discard implementation.
The most likely reason it failed for you is that the tests does not
count with bigalloc feature. However it seems to be working for me
without any problems. Can you provide more information about the
problem you've seen, or at least your xfstest configuration so we
can see what went wrong and possibly fix the test ?
Tom can you take a look at this ? (Adding Tomas Racek to the CC)
So, since this failure is not really related to the patch itself,
can we re-include the patch (it might be already too late I guess).
Thanks!
-Lukas
On Tue, May 14, 2013 at 06:37:29PM +0200, Lukas Czerner wrote:
> We're doing to get rid of ext4_discard_partial_page_buffers() since it is
> duplicating some code and also partially duplicating work of
> truncate_pagecache_range(), moreover the old implementation was much
> clearer.
>
> Now when the truncate_inode_pages_range() can handle truncating non page
> aligned regions we can use this to invalidate and zero out block aligned
> region of the punched out range and then use ext4_block_truncate_page()
> to zero the unaligned blocks on the start and end of the range. This
> will greatly simplify the punch hole code. Moreover after this commit we
> can get rid of the ext4_discard_partial_page_buffers() completely.
>
> We also introduce function ext4_prepare_punch_hole() to do come common
> operations before we attempt to do the actual punch hole on
> indirect or extent file which saves us some code duplication.
>
> This has been tested on ppc64 with 1k block size with fsx and xfstests
> without any problems.
>
> Signed-off-by: Lukas Czerner <[email protected]>
Hi Lukas,
I've been seeing xfstests failures on test generic/300 in nojournal
mode.
BEGIN TEST: Ext4 4k block w/ no journal Thu Jun 13 22:38:47 EDT 2013
Device: /dev/vdb
mk2fs options: -q -O ^has_journal
mount options: -o block_validity,noload
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.10.0-rc2-00477-g1e1cad7
MKFS_OPTIONS -- -q -O ^has_journal /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc
generic/300 [20:42:18][ 116.877278] fio (3320) used greatest stack depth: 5580 bytes left
[ 116.967122] fio (3321) used greatest stack depth: 5560 bytes left
[ 117.573861] fio (3325) used greatest stack depth: 5504 bytes left
[20:44:01] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/300.out.bad)
--- tests/generic/300.out 2013-06-04 22:42:55.000000000 -0400
+++ /root/xfstests/results/generic/300.out.bad 2013-06-13 20:44:01.306666665 -0400
@@ -2,3 +2,4 @@
Run fio with random aio-dio pattern
+_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
...
(Run 'diff -u tests/generic/300.out /root/xfstests/results/generic/300.out.bad' to see the entire diff)
It bisects down to this patch, and if I take the dev branch, and
revert patches #15 through #19 in this series, the problem goes away.
Can you investigate and recommend a better fix?
Thanks,
- Ted
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Thu, 13 Jun 2013, Theodore Ts'o wrote:
> Date: Thu, 13 Jun 2013 23:01:54 -0400
> From: Theodore Ts'o <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
>
> On Tue, May 14, 2013 at 06:37:29PM +0200, Lukas Czerner wrote:
> > We're doing to get rid of ext4_discard_partial_page_buffers() since it is
> > duplicating some code and also partially duplicating work of
> > truncate_pagecache_range(), moreover the old implementation was much
> > clearer.
> >
> > Now when the truncate_inode_pages_range() can handle truncating non page
> > aligned regions we can use this to invalidate and zero out block aligned
> > region of the punched out range and then use ext4_block_truncate_page()
> > to zero the unaligned blocks on the start and end of the range. This
> > will greatly simplify the punch hole code. Moreover after this commit we
> > can get rid of the ext4_discard_partial_page_buffers() completely.
> >
> > We also introduce function ext4_prepare_punch_hole() to do come common
> > operations before we attempt to do the actual punch hole on
> > indirect or extent file which saves us some code duplication.
> >
> > This has been tested on ppc64 with 1k block size with fsx and xfstests
> > without any problems.
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
>
> Hi Lukas,
>
> I've been seeing xfstests failures on test generic/300 in nojournal
> mode.
>
> BEGIN TEST: Ext4 4k block w/ no journal Thu Jun 13 22:38:47 EDT 2013
> Device: /dev/vdb
> mk2fs options: -q -O ^has_journal
> mount options: -o block_validity,noload
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.10.0-rc2-00477-g1e1cad7
> MKFS_OPTIONS -- -q -O ^has_journal /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc
>
> generic/300 [20:42:18][ 116.877278] fio (3320) used greatest stack depth: 5580 bytes left
> [ 116.967122] fio (3321) used greatest stack depth: 5560 bytes left
> [ 117.573861] fio (3325) used greatest stack depth: 5504 bytes left
> [20:44:01] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/300.out.bad)
> --- tests/generic/300.out 2013-06-04 22:42:55.000000000 -0400
> +++ /root/xfstests/results/generic/300.out.bad 2013-06-13 20:44:01.306666665 -0400
> @@ -2,3 +2,4 @@
>
> Run fio with random aio-dio pattern
>
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
> ...
> (Run 'diff -u tests/generic/300.out /root/xfstests/results/generic/300.out.bad' to see the entire diff)
>
> It bisects down to this patch, and if I take the dev branch, and
> revert patches #15 through #19 in this series, the problem goes away.
>
> Can you investigate and recommend a better fix?
Hi Ted,
I'll take a look at this. Thanks for letting me know.
-Lukas
>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Fri, Jun 14, 2013 at 12:16:53PM +0200, Lukáš Czerner wrote:
> > It bisects down to this patch, and if I take the dev branch, and
> > revert patches #15 through #19 in this series, the problem goes away.
Correction... reverting patches #15 through #19 (which is what I did in
the dev-with-revert branch found on ext4.git) causes the problem to go
away in the nojournal case, but it causes a huge number of other
problems. Some of the reverts weren't clean, so it's possible I
screwed up one of the reverts. It's also possible that only applying
part of this series leaves the tree in an unstable state.
I'd much rather figure out how to fix the problem on the dev branch,
so thank you for looking into this!
- Ted
BEGIN TEST: Ext4 4k block Thu Jun 13 23:25:45 EDT 2013
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
END TEST: Ext4 4k block Fri Jun 14 00:29:17 EDT 2013
BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 00:29:22 EDT 2013
+_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
generic/270 69s ... [01:34:24][ 8102.435986] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #25735: block 99331: comm quotacheck: bad entry in directory: rec_len % 4 != 0 -\
offset=0(0), inode=3739147998, rec_len=57054, name_len=222
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/263 generic/269 generic/270 generic/285 generic/300 shared/218
END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 01:49:59 EDT 2013
BEGIN TEST: Ext4 4k block w/ no journal Fri Jun 14 01:50:00 EDT 2013
+_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
generic/270 69s ... [02:20:21][10531.911437] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9894: block 6526: comm quotacheck: bad entry in directory: rec_len is smaller t\
han minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[10532.535861] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9692: block 6534: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10534.266775] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9906: block 6530: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10534.697885] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #24673: block 6531: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec\
_len=15934, name_len=62
[10535.157126] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9898: block 6532: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10536.395838] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9905: block 6529: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10537.029470] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9899: block 6533: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10537.259601] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9929: block 6527: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/269 generic/270 shared/218 shared/298
END TEST: Ext4 4k block w/ no journal Fri Jun 14 02:32:14 EDT 2013
BEGIN TEST: Ext4 1k block Fri Jun 14 02:32:18 EDT 2013
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/285 shared/218
END TEST: Ext4 1k block Fri Jun 14 04:00:17 EDT 2013
BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 04:00:20 EDT 2013
+_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 shared/218
END TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 05:16:10 EDT 2013
BEGIN TEST: Ext4 4k block w/metadata_csum Fri Jun 14 05:16:12 EDT 2013
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
END TEST: Ext4 4k block w/metadata_csum Fri Jun 14 06:16:31 EDT 2013
BEGIN TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:16:31 EDT 2013
_check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /root/xfstests/results/generic/013.full)
Failures: generic/013
END TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:21:37 EDT 2013
BEGIN TEST: Ext4 4k block w/data=journal Fri Jun 14 06:21:41 EDT 2013
+_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
+_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
[29242.456266] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/buffer.c:1120 mark_buffer_dirty+0x54/0x1ff()
Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 generic/270 generic/300 shared/218
END TEST: Ext4 4k block w/data=journal Fri Jun 14 07:33:10 EDT 2013
BEGIN TEST: Ext4 4k block w/bigalloc Fri Jun 14 07:33:16 EDT 2013
[33544.485801] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/quota/dquot.c:1090 dquot_claim_space_nodirty+0xf1/0x1e3()
Failures: generic/204 generic/219 generic/233 generic/235 generic/273 generic/275 generic/300 shared/218
END TEST: Ext4 4k block w/bigalloc Fri Jun 14 08:56:56 EDT 2013
On Fri, 14 Jun 2013, Ted Ts'o wrote:
> Date: Fri, 14 Jun 2013 09:39:39 -0400
> From: Ted Ts'o <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
>
> On Fri, Jun 14, 2013 at 12:16:53PM +0200, Lukáš Czerner wrote:
> > > It bisects down to this patch, and if I take the dev branch, and
> > > revert patches #15 through #19 in this series, the problem goes away.
>
> Correction... reverting patches #15 through #19 (which is what I did in
> the dev-with-revert branch found on ext4.git) causes the problem to go
> away in the nojournal case, but it causes a huge number of other
> problems. Some of the reverts weren't clean, so it's possible I
> screwed up one of the reverts. It's also possible that only applying
> part of this series leaves the tree in an unstable state.
>
> I'd much rather figure out how to fix the problem on the dev branch,
> so thank you for looking into this!
Wow, this looks bad. Theoretically reverting patches %15 through
#19 should not have any real impact. So far I do not see what is
causing that, but I am looking into this.
I see that there are problems in other mode, not just nojournal. Are
those caused by this as well, or are you seeing those even without
the patchset ?
Thanks!
-Lukas
>
> - Ted
>
> BEGIN TEST: Ext4 4k block Thu Jun 13 23:25:45 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block Fri Jun 14 00:29:17 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 00:29:22 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ... [01:34:24][ 8102.435986] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #25735: block 99331: comm quotacheck: bad entry in directory: rec_len % 4 != 0 -\
> offset=0(0), inode=3739147998, rec_len=57054, name_len=222
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/263 generic/269 generic/270 generic/285 generic/300 shared/218
> END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 01:49:59 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/ no journal Fri Jun 14 01:50:00 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ... [02:20:21][10531.911437] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9894: block 6526: comm quotacheck: bad entry in directory: rec_len is smaller t\
> han minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> [10532.535861] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9692: block 6534: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.266775] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9906: block 6530: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.697885] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #24673: block 6531: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec\
> _len=15934, name_len=62
> [10535.157126] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9898: block 6532: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10536.395838] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9905: block 6529: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.029470] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9899: block 6533: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.259601] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9929: block 6527: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/269 generic/270 shared/218 shared/298
> END TEST: Ext4 4k block w/ no journal Fri Jun 14 02:32:14 EDT 2013
>
> BEGIN TEST: Ext4 1k block Fri Jun 14 02:32:18 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/285 shared/218
> END TEST: Ext4 1k block Fri Jun 14 04:00:17 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 04:00:20 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 shared/218
> END TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 05:16:10 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/metadata_csum Fri Jun 14 05:16:12 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block w/metadata_csum Fri Jun 14 06:16:31 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:16:31 EDT 2013
> _check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /root/xfstests/results/generic/013.full)
> Failures: generic/013
> END TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:21:37 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/data=journal Fri Jun 14 06:21:41 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
> [29242.456266] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/buffer.c:1120 mark_buffer_dirty+0x54/0x1ff()
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 generic/270 generic/300 shared/218
> END TEST: Ext4 4k block w/data=journal Fri Jun 14 07:33:10 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/bigalloc Fri Jun 14 07:33:16 EDT 2013
> [33544.485801] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/quota/dquot.c:1090 dquot_claim_space_nodirty+0xf1/0x1e3()
> Failures: generic/204 generic/219 generic/233 generic/235 generic/273 generic/275 generic/300 shared/218
> END TEST: Ext4 4k block w/bigalloc Fri Jun 14 08:56:56 EDT 2013
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Mon, Jun 17, 2013 at 11:08:32AM +0200, Lukáš Czerner wrote:
> > Correction... reverting patches #15 through #19 (which is what I did in
> > the dev-with-revert branch found on ext4.git) causes the problem to go
> > away in the nojournal case, but it causes a huge number of other
> > problems. Some of the reverts weren't clean, so it's possible I
> > screwed up one of the reverts. It's also possible that only applying
> > part of this series leaves the tree in an unstable state.
> >
> > I'd much rather figure out how to fix the problem on the dev branch,
> > so thank you for looking into this!
>
> Wow, this looks bad. Theoretically reverting patches %15 through
> #19 should not have any real impact. So far I do not see what is
> causing that, but I am looking into this.
I've been looking into this more intensively over the weekend. I'm
now beginning to think we have had a pre-existing race, and the
changes in question has simply changed the timing. I tried a version
of the dev branch (you can find it as the branch dev2 in my
kernel.org's ext4.git tree) which only had patches 1 through 10 of the
invalidate page range patches (dropping patches 11 through 20), and I
found that generic/300 was failing in the configuration ext3 (a file
system with nodelalloc, no flex_bg, and no extents). I also found
the same failure with a 3.10-rc2 configuration.
The your changes seem to make generic/300 failure consistently for me
using the nojournal configuration, but looking at patches in question,
I don't think they could have directly caused the problem. Instead, I
think they just changed the timing to unmask the problem.
Given that I've seen generic/300 test failures in various different
baselines going all the way back to 3.9-rc4, this isn't a recent
regression. And given that it does seem to be timing sensitive,
bisecting it is going to be difficult. On the other hand, given that
using the dev (or master) branch, generic/300 is failing with a
greater than 70% probability using kvm with 2 cpu's, 2 megs of RAM and
5400 rpm laptop drives in nojournal mode, the fact that it's
reproducing relatively reliably hopefully will make it easier to find
the problem.
> I see that there are problems in other mode, not just nojournal. Are
> those caused by this as well, or are you seeing those even without
> the patchset ?
I think the other problems in my dev-with-revert branch was caused by
some screw up on my part when did the revert using git. I found that
dropping the patches from a copy of the guilt patch stack, and then
applying all of the patches except for the last half of the invalidate
page range patch series, resulted in a clean branch that didn't have
any of these failures. It's what I should have done late last week,
instead of trying to use "git revert".
Cheers,
- Ted
On Mon, 17 Jun 2013, Theodore Ts'o wrote:
> Date: Mon, 17 Jun 2013 08:25:18 -0400
> From: Theodore Ts'o <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
>
> On Mon, Jun 17, 2013 at 11:08:32AM +0200, Lukáš Czerner wrote:
> > > Correction... reverting patches #15 through #19 (which is what I did in
> > > the dev-with-revert branch found on ext4.git) causes the problem to go
> > > away in the nojournal case, but it causes a huge number of other
> > > problems. Some of the reverts weren't clean, so it's possible I
> > > screwed up one of the reverts. It's also possible that only applying
> > > part of this series leaves the tree in an unstable state.
> > >
> > > I'd much rather figure out how to fix the problem on the dev branch,
> > > so thank you for looking into this!
> >
> > Wow, this looks bad. Theoretically reverting patches %15 through
> > #19 should not have any real impact. So far I do not see what is
> > causing that, but I am looking into this.
>
> I've been looking into this more intensively over the weekend. I'm
> now beginning to think we have had a pre-existing race, and the
> changes in question has simply changed the timing. I tried a version
> of the dev branch (you can find it as the branch dev2 in my
> kernel.org's ext4.git tree) which only had patches 1 through 10 of the
> invalidate page range patches (dropping patches 11 through 20), and I
> found that generic/300 was failing in the configuration ext3 (a file
> system with nodelalloc, no flex_bg, and no extents). I also found
> the same failure with a 3.10-rc2 configuration.
>
> The your changes seem to make generic/300 failure consistently for me
> using the nojournal configuration, but looking at patches in question,
> I don't think they could have directly caused the problem. Instead, I
> think they just changed the timing to unmask the problem.
Ok, I though that there is something weird because patches #1-#14
should not cause anything like that and from my testing (see my
previous mail) it really seems it does not cause it, at least not
directly.
>
> Given that I've seen generic/300 test failures in various different
> baselines going all the way back to 3.9-rc4, this isn't a recent
> regression. And given that it does seem to be timing sensitive,
> bisecting it is going to be difficult. On the other hand, given that
> using the dev (or master) branch, generic/300 is failing with a
> greater than 70% probability using kvm with 2 cpu's, 2 megs of RAM and
> 5400 rpm laptop drives in nojournal mode, the fact that it's
> reproducing relatively reliably hopefully will make it easier to find
> the problem.
As mentioned in previous email test generic/300 runs without any
problems (even in the loop) without journal with patches #1 through
#14 applied on 3.10-rc2 (c7788792a5e7b0d5d7f96d0766b4cb6112d47d75).
This is on kvm with 24 cpu's, 8GB of RAM (I suppose you're not using
2MB of ram in your setup, but rather 2GB :) and server drives with
linear lvm on top of it.
-Lukas
>
> > I see that there are problems in other mode, not just nojournal. Are
> > those caused by this as well, or are you seeing those even without
> > the patchset ?
>
> I think the other problems in my dev-with-revert branch was caused by
> some screw up on my part when did the revert using git. I found that
> dropping the patches from a copy of the guilt patch stack, and then
> applying all of the patches except for the last half of the invalidate
> page range patch series, resulted in a clean branch that didn't have
> any of these failures. It's what I should have done late last week,
> instead of trying to use "git revert".
>
> Cheers,
>
> - Ted
>
On Fri, 14 Jun 2013, Ted Ts'o wrote:
> Date: Fri, 14 Jun 2013 09:39:39 -0400
> From: Ted Ts'o <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
>
> On Fri, Jun 14, 2013 at 12:16:53PM +0200, Lukáš Czerner wrote:
> > > It bisects down to this patch, and if I take the dev branch, and
> > > revert patches #15 through #19 in this series, the problem goes away.
>
> Correction... reverting patches #15 through #19 (which is what I did in
> the dev-with-revert branch found on ext4.git) causes the problem to go
> away in the nojournal case, but it causes a huge number of other
> problems. Some of the reverts weren't clean, so it's possible I
> screwed up one of the reverts. It's also possible that only applying
> part of this series leaves the tree in an unstable state.
>
> I'd much rather figure out how to fix the problem on the dev branch,
> so thank you for looking into this!
Ok so it seem like the problems you're seeing here after the revert
might be cause wrong revert. Simply applying patches #1 through #14
on top of the c7788792a5e7b0d5d7f96d0766b4cb6112d47d75 (Linux
3.10-rc2 - that seem to be what is ext4 branch based on?) does not
yield any errors without journal.
FSTYP -- ext4
PLATFORM -- Linux/x86_64 rhel6_vm1 3.10.0-rc2-debug+
MKFS_OPTIONS -- -q -F -b4096 -O ^has_journal /dev/sdb
MOUNT_OPTIONS -- -o acl,user_xattr /dev/sdb /mnt/test1
generic/075 10s ... 10s
generic/091 21s ... 18s
generic/112 10s
generic/127 286s ... 248s
generic/231 206s ... 77s
generic/255 1s ... 1s
generic/263 16s ... 11s
generic/269 29s ... 30s
generic/270 32s ... 32s
generic/300 8s ... 7s
shared/218 4s
shared/298 33s ... 21s
Ran: generic/075 generic/091 generic/112 generic/127 generic/231
generic/255 generic/263 generic/269 generic/270 generic/300
shared/218 shared/298
Passed all 12 tests
If required I will do the revert myself to make sure that nothing
breaks. However I certainly hope it would not be necessary.
I am still working to figure out what's going on. I'll keep you
posted.
Thanks!
-Lukas
>
> - Ted
>
> BEGIN TEST: Ext4 4k block Thu Jun 13 23:25:45 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block Fri Jun 14 00:29:17 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 00:29:22 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ... [01:34:24][ 8102.435986] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #25735: block 99331: comm quotacheck: bad entry in directory: rec_len % 4 != 0 -\
> offset=0(0), inode=3739147998, rec_len=57054, name_len=222
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/263 generic/269 generic/270 generic/285 generic/300 shared/218
> END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 01:49:59 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/ no journal Fri Jun 14 01:50:00 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ... [02:20:21][10531.911437] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9894: block 6526: comm quotacheck: bad entry in directory: rec_len is smaller t\
> han minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> [10532.535861] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9692: block 6534: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.266775] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9906: block 6530: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.697885] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #24673: block 6531: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec\
> _len=15934, name_len=62
> [10535.157126] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9898: block 6532: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10536.395838] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9905: block 6529: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.029470] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9899: block 6533: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.259601] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9929: block 6527: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/269 generic/270 shared/218 shared/298
> END TEST: Ext4 4k block w/ no journal Fri Jun 14 02:32:14 EDT 2013
>
> BEGIN TEST: Ext4 1k block Fri Jun 14 02:32:18 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/285 shared/218
> END TEST: Ext4 1k block Fri Jun 14 04:00:17 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 04:00:20 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 shared/218
> END TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 05:16:10 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/metadata_csum Fri Jun 14 05:16:12 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block w/metadata_csum Fri Jun 14 06:16:31 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:16:31 EDT 2013
> _check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /root/xfstests/results/generic/013.full)
> Failures: generic/013
> END TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:21:37 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/data=journal Fri Jun 14 06:21:41 EDT 2013
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
> [29242.456266] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/buffer.c:1120 mark_buffer_dirty+0x54/0x1ff()
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 generic/270 generic/300 shared/218
> END TEST: Ext4 4k block w/data=journal Fri Jun 14 07:33:10 EDT 2013
>
> BEGIN TEST: Ext4 4k block w/bigalloc Fri Jun 14 07:33:16 EDT 2013
> [33544.485801] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/quota/dquot.c:1090 dquot_claim_space_nodirty+0xf1/0x1e3()
> Failures: generic/204 generic/219 generic/233 generic/235 generic/273 generic/275 generic/300 shared/218
> END TEST: Ext4 4k block w/bigalloc Fri Jun 14 08:56:56 EDT 2013
>
>
----- Original Message -----
> On Fri, 31 May 2013, Theodore Ts'o wrote:
>
> > Date: Fri, 31 May 2013 11:14:54 -0400
> > From: Theodore Ts'o <[email protected]>
> > To: Lukas Czerner <[email protected]>
> > Cc: [email protected], [email protected],
> > [email protected], [email protected],
> > [email protected], [email protected]
> > Subject: Re: [PATCH v4 20/20] ext4: Allow punch hole with bigalloc enabled
> >
> > On Tue, May 14, 2013 at 06:37:34PM +0200, Lukas Czerner wrote:
> > > In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
> > > 30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked punch_hole
> > > implementation and there is noting holding us back from using punch hole
> > > on file system with bigalloc feature enabled.
> > >
> > > This has been tested with fsx and xfstests.
> > >
> > > Signed-off-by: Lukas Czerner <[email protected]>
> > > Reviewed-by: Jan Kara <[email protected]>
> >
> > This patch is causing a test failure with bigalloc enabled with the
> > xfstests shared/298.
> >
> > Since it's at the end of the invalidate page range tests, I'm going to
> > drop this patch for now. Could you take a look at this?
> >
> > Thanks!!
> >
> > - Ted
>
> Hi Ted,
>
> I should have really noticed this earlier. This test (shared/298)
> have nothing to do with bigalloc, nor punch hole. It tests file
> system discard implementation.
>
> The most likely reason it failed for you is that the tests does not
> count with bigalloc feature. However it seems to be working for me
> without any problems. Can you provide more information about the
> problem you've seen, or at least your xfstest configuration so we
> can see what went wrong and possibly fix the test ?
You are right, the test doesn't count with bigalloc. I was able to trigger the test failure by using MKFS_OPTIONS="-O bigalloc -C 8192"
If I understood it correctly, the dumpe2fs outputs free blocks even if bigalloc is used, that is only the first block of the cluster. I changed the test to count with whole cluster. Please try the following patch to the xfstests if it helps you. I tried different cluster sizes from 8192 to 65536 and it works for me.
Thank you!
Tom
http://www.fi.muni.cz/~xracek/0001-xfstests-298-fix-failure-on-ext4-with-bigalloc.patch
>From ccf4cb26505c3e64ef1bfb1264a17ed840a03a81 Mon Sep 17 00:00:00 2001
From: Tomas Racek <[email protected]>
Date: Tue, 18 Jun 2013 13:45:50 +0200
Subject: [PATCH] xfstests: 298: fix failure on ext4 with bigalloc
Count with cluster size instead of block size if bigalloc is used.
Signed-off-by: Tomas Racek <[email protected]>
---
tests/shared/298 | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/tests/shared/298 b/tests/shared/298
index 4541798..78a229d 100755
--- a/tests/shared/298
+++ b/tests/shared/298
@@ -56,14 +56,21 @@ get_free_sectors()
{
case $FSTYP in
ext4)
+ cluster_size=$($DUMPE2FS_PROG $img_file 2>&1 | sed -n 's/Cluster size: *\(.*\)/\1/p')
+ if [ -n "$cluster_size" ]; then
+ blocks_per_cluster=`expr $cluster_size / $block_size`
+ else
+ blocks_per_cluster=1
+ fi
+
$DUMPE2FS_PROG $img_file 2>&1 | grep " Free blocks" | cut -d ":" -f2- | \
tr ',' '\n' | $SED_PROG 's/^ //' | \
- $AWK_PROG -v spb=$sectors_per_block 'BEGIN{FS="-"};
+ $AWK_PROG -v spb=$sectors_per_block -v bpc=$blocks_per_cluster 'BEGIN{FS="-"};
NF {
if($2 != "") # range of blocks
- print spb * $1, spb * ($2 + 1) - 1;
+ print spb * $1, spb * ($2 + bpc) - 1;
else # just single block
- print spb * $1, spb * ($1 + 1) - 1;
+ print spb * $1, spb * ($1 + bpc) - 1;
}'
;;
xfs)
--
1.7.11.7
>
> Tom can you take a look at this ? (Adding Tomas Racek to the CC)
>
> So, since this failure is not really related to the patch itself,
> can we re-include the patch (it might be already too late I guess).
>
> Thanks!
> -Lukas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
On Thu, 13 Jun 2013, Theodore Ts'o wrote:
> Date: Thu, 13 Jun 2013 23:01:54 -0400
> From: Theodore Ts'o <[email protected]>
> To: Lukas Czerner <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
>
> On Tue, May 14, 2013 at 06:37:29PM +0200, Lukas Czerner wrote:
> > We're doing to get rid of ext4_discard_partial_page_buffers() since it is
> > duplicating some code and also partially duplicating work of
> > truncate_pagecache_range(), moreover the old implementation was much
> > clearer.
> >
> > Now when the truncate_inode_pages_range() can handle truncating non page
> > aligned regions we can use this to invalidate and zero out block aligned
> > region of the punched out range and then use ext4_block_truncate_page()
> > to zero the unaligned blocks on the start and end of the range. This
> > will greatly simplify the punch hole code. Moreover after this commit we
> > can get rid of the ext4_discard_partial_page_buffers() completely.
> >
> > We also introduce function ext4_prepare_punch_hole() to do come common
> > operations before we attempt to do the actual punch hole on
> > indirect or extent file which saves us some code duplication.
> >
> > This has been tested on ppc64 with 1k block size with fsx and xfstests
> > without any problems.
> >
> > Signed-off-by: Lukas Czerner <[email protected]>
>
> Hi Lukas,
>
> I've been seeing xfstests failures on test generic/300 in nojournal
> mode.
>
> BEGIN TEST: Ext4 4k block w/ no journal Thu Jun 13 22:38:47 EDT 2013
> Device: /dev/vdb
> mk2fs options: -q -O ^has_journal
> mount options: -o block_validity,noload
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.10.0-rc2-00477-g1e1cad7
> MKFS_OPTIONS -- -q -O ^has_journal /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc
>
> generic/300 [20:42:18][ 116.877278] fio (3320) used greatest stack depth: 5580 bytes left
> [ 116.967122] fio (3321) used greatest stack depth: 5560 bytes left
> [ 117.573861] fio (3325) used greatest stack depth: 5504 bytes left
> [20:44:01] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/300.out.bad)
> --- tests/generic/300.out 2013-06-04 22:42:55.000000000 -0400
> +++ /root/xfstests/results/generic/300.out.bad 2013-06-13 20:44:01.306666665 -0400
> @@ -2,3 +2,4 @@
>
> Run fio with random aio-dio pattern
>
> +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
> ...
> (Run 'diff -u tests/generic/300.out /root/xfstests/results/generic/300.out.bad' to see the entire diff)
I think I've got this. The problem actually is in
ext4_zero_partial_blocks() where we would attempt to zero out page
which has been previously released by truncate_pagecache_range().
This might happen when we're punching out just a single page because
in ext4_zero_partial_blocks() we do not check whether we're dealing
with the whole, or partial page. At the point we're going to zero it
out it might have been already released and reused by someone else.
This patch should fix this issue. And indeed with this applied I do
not see the problem anymore but I am still testing.
-Lukas
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3acf353..ce9f926 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3698,33 +3698,36 @@ int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
{
struct super_block *sb = inode->i_sb;
struct address_space *mapping = inode->i_mapping;
- unsigned partial = lstart & (sb->s_blocksize - 1);
+ unsigned partial_start, partial_end;
ext4_fsblk_t start, end;
loff_t byte_end = (lstart + length - 1);
int err = 0;
+ partial_start = lstart & (sb->s_blocksize - 1);
+ partial_end = byte_end & (sb->s_blocksize - 1);
+
start = lstart >> sb->s_blocksize_bits;
end = byte_end >> sb->s_blocksize_bits;
/* Handle partial zero within the single block */
- if (start == end) {
+ if (start == end &&
+ (partial_start || (partial_end != sb->s_blocksize - 1))) {
err = ext4_block_zero_page_range(handle, mapping,
lstart, length);
return err;
}
/* Handle partial zero out on the start of the range */
- if (partial) {
+ if (partial_start) {
err = ext4_block_zero_page_range(handle, mapping,
lstart, sb->s_blocksize);
if (err)
return err;
}
/* Handle partial zero out on the end of the range */
- partial = byte_end & (sb->s_blocksize - 1);
- if (partial != sb->s_blocksize - 1)
+ if (partial_end != sb->s_blocksize - 1)
err = ext4_block_zero_page_range(handle, mapping,
- byte_end - partial,
- partial + 1);
+ byte_end - partial_end,
+ partial_end + 1);
return err;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Wed, Jun 19, 2013 at 06:37:53PM +0200, Lukáš Czerner wrote:
>
> I think I've got this. The problem actually is in
> ext4_zero_partial_blocks() where we would attempt to zero out page
> which has been previously released by truncate_pagecache_range().
> This might happen when we're punching out just a single page because
> in ext4_zero_partial_blocks() we do not check whether we're dealing
> with the whole, or partial page. At the point we're going to zero it
> out it might have been already released and reused by someone else.
>
> This patch should fix this issue. And indeed with this applied I do
> not see the problem anymore but I am still testing.
Thanks for finding this! I'm still doing testing of your trial patch
myself, but initial results seem to indicate that this also solves the
failures 269 and 270 which was apparently uncovered by Ashish's patch
"ext4: optimize extent selection for block removal in case of hole
punch".
- Ted
On Wed, 19 Jun 2013, Theodore Ts'o wrote:
> Date: Wed, 19 Jun 2013 19:42:43 -0400
> From: Theodore Ts'o <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: [email protected], Ashish Sangwan <[email protected]>
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
>
> On Wed, Jun 19, 2013 at 06:37:53PM +0200, Lukáš Czerner wrote:
> >
> > I think I've got this. The problem actually is in
> > ext4_zero_partial_blocks() where we would attempt to zero out page
> > which has been previously released by truncate_pagecache_range().
> > This might happen when we're punching out just a single page because
> > in ext4_zero_partial_blocks() we do not check whether we're dealing
> > with the whole, or partial page. At the point we're going to zero it
> > out it might have been already released and reused by someone else.
> >
> > This patch should fix this issue. And indeed with this applied I do
> > not see the problem anymore but I am still testing.
>
> Thanks for finding this! I'm still doing testing of your trial patch
> myself, but initial results seem to indicate that this also solves the
> failures 269 and 270 which was apparently uncovered by Ashish's patch
> "ext4: optimize extent selection for block removal in case of hole
> punch".
>
> - Ted
Hi Ted,
from my testing I still see problems with test 269, however I've
seen this even without the patch #15 so I am not sure that it's
caused by this.
I'll send a proper patch soon.
Thanks!
-Lukas
Currently if we pass range into ext4_zero_partial_blocks() which covers
entire block we would attempt to zero it even though we should only zero
unaligned part of the block.
Fix this by checking whether the range covers the whole block skip
zeroing if so.
Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/inode.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3acf353..ce9f926 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3698,33 +3698,36 @@ int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
{
struct super_block *sb = inode->i_sb;
struct address_space *mapping = inode->i_mapping;
- unsigned partial = lstart & (sb->s_blocksize - 1);
+ unsigned partial_start, partial_end;
ext4_fsblk_t start, end;
loff_t byte_end = (lstart + length - 1);
int err = 0;
+ partial_start = lstart & (sb->s_blocksize - 1);
+ partial_end = byte_end & (sb->s_blocksize - 1);
+
start = lstart >> sb->s_blocksize_bits;
end = byte_end >> sb->s_blocksize_bits;
/* Handle partial zero within the single block */
- if (start == end) {
+ if (start == end &&
+ (partial_start || (partial_end != sb->s_blocksize - 1))) {
err = ext4_block_zero_page_range(handle, mapping,
lstart, length);
return err;
}
/* Handle partial zero out on the start of the range */
- if (partial) {
+ if (partial_start) {
err = ext4_block_zero_page_range(handle, mapping,
lstart, sb->s_blocksize);
if (err)
return err;
}
/* Handle partial zero out on the end of the range */
- partial = byte_end & (sb->s_blocksize - 1);
- if (partial != sb->s_blocksize - 1)
+ if (partial_end != sb->s_blocksize - 1)
err = ext4_block_zero_page_range(handle, mapping,
- byte_end - partial,
- partial + 1);
+ byte_end - partial_end,
+ partial_end + 1);
return err;
}
--
1.8.2.1
On Thu, Jun 20, 2013 at 11:14:33AM +0200, Lukas Czerner wrote:
> Currently if we pass range into ext4_zero_partial_blocks() which covers
> entire block we would attempt to zero it even though we should only zero
> unaligned part of the block.
>
> Fix this by checking whether the range covers the whole block skip
> zeroing if so.
>
> Signed-off-by: Lukas Czerner <[email protected]>
Applied, thanks.
- Ted