2013-02-09 21:53:59

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 00/12] jbd2 optimization and bug fixes

I've been recently looking at journalling code in ext4, and it's clear
that no one has really been through this code in a while. It's easy to
find some easy optimization.

Some general rules of thumb that developers should keep in mind when
making changes to ext4:

1) It's important to minimize the amount of time that a handle is held
active, since a journal commit can't be closed until all handles have
been stopped. In the ideal world, disk reads and memory allocations
should be done *before* calling ext4_journal_start().

2) It's important that number of credits needed for a particular handle
be large enough; otherwise it's possible that we run out of space in the
journal without being able to finish the handle.

3) It's also important that we not overestimate then number of credits
needed, since otherwise we might close a transaction too early --- and
in particular, the process which starts the transaction commit process
will tend to suffer the greatest amount of latency.

Currently enabling quotas cause the number of credits required to
balloon significantly. Most of the time we need nowhere the number of
credits which we reserve, so there's opportunity to optimize things.
(For example, we could make sure create the quota record is created in a
seprate transaction.)


Theodore Ts'o (12):
jbd2: track request delay statistics
jbd2: revert "jbd2: add COW fields to struct jbd2_journal_handle"
jbd2: add tracepoints which provide per-handle statistics
ext4: move the jbd2 wrapper functions out of super.c
ext4: pass context information to jbd2__journal_start()
ext4: grab page before starting transaction handle in write_begin()
ext4: start handle at the last possible moment in ext4_unlink()
ext4: start handle at the last possible moment in ext4_rmdir()
ext4: fix the number of credits needed for ext4_ext_migrate()
ext4: fix the number of credits needed for ext4_unlink() and
ext4_rmdir()
ext4: fix the number of credits needed for acl ops with inline data
ext4: start handle at the last possible moment when creating inodes

fs/ext4/acl.c | 7 ++-
fs/ext4/ext4.h | 89 +++++++++++++++++++++++++-
fs/ext4/ext4_jbd2.c | 102 ++++++++++++++++++++++++++++++
fs/ext4/ext4_jbd2.h | 51 ++++++++++++---
fs/ext4/extents.c | 11 ++--
fs/ext4/file.c | 2 +-
fs/ext4/ialloc.c | 17 ++++-
fs/ext4/indirect.c | 7 ++-
fs/ext4/inline.c | 10 +--
fs/ext4/inode.c | 144 +++++++++++++++++++++++++-----------------
fs/ext4/ioctl.c | 4 +-
fs/ext4/migrate.c | 15 +++--
fs/ext4/move_extent.c | 2 +-
fs/ext4/namei.c | 150 +++++++++++++++++++++++---------------------
fs/ext4/resize.c | 10 +--
fs/ext4/super.c | 117 +++-------------------------------
fs/ext4/xattr.c | 11 +---
fs/ext4/xattr.h | 68 --------------------
fs/jbd2/commit.c | 8 +++
fs/jbd2/journal.c | 12 +++-
fs/jbd2/transaction.c | 29 ++++++++-
include/linux/jbd2.h | 41 +++++-------
include/trace/events/jbd2.h | 106 ++++++++++++++++++++++++++++++-
23 files changed, 621 insertions(+), 392 deletions(-)

--
1.7.12.rc0.22.gcdd159b



2013-02-09 21:53:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 02/12] jbd2: revert "jbd2: add COW fields to struct jbd2_journal_handle"

This reverts commit 93737456d68ddcb86232f669b83da673dd12e351.

The cow-snapshots effort is no longer active, so remove these extra
fields to shrink down the handle structure.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
include/linux/jbd2.h | 28 +++-------------------------
1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index e0aafc4..24db725 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -397,35 +397,13 @@ struct jbd2_journal_handle
int h_err;

/* Flags [no locking] */
- unsigned int h_sync:1; /* sync-on-close */
- unsigned int h_jdata:1; /* force data journaling */
- unsigned int h_aborted:1; /* fatal error on handle */
- unsigned int h_cowing:1; /* COWing block to snapshot */
-
- /* Number of buffers requested by user:
- * (before adding the COW credits factor) */
- unsigned int h_base_credits:14;
-
- /* Number of buffers the user is allowed to dirty:
- * (counts only buffers dirtied when !h_cowing) */
- unsigned int h_user_credits:14;
-
+ unsigned int h_sync: 1; /* sync-on-close */
+ unsigned int h_jdata: 1; /* force data journaling */
+ unsigned int h_aborted: 1; /* fatal error on handle */

#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map h_lockdep_map;
#endif

2013-02-09 21:53:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 04/12] ext4: move the jbd2 wrapper functions out of super.c

Move the jbd2 wrapper functions which start and stop handles out of
super.c, where they don't really logically belong, and into
ext4_jbd2.c.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 2 +
fs/ext4/ext4_jbd2.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/super.c | 107 +---------------------------------------------------
3 files changed, 105 insertions(+), 105 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 829cba9..c2c64da 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2136,6 +2136,8 @@ extern void *ext4_kvzalloc(size_t size, gfp_t flags);
extern void ext4_kvfree(void *ptr);
extern int ext4_alloc_flex_bg_array(struct super_block *sb,
ext4_group_t ngroup);
+extern const char *ext4_decode_error(struct super_block *sb, int errno,
+ char nbuf[16]);
extern __printf(4, 5)
void __ext4_error(struct super_block *, const char *, unsigned int,
const char *, ...);
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index b4323ba..6f61145 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -6,6 +6,107 @@

#include <trace/events/ext4.h>

+/* Just increment the non-pointer handle value */
+static handle_t *ext4_get_nojournal(void)
+{
+ handle_t *handle = current->journal_info;
+ unsigned long ref_cnt = (unsigned long)handle;
+
+ BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);
+
+ ref_cnt++;
+ handle = (handle_t *)ref_cnt;
+
+ current->journal_info = handle;
+ return handle;
+}
+
+
+/* Decrement the non-pointer handle value */
+static void ext4_put_nojournal(handle_t *handle)
+{
+ unsigned long ref_cnt = (unsigned long)handle;
+
+ BUG_ON(ref_cnt == 0);
+
+ ref_cnt--;
+ handle = (handle_t *)ref_cnt;
+
+ current->journal_info = handle;
+}
+
+/*
+ * Wrappers for jbd2_journal_start/end.
+ */
+handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
+{
+ journal_t *journal;
+
+ trace_ext4_journal_start(sb, nblocks, _RET_IP_);
+ if (sb->s_flags & MS_RDONLY)
+ return ERR_PTR(-EROFS);
+
+ WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
+ journal = EXT4_SB(sb)->s_journal;
+ if (!journal)
+ return ext4_get_nojournal();
+ /*
+ * Special case here: if the journal has aborted behind our
+ * backs (eg. EIO in the commit thread), then we still need to
+ * take the FS itself readonly cleanly.
+ */
+ if (is_journal_aborted(journal)) {
+ ext4_abort(sb, "Detected aborted journal");
+ return ERR_PTR(-EROFS);
+ }
+ return jbd2_journal_start(journal, nblocks);
+}
+
+int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
+{
+ struct super_block *sb;
+ int err;
+ int rc;
+
+ if (!ext4_handle_valid(handle)) {
+ ext4_put_nojournal(handle);
+ return 0;
+ }
+ sb = handle->h_transaction->t_journal->j_private;
+ err = handle->h_err;
+ rc = jbd2_journal_stop(handle);
+
+ if (!err)
+ err = rc;
+ if (err)
+ __ext4_std_error(sb, where, line, err);
+ return err;
+}
+
+void ext4_journal_abort_handle(const char *caller, unsigned int line,
+ const char *err_fn, struct buffer_head *bh,
+ handle_t *handle, int err)
+{
+ char nbuf[16];
+ const char *errstr = ext4_decode_error(NULL, err, nbuf);
+
+ BUG_ON(!ext4_handle_valid(handle));
+
+ if (bh)
+ BUFFER_TRACE(bh, "abort");
+
+ if (!handle->h_err)
+ handle->h_err = err;
+
+ if (is_handle_aborted(handle))
+ return;
+
+ printk(KERN_ERR "EXT4-fs: %s:%d: aborting transaction: %s in %s\n",
+ caller, line, errstr, err_fn);
+
+ jbd2_journal_abort_handle(handle);
+}
+
int __ext4_journal_get_write_access(const char *where, unsigned int line,
handle_t *handle, struct buffer_head *bh)
{
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b132df5..5812f29 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -69,8 +69,6 @@ static void ext4_mark_recovery_complete(struct super_block *sb,
static void ext4_clear_journal_err(struct super_block *sb,
struct ext4_super_block *es);
static int ext4_sync_fs(struct super_block *sb, int wait);
-static const char *ext4_decode_error(struct super_block *sb, int errno,
- char nbuf[16]);
static int ext4_remount(struct super_block *sb, int *flags, char *data);
static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
static int ext4_unfreeze(struct super_block *sb);
@@ -296,107 +294,6 @@ void ext4_itable_unused_set(struct super_block *sb,
}


-/* Just increment the non-pointer handle value */
-static handle_t *ext4_get_nojournal(void)
-{
- handle_t *handle = current->journal_info;
- unsigned long ref_cnt = (unsigned long)handle;
-
- BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);
-
- ref_cnt++;
- handle = (handle_t *)ref_cnt;
-
- current->journal_info = handle;
- return handle;
-}
-
-
-/* Decrement the non-pointer handle value */
-static void ext4_put_nojournal(handle_t *handle)
-{
- unsigned long ref_cnt = (unsigned long)handle;
-
- BUG_ON(ref_cnt == 0);
-
- ref_cnt--;
- handle = (handle_t *)ref_cnt;
-
- current->journal_info = handle;
-}
-
-/*
- * Wrappers for jbd2_journal_start/end.
- */
-handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
-{
- journal_t *journal;
-
- trace_ext4_journal_start(sb, nblocks, _RET_IP_);
- if (sb->s_flags & MS_RDONLY)
- return ERR_PTR(-EROFS);
-
- WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
- journal = EXT4_SB(sb)->s_journal;
- if (!journal)
- return ext4_get_nojournal();
- /*
- * Special case here: if the journal has aborted behind our
- * backs (eg. EIO in the commit thread), then we still need to
- * take the FS itself readonly cleanly.
- */
- if (is_journal_aborted(journal)) {
- ext4_abort(sb, "Detected aborted journal");
- return ERR_PTR(-EROFS);
- }
- return jbd2_journal_start(journal, nblocks);
-}
-
-int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
-{
- struct super_block *sb;
- int err;
- int rc;
-
- if (!ext4_handle_valid(handle)) {
- ext4_put_nojournal(handle);
- return 0;
- }
- sb = handle->h_transaction->t_journal->j_private;
- err = handle->h_err;
- rc = jbd2_journal_stop(handle);
-
- if (!err)
- err = rc;
- if (err)
- __ext4_std_error(sb, where, line, err);
- return err;
-}
-
-void ext4_journal_abort_handle(const char *caller, unsigned int line,
- const char *err_fn, struct buffer_head *bh,
- handle_t *handle, int err)
-{
- char nbuf[16];
- const char *errstr = ext4_decode_error(NULL, err, nbuf);
-
- BUG_ON(!ext4_handle_valid(handle));
-
- if (bh)
- BUFFER_TRACE(bh, "abort");
-
- if (!handle->h_err)
- handle->h_err = err;
-
- if (is_handle_aborted(handle))
- return;
-
- printk(KERN_ERR "EXT4-fs: %s:%d: aborting transaction: %s in %s\n",
- caller, line, errstr, err_fn);
-
- jbd2_journal_abort_handle(handle);
-}

2013-02-09 21:53:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 11/12] ext4: fix the number of credits needed for acl ops with inline data

Operations which modify extended attributes may need extra journal
credits if inline data is used, since there is a chance that some
extended attributes may need to get pushed to an external attribute
block.

Changes to reflect this was made in xattr.c, but they were missed in
fs/ext4/acl.c. To fix this, abstract the calculation of the number of
credits needed for xattr operations to an inline function defined in
ext4_jbd2.h, and use it in acl.c and xattr.c.

Also move the function declarations used in inline.c from xattr.h
(where they are non-obviously hidden, and caused problems since
ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
them to ext4.h.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: Tao Ma <[email protected]>
---
fs/ext4/acl.c | 4 +--
fs/ext4/ext4.h | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/ext4_jbd2.h | 14 +++++++++++
fs/ext4/xattr.c | 9 +------
fs/ext4/xattr.h | 68 ---------------------------------------------------
5 files changed, 87 insertions(+), 78 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 406cf8b..39a54a0 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -325,7 +325,7 @@ ext4_acl_chmod(struct inode *inode)
return error;
retry:
handle = ext4_journal_start(inode, EXT4_HT_XATTR,
- EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
+ ext4_jbd2_credits_xattr(inode));
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
ext4_std_error(inode->i_sb, error);
@@ -423,7 +423,7 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,

retry:
handle = ext4_journal_start(inode, EXT4_HT_XATTR,
- EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
+ ext4_jbd2_credits_xattr(inode));
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
goto release_and_out;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c2c64da..9897cdf 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2443,6 +2443,75 @@ extern const struct file_operations ext4_file_operations;
extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
extern void ext4_unwritten_wait(struct inode *inode);

+/* inline.c */
+extern int ext4_has_inline_data(struct inode *inode);
+extern int ext4_get_inline_size(struct inode *inode);
+extern int ext4_get_max_inline_size(struct inode *inode);
+extern int ext4_find_inline_data_nolock(struct inode *inode);
+extern void ext4_write_inline_data(struct inode *inode,
+ struct ext4_iloc *iloc,
+ void *buffer, loff_t pos,
+ unsigned int len);
+extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
+ unsigned int len);
+extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
+ unsigned int len);
+extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
+
+extern int ext4_readpage_inline(struct inode *inode, struct page *page);
+extern int ext4_try_to_write_inline_data(struct address_space *mapping,
+ struct inode *inode,
+ loff_t pos, unsigned len,
+ unsigned flags,
+ struct page **pagep);
+extern int ext4_write_inline_data_end(struct inode *inode,
+ loff_t pos, unsigned len,
+ unsigned copied,
+ struct page *page);
+extern struct buffer_head *
+ext4_journalled_write_inline_data(struct inode *inode,
+ unsigned len,
+ struct page *page);
+extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
+ struct inode *inode,
+ loff_t pos, unsigned len,
+ unsigned flags,
+ struct page **pagep,
+ void **fsdata);
+extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
+ unsigned len, unsigned copied,
+ struct page *page);
+extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
+ struct inode *inode);
+extern int ext4_try_create_inline_dir(handle_t *handle,
+ struct inode *parent,
+ struct inode *inode);
+extern int ext4_read_inline_dir(struct file *filp,
+ void *dirent, filldir_t filldir,
+ int *has_inline_data);
+extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
+ const struct qstr *d_name,
+ struct ext4_dir_entry_2 **res_dir,
+ int *has_inline_data);
+extern int ext4_delete_inline_entry(handle_t *handle,
+ struct inode *dir,
+ struct ext4_dir_entry_2 *de_del,
+ struct buffer_head *bh,
+ int *has_inline_data);
+extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
+extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
+ struct ext4_dir_entry_2 **parent_de,
+ int *retval);
+extern int ext4_inline_data_fiemap(struct inode *inode,
+ struct fiemap_extent_info *fieinfo,
+ int *has_inline);
+extern int ext4_try_to_evict_inline_data(handle_t *handle,
+ struct inode *inode,
+ int needed);
+extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
+
+extern int ext4_convert_inline_data(struct inode *inode);
+
/* namei.c */
extern const struct inode_operations ext4_dir_inode_operations;
extern const struct inode_operations ext4_special_inode_operations;
@@ -2539,6 +2608,7 @@ extern void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp);
extern int ext4_mmp_csum_verify(struct super_block *sb,
struct mmp_struct *mmp);

+
/* BH_Uninit flag: blocks are allocated but uninitialized on disk */
enum ext4_state_bits {
BH_Uninit /* blocks are allocated but uninitialized on disk */
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index c1fc2dc..4c216b1 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -104,6 +104,20 @@
#define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
#define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))

+static inline int ext4_jbd2_credits_xattr(struct inode *inode)
+{
+ int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+
+ /*
+ * In case of inline data, we may push out the data to a block,
+ * so we need to reserve credits for this eventuality
+ */
+ if (ext4_has_inline_data(inode))
+ credits += ext4_writepage_trans_blocks(inode) + 1;
+ return credits;
+}
+
+
/*
* Ext4 handle operation types -- for logging purposes
*/
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 2efc560..cc31da0 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1165,16 +1165,9 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
{
handle_t *handle;
int error, retries = 0;
- int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+ int credits = ext4_jbd2_credits_xattr(inode);

retry:
- /*
- * In case of inline data, we may push out the data to a block,
- * So reserve the journal space first.
- */
- if (ext4_has_inline_data(inode))
- credits += ext4_writepage_trans_blocks(inode) + 1;
-
handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 69eda78..aa25deb 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -125,74 +125,6 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode,
struct ext4_xattr_info *i,
struct ext4_xattr_ibody_find *is);

-extern int ext4_has_inline_data(struct inode *inode);
-extern int ext4_get_inline_size(struct inode *inode);
-extern int ext4_get_max_inline_size(struct inode *inode);
-extern int ext4_find_inline_data_nolock(struct inode *inode);
-extern void ext4_write_inline_data(struct inode *inode,
- struct ext4_iloc *iloc,
- void *buffer, loff_t pos,
- unsigned int len);
-extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
- unsigned int len);
-extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
- unsigned int len);
-extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
-
-extern int ext4_readpage_inline(struct inode *inode, struct page *page);
-extern int ext4_try_to_write_inline_data(struct address_space *mapping,
- struct inode *inode,
- loff_t pos, unsigned len,
- unsigned flags,
- struct page **pagep);
-extern int ext4_write_inline_data_end(struct inode *inode,
- loff_t pos, unsigned len,
- unsigned copied,
- struct page *page);
-extern struct buffer_head *
-ext4_journalled_write_inline_data(struct inode *inode,
- unsigned len,
- struct page *page);
-extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
- struct inode *inode,
- loff_t pos, unsigned len,
- unsigned flags,
- struct page **pagep,
- void **fsdata);
-extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
- unsigned len, unsigned copied,
- struct page *page);
-extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
- struct inode *inode);
-extern int ext4_try_create_inline_dir(handle_t *handle,
- struct inode *parent,
- struct inode *inode);
-extern int ext4_read_inline_dir(struct file *filp,
- void *dirent, filldir_t filldir,
- int *has_inline_data);
-extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
- const struct qstr *d_name,
- struct ext4_dir_entry_2 **res_dir,
- int *has_inline_data);
-extern int ext4_delete_inline_entry(handle_t *handle,
- struct inode *dir,
- struct ext4_dir_entry_2 *de_del,
- struct buffer_head *bh,
- int *has_inline_data);
-extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
-extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
- struct ext4_dir_entry_2 **parent_de,
- int *retval);
-extern int ext4_inline_data_fiemap(struct inode *inode,
- struct fiemap_extent_info *fieinfo,
- int *has_inline);
-extern int ext4_try_to_evict_inline_data(handle_t *handle,
- struct inode *inode,
- int needed);
-extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
-
-extern int ext4_convert_inline_data(struct inode *inode);

2013-02-09 21:53:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 09/12] ext4: fix the number of credits needed for ext4_ext_migrate()

The migration ioctl creates a temporary inode. Since this inode is
never linked to a directory, we don't need to reserve journal credits
required for modifying the directory.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/migrate.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 4e4fcfd..480acf4 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -456,11 +456,14 @@ int ext4_ext_migrate(struct inode *inode)
*/
return retval;

+ /*
+ * Worst case we can touch the allocation bitmaps, a bgd
+ * block, and a block to link in the orphan list. We do need
+ * need to worry about credits for modifying the quota inode.
+ */
handle = ext4_journal_start(inode, EXT4_HT_MIGRATE,
- EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)
- + 1);
+ 4 + EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb));
+
if (IS_ERR(handle)) {
retval = PTR_ERR(handle);
return retval;
--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 08/12] ext4: start handle at the last possible moment in ext4_rmdir()

Don't start the jbd2 transaction handle until after the directory
entry has been found, to minimize the amount of time that a handle is
held active.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/namei.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 4a27069..36a4afd 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2725,26 +2725,18 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
struct inode *inode;
struct buffer_head *bh;
struct ext4_dir_entry_2 *de;
- handle_t *handle;
+ handle_t *handle = NULL;

/* Initialize quotas before so that eventual writes go in
* separate transaction */
dquot_initialize(dir);
dquot_initialize(dentry->d_inode);

- handle = ext4_journal_start(dir, EXT4_HT_DIR,
- EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
-
retval = -ENOENT;
bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
if (!bh)
goto end_rmdir;

- if (IS_DIRSYNC(dir))
- ext4_handle_sync(handle);
-
inode = dentry->d_inode;

retval = -EIO;
@@ -2755,6 +2747,17 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
if (!empty_dir(inode))
goto end_rmdir;

+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
+ if (IS_ERR(handle)) {
+ retval = PTR_ERR(handle);
+ handle = NULL;
+ goto end_rmdir;
+ }
+
+ if (IS_DIRSYNC(dir))
+ ext4_handle_sync(handle);
+
retval = ext4_delete_entry(handle, dir, de, bh);
if (retval)
goto end_rmdir;
@@ -2776,8 +2779,9 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
ext4_mark_inode_dirty(handle, dir);

end_rmdir:
- ext4_journal_stop(handle);
brelse(bh);
+ if (handle)
+ ext4_journal_stop(handle);
return retval;
}

--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 12/12] ext4: start handle at the last possible moment when creating inodes

In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
until the inode has been succesfully allocated. In order to do this,
we need to start the handle in the ext4_new_inode(). So create a new
variant of this function, ext4_new_inode_start_handle(), so the handle
can be created at the last possible minute, before we need to modify
the inode allocation bitmap block.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 17 ++++++++--
fs/ext4/ialloc.c | 15 +++++++--
fs/ext4/namei.c | 94 ++++++++++++++++++++++++++------------------------------
3 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9897cdf..05940ce 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1991,9 +1991,20 @@ extern int ext4fs_dirhash(const char *name, int len, struct
dx_hash_info *hinfo);

/* ialloc.c */
-extern struct inode *ext4_new_inode(handle_t *, struct inode *, umode_t,
- const struct qstr *qstr, __u32 goal,
- uid_t *owner);
+extern struct inode *__ext4_new_inode(handle_t *, struct inode *, umode_t,
+ const struct qstr *qstr, __u32 goal,
+ uid_t *owner, int handle_type,
+ unsigned int line_no, int nblocks);
+
+#define ext4_new_inode(handle, dir, mode, qstr, goal, owner) \
+ __ext4_new_inode((handle), (dir), (mode), (qstr), (goal), (owner), \
+ 0, 0, 0)
+#define ext4_new_inode_start_handle(dir, mode, qstr, goal, owner, \
+ type, nblocks) \
+ __ext4_new_inode(0, (dir), (mode), (qstr), (goal), (owner), \
+ (type), __LINE__, (nblocks))
+
+
extern void ext4_free_inode(handle_t *, struct inode *);
extern struct inode * ext4_orphan_get(struct super_block *, unsigned long);
extern unsigned long ext4_count_free_inodes(struct super_block *);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 10bd6fe..943665b 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -634,8 +634,10 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
* For other inodes, search forward from the parent directory's block
* group to find a free inode.
*/
-struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, umode_t mode,
- const struct qstr *qstr, __u32 goal, uid_t *owner)
+struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
+ umode_t mode, const struct qstr *qstr,
+ __u32 goal, uid_t *owner, int handle_type,
+ unsigned int line_no, int nblocks)
{
struct super_block *sb;
struct buffer_head *inode_bitmap_bh = NULL;
@@ -725,6 +727,15 @@ repeat_in_this_group:
"inode=%lu", ino + 1);
continue;
}
+ if (!handle) {
+ BUG_ON(nblocks == 0);
+ handle = __ext4_journal_start_sb(dir->i_sb, line_no,
+ handle_type, nblocks);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto fail;
+ }
+ }
BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
if (err)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5f3d2b5..3e1529c 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2257,30 +2257,28 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
{
handle_t *handle;
struct inode *inode;
- int err, retries = 0;
+ int err, credits, retries = 0;

dquot_initialize(dir);

+ credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
retry:
- handle = ext4_journal_start(dir, EXT4_HT_DIR,
- (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
-
- if (IS_DIRSYNC(dir))
- ext4_handle_sync(handle);
-
- inode = ext4_new_inode(handle, dir, mode, &dentry->d_name, 0, NULL);
+ inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
+ NULL, EXT4_HT_DIR, credits);
+ handle = ext4_journal_current_handle();
err = PTR_ERR(inode);
if (!IS_ERR(inode)) {
inode->i_op = &ext4_file_inode_operations;
inode->i_fop = &ext4_file_operations;
ext4_set_aops(inode);
err = ext4_add_nondir(handle, dentry, inode);
+ if (!err && IS_DIRSYNC(dir))
+ ext4_handle_sync(handle);
}
- ext4_journal_stop(handle);
+ if (handle)
+ ext4_journal_stop(handle);
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
return err;
@@ -2291,32 +2289,30 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
{
handle_t *handle;
struct inode *inode;
- int err, retries = 0;
+ int err, credits, retries = 0;

if (!new_valid_dev(rdev))
return -EINVAL;

dquot_initialize(dir);

+ credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
retry:
- handle = ext4_journal_start(dir, EXT4_HT_DIR,
- (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
-
- if (IS_DIRSYNC(dir))
- ext4_handle_sync(handle);
-
- inode = ext4_new_inode(handle, dir, mode, &dentry->d_name, 0, NULL);
+ inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
+ NULL, EXT4_HT_DIR, credits);
+ handle = ext4_journal_current_handle();
err = PTR_ERR(inode);
if (!IS_ERR(inode)) {
init_special_inode(inode, inode->i_mode, rdev);
inode->i_op = &ext4_special_inode_operations;
err = ext4_add_nondir(handle, dentry, inode);
+ if (!err && IS_DIRSYNC(dir))
+ ext4_handle_sync(handle);
}
- ext4_journal_stop(handle);
+ if (handle)
+ ext4_journal_stop(handle);
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
return err;
@@ -2408,26 +2404,21 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
handle_t *handle;
struct inode *inode;
- int err, retries = 0;
+ int err, credits, retries = 0;

if (EXT4_DIR_LINK_MAX(dir))
return -EMLINK;

dquot_initialize(dir);

+ credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
retry:
- handle = ext4_journal_start(dir, EXT4_HT_DIR,
- (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
-
- if (IS_DIRSYNC(dir))
- ext4_handle_sync(handle);
-
- inode = ext4_new_inode(handle, dir, S_IFDIR | mode,
- &dentry->d_name, 0, NULL);
+ inode = ext4_new_inode_start_handle(dir, S_IFDIR | mode,
+ &dentry->d_name,
+ 0, NULL, EXT4_HT_DIR, credits);
+ handle = ext4_journal_current_handle();
err = PTR_ERR(inode);
if (IS_ERR(inode))
goto out_stop;
@@ -2455,8 +2446,12 @@ out_clear_inode:
goto out_clear_inode;
unlock_new_inode(inode);
d_instantiate(dentry, inode);
+ if (IS_DIRSYNC(dir))
+ ext4_handle_sync(handle);
+
out_stop:
- ext4_journal_stop(handle);
+ if (handle)
+ ext4_journal_stop(handle);
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
return err;
@@ -2883,15 +2878,10 @@ static int ext4_symlink(struct inode *dir,
EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb);
}
retry:
- handle = ext4_journal_start(dir, EXT4_HT_DIR, credits);
- if (IS_ERR(handle))
- return PTR_ERR(handle);
-
- if (IS_DIRSYNC(dir))
- ext4_handle_sync(handle);
-
- inode = ext4_new_inode(handle, dir, S_IFLNK|S_IRWXUGO,
- &dentry->d_name, 0, NULL);
+ inode = ext4_new_inode_start_handle(dir, S_IFLNK|S_IRWXUGO,
+ &dentry->d_name, 0, NULL,
+ EXT4_HT_DIR, credits);
+ handle = ext4_journal_current_handle();
err = PTR_ERR(inode);
if (IS_ERR(inode))
goto out_stop;
@@ -2944,8 +2934,12 @@ retry:
}
EXT4_I(inode)->i_disksize = inode->i_size;
err = ext4_add_nondir(handle, dentry, inode);
+ if (!err && IS_DIRSYNC(dir))
+ ext4_handle_sync(handle);
+
out_stop:
- ext4_journal_stop(handle);
+ if (handle)
+ ext4_journal_stop(handle);
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
return err;
--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 06/12] ext4: grab page before starting transaction handle in write_begin()

The grab_cache_page_write_begin() function can potentially sleep for a
long time, since it may need to do memory allocation which can block
if the system is under significant memory pressure, and because it may
be blocked on page writeback. If it does take a long time to grab the
page, it's better that we not hold an active jbd2 handle.

So grab a handle on the page first, and _then_ start the transaction
handle.

This commit fixes the following long transaction handle hold time:

postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
dirtied_blocks 0

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 111 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 68 insertions(+), 43 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c3c47e2..38164a8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -906,32 +906,40 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
ret = ext4_try_to_write_inline_data(mapping, inode, pos, len,
flags, pagep);
if (ret < 0)
- goto out;
- if (ret == 1) {
- ret = 0;
- goto out;
- }
+ return ret;
+ if (ret == 1)
+ return 0;
}

-retry:
+ /*
+ * grab_cache_page_write_begin() can take a long time if the
+ * system is thrashing due to memory pressure, or if the page
+ * is being written back. So grab it first before we start
+ * the transaction handle. This also allows us to allocate
+ * the page (if needed) without using GFP_NOFS.
+ */
+retry_grab:
+ page = grab_cache_page_write_begin(mapping, index, flags);
+ if (!page)
+ return -ENOMEM;
+ unlock_page(page);
+
+retry_journal:
handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- goto out;
+ page_cache_release(page);
+ return PTR_ERR(handle);
}

- /* We cannot recurse into the filesystem as the transaction is already
- * started */
- flags |= AOP_FLAG_NOFS;
-
- page = grab_cache_page_write_begin(mapping, index, flags);
- if (!page) {
+ lock_page(page);
+ if (page->mapping != mapping) {
+ /* The page got truncated from under us */
+ unlock_page(page);
+ page_cache_release(page);
ext4_journal_stop(handle);
- ret = -ENOMEM;
- goto out;
+ goto retry_grab;
}
-
- *pagep = page;
+ wait_on_page_writeback(page);

if (ext4_should_dioread_nolock(inode))
ret = __block_write_begin(page, pos, len, ext4_get_block_write);
@@ -946,7 +954,6 @@ retry:

if (ret) {
unlock_page(page);
- page_cache_release(page);
/*
* __block_write_begin may have instantiated a few blocks
* outside i_size. Trim these off again. Don't need
@@ -970,11 +977,14 @@ retry:
if (inode->i_nlink)
ext4_orphan_del(NULL, inode);
}
- }

- if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
- goto retry;
-out:
+ if (ret == -ENOSPC &&
+ ext4_should_retry_alloc(inode->i_sb, &retries))
+ goto retry_journal;
+ page_cache_release(page);
+ return ret;
+ }
+ *pagep = page;
return ret;
}

@@ -2524,42 +2534,52 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
pos, len, flags,
pagep, fsdata);
if (ret < 0)
- goto out;
- if (ret == 1) {
- ret = 0;
- goto out;
- }
+ return ret;
+ if (ret == 1)
+ return 0;
}

-retry:
+ /*
+ * grab_cache_page_write_begin() can take a long time if the
+ * system is thrashing due to memory pressure, or if the page
+ * is being written back. So grab it first before we start
+ * the transaction handle. This also allows us to allocate
+ * the page (if needed) without using GFP_NOFS.
+ */
+retry_grab:
+ page = grab_cache_page_write_begin(mapping, index, flags);
+ if (!page)
+ return -ENOMEM;
+ unlock_page(page);
+
/*
* With delayed allocation, we don't log the i_disksize update
* if there is delayed block allocation. But we still need
* to journalling the i_disksize update if writes to the end
* of file which has an already mapped buffer.
*/
+retry_journal:
handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- goto out;
+ page_cache_release(page);
+ return PTR_ERR(handle);
}
- /* We cannot recurse into the filesystem as the transaction is already
- * started */
- flags |= AOP_FLAG_NOFS;

- page = grab_cache_page_write_begin(mapping, index, flags);
- if (!page) {
+ lock_page(page);
+ if (page->mapping != mapping) {
+ /* The page got truncated from under us */
+ unlock_page(page);
+ page_cache_release(page);
ext4_journal_stop(handle);
- ret = -ENOMEM;
- goto out;
+ goto retry_grab;
}
- *pagep = page;
+ /* In case writeback began while the page was unlocked */
+ wait_on_page_writeback(page);

ret = __block_write_begin(page, pos, len, ext4_da_get_block_prep);
if (ret < 0) {
unlock_page(page);
ext4_journal_stop(handle);
- page_cache_release(page);
/*
* block_write_begin may have instantiated a few blocks
* outside i_size. Trim these off again. Don't need
@@ -2567,11 +2587,16 @@ retry:
*/
if (pos + len > inode->i_size)
ext4_truncate_failed_write(inode);
+
+ if (ret == -ENOSPC &&
+ ext4_should_retry_alloc(inode->i_sb, &retries))
+ goto retry_journal;
+
+ page_cache_release(page);
+ return ret;
}

- if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
- goto retry;
-out:
+ *pagep = page;
return ret;
}

--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 10/12] ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir()

The ext4_unlink() and ext4_rmdir() don't actually release the blocks
associated with the file/directory. This gets done in a separate jbd2
handle called via ext4_evict_inode(). Thus, we don't need to reserve
lots of journal credits for the truncate.

Note that using too many journal credits is non-optimal because it can
leading to the journal transmit getting closed too early, before it is
strictly necessary.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4_jbd2.h | 6 ------
fs/ext4/namei.c | 4 ++--
2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 302814b..c1fc2dc 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -59,12 +59,6 @@
#define EXT4_META_TRANS_BLOCKS(sb) (EXT4_XATTR_TRANS_BLOCKS + \
EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))

-/* Delete operations potentially hit one directory's namespace plus an
- * entire inode, plus arbitrary amounts of bitmap/indirection data. Be
- * generous. We can grow the delete transaction later if necessary. */
-
-#define EXT4_DELETE_TRANS_BLOCKS(sb) (2 * EXT4_DATA_TRANS_BLOCKS(sb) + 64)
-
/* Define an arbitrary limit for the amount of data we will anticipate
* writing to any given transaction. For unbounded transactions such as
* write(2) and truncate(2) we can write more than this, but we always
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 36a4afd..5f3d2b5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2748,7 +2748,7 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
goto end_rmdir;

handle = ext4_journal_start(dir, EXT4_HT_DIR,
- EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
+ EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
if (IS_ERR(handle)) {
retval = PTR_ERR(handle);
handle = NULL;
@@ -2811,7 +2811,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
goto end_unlink;

handle = ext4_journal_start(dir, EXT4_HT_DIR,
- EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
+ EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
if (IS_ERR(handle)) {
retval = PTR_ERR(handle);
handle = NULL;
--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:59

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 01/12] jbd2: track request delay statistics

Track the delay between when we first request that the commit begin
and when it actually begins, so we can see how much of a gap exists.
In theory, this should just be the remaining scheduling quantuum of
the thread which requested the commit (assuming it was not a
synchronous operation which triggered the commit request) plus
scheduling overhead; however, it's possible that real time processes
might get in the way of letting the kjournald thread from executing.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/jbd2/commit.c | 8 ++++++++
fs/jbd2/journal.c | 12 +++++++++---
fs/jbd2/transaction.c | 1 +
include/linux/jbd2.h | 7 +++++++
include/trace/events/jbd2.h | 8 ++++++--
5 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3091d42..750c701 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -435,7 +435,12 @@ void jbd2_journal_commit_transaction(journal_t *journal)

trace_jbd2_commit_locking(journal, commit_transaction);
stats.run.rs_wait = commit_transaction->t_max_wait;
+ stats.run.rs_request_delay = 0;
stats.run.rs_locked = jiffies;
+ if (commit_transaction->t_requested)
+ stats.run.rs_request_delay =
+ jbd2_time_diff(commit_transaction->t_requested,
+ stats.run.rs_locked);
stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
stats.run.rs_locked);

@@ -1116,7 +1121,10 @@ restart_loop:
*/
spin_lock(&journal->j_history_lock);
journal->j_stats.ts_tid++;
+ if (commit_transaction->t_requested)
+ journal->j_stats.ts_requested++;
journal->j_stats.run.rs_wait += stats.run.rs_wait;
+ journal->j_stats.run.rs_request_delay += stats.run.rs_request_delay;
journal->j_stats.run.rs_running += stats.run.rs_running;
journal->j_stats.run.rs_locked += stats.run.rs_locked;
journal->j_stats.run.rs_flushing += stats.run.rs_flushing;
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 1a80e31..4ba2e81 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -533,6 +533,7 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t target)
jbd_debug(1, "JBD2: requesting commit %d/%d\n",
journal->j_commit_request,
journal->j_commit_sequence);
+ journal->j_running_transaction->t_requested = jiffies;
wake_up(&journal->j_wait_commit);
return 1;
} else if (!tid_geq(journal->j_commit_request, target))
@@ -898,13 +899,18 @@ static int jbd2_seq_info_show(struct seq_file *seq, void *v)

if (v != SEQ_START_TOKEN)
return 0;
- seq_printf(seq, "%lu transaction, each up to %u blocks\n",
- s->stats->ts_tid,
- s->journal->j_max_transaction_buffers);
+ seq_printf(seq, "%lu transactions (%lu requested), "
+ "each up to %u blocks\n",
+ s->stats->ts_tid, s->stats->ts_requested,
+ s->journal->j_max_transaction_buffers);
if (s->stats->ts_tid == 0)
return 0;
seq_printf(seq, "average: \n %ums waiting for transaction\n",
jiffies_to_msecs(s->stats->run.rs_wait / s->stats->ts_tid));
+ seq_printf(seq, " %ums request delay\n",
+ (s->stats->ts_requested == 0) ? 0 :
+ jiffies_to_msecs(s->stats->run.rs_request_delay /
+ s->stats->ts_requested));
seq_printf(seq, " %ums running transaction\n",
jiffies_to_msecs(s->stats->run.rs_running / s->stats->ts_tid));
seq_printf(seq, " %ums transaction was being locked\n",
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index df9f297..735609e 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -100,6 +100,7 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
journal->j_running_transaction = transaction;
transaction->t_max_wait = 0;
transaction->t_start = jiffies;
+ transaction->t_requested = 0;

return transaction;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index e30b663..e0aafc4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -581,6 +581,11 @@ struct transaction_s
unsigned long t_start;

/*
+ * When commit was requested
+ */
+ unsigned long t_requested;
+
+ /*
* Checkpointing stats [j_checkpoint_sem]
*/
struct transaction_chp_stats_s t_chp_stats;
@@ -637,6 +642,7 @@ struct transaction_s

struct transaction_run_stats_s {
unsigned long rs_wait;
+ unsigned long rs_request_delay;
unsigned long rs_running;
unsigned long rs_locked;
unsigned long rs_flushing;
@@ -649,6 +655,7 @@ struct transaction_run_stats_s {

struct transaction_stats_s {
unsigned long ts_tid;
+ unsigned long ts_requested;
struct transaction_run_stats_s run;
};

diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 127993d..5419f57 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -142,6 +142,7 @@ TRACE_EVENT(jbd2_run_stats,
__field( dev_t, dev )
__field( unsigned long, tid )
__field( unsigned long, wait )
+ __field( unsigned long, request_delay )
__field( unsigned long, running )
__field( unsigned long, locked )
__field( unsigned long, flushing )
@@ -155,6 +156,7 @@ TRACE_EVENT(jbd2_run_stats,
__entry->dev = dev;
__entry->tid = tid;
__entry->wait = stats->rs_wait;
+ __entry->request_delay = stats->rs_request_delay;
__entry->running = stats->rs_running;
__entry->locked = stats->rs_locked;
__entry->flushing = stats->rs_flushing;
@@ -164,10 +166,12 @@ TRACE_EVENT(jbd2_run_stats,
__entry->blocks_logged = stats->rs_blocks_logged;
),

- TP_printk("dev %d,%d tid %lu wait %u running %u locked %u flushing %u "
- "logging %u handle_count %u blocks %u blocks_logged %u",
+ TP_printk("dev %d,%d tid %lu wait %u request_delay %u running %u "
+ "locked %u flushing %u logging %u handle_count %u "
+ "blocks %u blocks_logged %u",
MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid,
jiffies_to_msecs(__entry->wait),
+ jiffies_to_msecs(__entry->request_delay),
jiffies_to_msecs(__entry->running),
jiffies_to_msecs(__entry->locked),
jiffies_to_msecs(__entry->flushing),
--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 05/12] ext4: pass context information to jbd2__journal_start()

So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way for finding the longer-running handles is:

T=/sys/kernel/debug/tracing
EVENT=$T/events/jbd2/jbd2_handle_stats
echo "interval > 5" > $EVENT/filter
echo 1 > $EVENT/enable

./run-my-fs-benchmark

cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 20ms. Having
longer-running handles is bad, because a commit started at the wrong
time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation. Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:

postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
dirtied_blocks 0

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/acl.c | 5 +++--
fs/ext4/ext4_jbd2.c | 5 +++--
fs/ext4/ext4_jbd2.h | 31 ++++++++++++++++++++++++++++---
fs/ext4/extents.c | 11 ++++++-----
fs/ext4/file.c | 2 +-
fs/ext4/ialloc.c | 2 +-
fs/ext4/indirect.c | 7 ++++---
fs/ext4/inline.c | 10 +++++-----
fs/ext4/inode.c | 33 ++++++++++++++++++++-------------
fs/ext4/ioctl.c | 4 ++--
fs/ext4/migrate.c | 4 ++--
fs/ext4/move_extent.c | 2 +-
fs/ext4/namei.c | 42 ++++++++++++++++++++++++------------------
fs/ext4/resize.c | 10 +++++-----
fs/ext4/super.c | 10 +++++-----
fs/ext4/xattr.c | 2 +-
16 files changed, 111 insertions(+), 69 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index e6e0d98..406cf8b 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -324,7 +324,7 @@ ext4_acl_chmod(struct inode *inode)
if (error)
return error;
retry:
- handle = ext4_journal_start(inode,
+ handle = ext4_journal_start(inode, EXT4_HT_XATTR,
EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
@@ -422,7 +422,8 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
acl = NULL;

retry:
- handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
+ handle = ext4_journal_start(inode, EXT4_HT_XATTR,
+ EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
goto release_and_out;
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 6f61145..7058975 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -38,7 +38,8 @@ static void ext4_put_nojournal(handle_t *handle)
/*
* Wrappers for jbd2_journal_start/end.
*/
-handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
+handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
+ int type, int nblocks)
{
journal_t *journal;

@@ -59,7 +60,7 @@ handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
ext4_abort(sb, "Detected aborted journal");
return ERR_PTR(-EROFS);
}
- return jbd2_journal_start(journal, nblocks);
+ return jbd2__journal_start(journal, nblocks, GFP_NOFS, type, line);
}

int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 7177f9b..302814b 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -110,6 +110,22 @@
#define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
#define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))

+/*
+ * Ext4 handle operation types -- for logging purposes
+ */
+#define EXT4_HT_MISC 0
+#define EXT4_HT_INODE 1
+#define EXT4_HT_WRITE_PAGE 2
+#define EXT4_HT_MAP_BLOCKS 3
+#define EXT4_HT_DIR 4
+#define EXT4_HT_TRUNCATE 5
+#define EXT4_HT_QUOTA 6
+#define EXT4_HT_RESIZE 7
+#define EXT4_HT_MIGRATE 8
+#define EXT4_HT_MOVE_EXTENTS 9
+#define EXT4_HT_XATTR 10
+#define EXT4_HT_MAX 11
+
/**
* struct ext4_journal_cb_entry - Base structure for callback information.
*
@@ -234,7 +250,8 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
#define ext4_handle_dirty_super(handle, sb) \
__ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb))

-handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks);
+handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
+ int type, int nblocks);
int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle);

#define EXT4_NOJOURNAL_MAX_REF_COUNT ((unsigned long) 4096)
@@ -268,9 +285,17 @@ static inline int ext4_handle_has_enough_credits(handle_t *handle, int needed)
return 1;
}

-static inline handle_t *ext4_journal_start(struct inode *inode, int nblocks)
+#define ext4_journal_start_sb(sb, type, nblocks) \
+ __ext4_journal_start_sb((sb), __LINE__, (type), (nblocks))
+
+#define ext4_journal_start(inode, type, nblocks) \
+ __ext4_journal_start((inode), __LINE__, (type), (nblocks))
+
+static inline handle_t *__ext4_journal_start(struct inode *inode,
+ unsigned int line, int type,
+ int nblocks)
{
- return ext4_journal_start_sb(inode->i_sb, nblocks);
+ return __ext4_journal_start_sb(inode->i_sb, line, type, nblocks);
}

#define ext4_journal_stop(handle) \
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e95acb0..d3a7e1c 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2545,7 +2545,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
ext_debug("truncate since %u to %u\n", start, end);

/* probably first extent we're gonna free will be last in block */
- handle = ext4_journal_start(inode, depth + 1);
+ handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, depth + 1);
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -4129,7 +4129,7 @@ void ext4_ext_truncate(struct inode *inode)
* probably first extent we're gonna free will be last in block
*/
err = ext4_writepage_trans_blocks(inode);
- handle = ext4_journal_start(inode, err);
+ handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, err);
if (IS_ERR(handle))
return;

@@ -4295,7 +4295,8 @@ retry:
while (ret >= 0 && ret < max_blocks) {
map.m_lblk = map.m_lblk + ret;
map.m_len = max_blocks = max_blocks - ret;
- handle = ext4_journal_start(inode, credits);
+ handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
+ credits);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
break;
@@ -4373,7 +4374,7 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
while (ret >= 0 && ret < max_blocks) {
map.m_lblk += ret;
map.m_len = (max_blocks -= ret);
- handle = ext4_journal_start(inode, credits);
+ handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, credits);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
break;
@@ -4554,7 +4555,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
inode_dio_wait(inode);

credits = ext4_writepage_trans_blocks(inode);
- handle = ext4_journal_start(inode, credits);
+ handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
goto out_dio;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index afaf9f15..b41d92c 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -240,7 +240,7 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
handle_t *handle;
int err;

- handle = ext4_journal_start_sb(sb, 1);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_MISC, 1);
if (IS_ERR(handle))
return PTR_ERR(handle);
err = ext4_journal_get_write_access(handle, sbi->s_sbh);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 3f32c80..10bd6fe 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1137,7 +1137,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
goto out;

- handle = ext4_journal_start_sb(sb, 1);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_MISC, 1);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
goto out;
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 1932810..c541ab8 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -791,7 +791,7 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,

if (final_size > inode->i_size) {
/* Credits for sb + inode write */
- handle = ext4_journal_start(inode, 2);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
goto out;
@@ -851,7 +851,7 @@ locked:
int err;

/* Credits for sb + inode write */
- handle = ext4_journal_start(inode, 2);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
if (IS_ERR(handle)) {
/* This is really bad luck. We've written the data
* but cannot extend i_size. Bail out and pretend
@@ -950,7 +950,8 @@ static handle_t *start_transaction(struct inode *inode)
{
handle_t *result;

- result = ext4_journal_start(inode, ext4_blocks_for_truncate(inode));
+ result = ext4_journal_start(inode, EXT4_HT_TRUNCATE,
+ ext4_blocks_for_truncate(inode));
if (!IS_ERR(result))
return result;

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 93a3408..bc5f871 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -545,7 +545,7 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
return ret;

retry:
- handle = ext4_journal_start(inode, needed_blocks);
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
handle = NULL;
@@ -657,7 +657,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
* The possible write could happen in the inode,
* so try to reserve the space in inode first.
*/
- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
handle = NULL;
@@ -853,7 +853,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
if (ret)
return ret;

- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
handle = NULL;
@@ -1770,7 +1770,7 @@ void ext4_inline_data_truncate(struct inode *inode, int *has_inline)


needed_blocks = ext4_writepage_trans_blocks(inode);
- handle = ext4_journal_start(inode, needed_blocks);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, needed_blocks);
if (IS_ERR(handle))
return;

@@ -1862,7 +1862,7 @@ int ext4_convert_inline_data(struct inode *inode)
if (error)
return error;

- handle = ext4_journal_start(inode, needed_blocks);
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
goto out_free;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fb1907d..c3c47e2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -234,7 +234,8 @@ void ext4_evict_inode(struct inode *inode)
* protection against it
*/
sb_start_intwrite(inode->i_sb);
- handle = ext4_journal_start(inode, ext4_blocks_for_truncate(inode)+3);
+ handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE,
+ ext4_blocks_for_truncate(inode)+3);
if (IS_ERR(handle)) {
ext4_std_error(inode->i_sb, PTR_ERR(handle));
/*
@@ -687,7 +688,8 @@ static int _ext4_get_block(struct inode *inode, sector_t iblock,
if (map.m_len > DIO_MAX_BLOCKS)
map.m_len = DIO_MAX_BLOCKS;
dio_credits = ext4_chunk_trans_blocks(inode, map.m_len);
- handle = ext4_journal_start(inode, dio_credits);
+ handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
+ dio_credits);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
return ret;
@@ -912,7 +914,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
}

retry:
- handle = ext4_journal_start(inode, needed_blocks);
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
goto out;
@@ -1947,7 +1949,8 @@ static int __ext4_journalled_writepage(struct page *page,
* references to buffers so we are safe */
unlock_page(page);

- handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
+ ext4_writepage_trans_blocks(inode));
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
goto out;
@@ -2378,7 +2381,8 @@ retry:
needed_blocks = ext4_da_writepages_trans_blocks(inode);

/* start a new transaction*/
- handle = ext4_journal_start(inode, needed_blocks);
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
+ needed_blocks);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
ext4_msg(inode->i_sb, KERN_CRIT, "%s: jbd2_start: "
@@ -2534,7 +2538,7 @@ retry:
* to journalling the i_disksize update if writes to the end
* of file which has an already mapped buffer.
*/
- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
goto out;
@@ -4281,8 +4285,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)

/* (user+group)*(old+new) structure, inode write (sb,
* inode block, ? - but truncate inode update has it) */
- handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
- EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb))+3);
+ handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
+ (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb) +
+ EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb)) + 3);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
goto err_out;
@@ -4317,7 +4322,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
(attr->ia_size < inode->i_size)) {
handle_t *handle;

- handle = ext4_journal_start(inode, 3);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 3);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
goto err_out;
@@ -4337,7 +4342,8 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
attr->ia_size);
if (error) {
/* Do as much error cleanup as possible */
- handle = ext4_journal_start(inode, 3);
+ handle = ext4_journal_start(inode,
+ EXT4_HT_INODE, 3);
if (IS_ERR(handle)) {
ext4_orphan_del(NULL, inode);
goto err_out;
@@ -4678,7 +4684,7 @@ void ext4_dirty_inode(struct inode *inode, int flags)
{
handle_t *handle;

- handle = ext4_journal_start(inode, 2);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
if (IS_ERR(handle))
goto out;

@@ -4779,7 +4785,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)

/* Finally we can mark the inode as dirty. */

- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -4857,7 +4863,8 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
else
get_block = ext4_get_block;
retry_alloc:
- handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
+ handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
+ ext4_writepage_trans_blocks(inode));
if (IS_ERR(handle)) {
ret = VM_FAULT_SIGBUS;
goto out;
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 4784ac2..31f4f56 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -104,7 +104,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
} else if (oldflags & EXT4_EOFBLOCKS_FL)
ext4_truncate(inode);

- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
goto flags_out;
@@ -173,7 +173,7 @@ flags_out:
}

mutex_lock(&inode->i_mutex);
- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
goto unlock_out;
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index db8226d..4e4fcfd 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -456,7 +456,7 @@ int ext4_ext_migrate(struct inode *inode)
*/
return retval;

- handle = ext4_journal_start(inode,
+ handle = ext4_journal_start(inode, EXT4_HT_MIGRATE,
EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)
@@ -507,7 +507,7 @@ int ext4_ext_migrate(struct inode *inode)
ext4_set_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
up_read((&EXT4_I(inode)->i_data_sem));

- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
if (IS_ERR(handle)) {
/*
* It is impossible to update on-disk structures without
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index cd90b7e..d78c33e 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -920,7 +920,7 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
again:
*err = 0;
jblocks = ext4_writepage_trans_blocks(orig_inode) * 2;
- handle = ext4_journal_start(orig_inode, jblocks);
+ handle = ext4_journal_start(orig_inode, EXT4_HT_MOVE_EXTENTS, jblocks);
if (IS_ERR(handle)) {
*err = PTR_ERR(handle);
return 0;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 34ed624..5184103 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2262,9 +2262,10 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
dquot_initialize(dir);

retry:
- handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2298,9 +2299,10 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
dquot_initialize(dir);

retry:
- handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2414,9 +2416,10 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
dquot_initialize(dir);

retry:
- handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2729,7 +2732,8 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
dquot_initialize(dir);
dquot_initialize(dentry->d_inode);

- handle = ext4_journal_start(dir, EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2791,7 +2795,8 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
dquot_initialize(dir);
dquot_initialize(dentry->d_inode);

- handle = ext4_journal_start(dir, EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2870,7 +2875,7 @@ static int ext4_symlink(struct inode *dir,
EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb);
}
retry:
- handle = ext4_journal_start(dir, credits);
+ handle = ext4_journal_start(dir, EXT4_HT_DIR, credits);
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2908,7 +2913,7 @@ retry:
* Now inode is being linked into dir (EXT4_DATA_TRANS_BLOCKS
* + EXT4_INDEX_EXTRA_TRANS_BLOCKS), inode is also modified
*/
- handle = ext4_journal_start(dir,
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 1);
if (IS_ERR(handle)) {
@@ -2955,8 +2960,9 @@ static int ext4_link(struct dentry *old_dentry,
dquot_initialize(dir);

retry:
- handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS);
+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -3039,9 +3045,9 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
* in separate transaction */
if (new_dentry->d_inode)
dquot_initialize(new_dentry->d_inode);
- handle = ext4_journal_start(old_dir, 2 *
- EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2);
+ handle = ext4_journal_start(old_dir, EXT4_HT_DIR,
+ (2 * EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
if (IS_ERR(handle))
return PTR_ERR(handle);

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 8eefb63..c7f4d75 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -466,7 +466,7 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
meta_bg = EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_META_BG);

/* This transaction may be extended/restarted along the way */
- handle = ext4_journal_start_sb(sb, EXT4_MAX_TRANS_DATA);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, EXT4_MAX_TRANS_DATA);
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -1031,7 +1031,7 @@ static void update_backups(struct super_block *sb, int blk_off, char *data,
handle_t *handle;
int err = 0, err2;

- handle = ext4_journal_start_sb(sb, EXT4_MAX_TRANS_DATA);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, EXT4_MAX_TRANS_DATA);
if (IS_ERR(handle)) {
group = 1;
err = PTR_ERR(handle);
@@ -1412,7 +1412,7 @@ static int ext4_flex_group_add(struct super_block *sb,
* modify each of the reserved GDT dindirect blocks.
*/
credit = flex_gd->count * 4 + reserved_gdb;
- handle = ext4_journal_start_sb(sb, credit);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, credit);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
goto exit;
@@ -1624,7 +1624,7 @@ static int ext4_group_extend_no_check(struct super_block *sb,
/* We will update the superblock, one block bitmap, and
* one group descriptor via ext4_group_add_blocks().
*/
- handle = ext4_journal_start_sb(sb, 3);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, 3);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
ext4_warning(sb, "error %d on journal start", err);
@@ -1788,7 +1788,7 @@ static int ext4_convert_meta_bg(struct super_block *sb, struct inode *inode)
credits += 3; /* block bitmap, bg descriptor, resize inode */
}

- handle = ext4_journal_start_sb(sb, credits);
+ handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, credits);
if (IS_ERR(handle))
return PTR_ERR(handle);

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 5812f29..373d46c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4768,7 +4768,7 @@ static int ext4_write_dquot(struct dquot *dquot)
struct inode *inode;

inode = dquot_to_inode(dquot);
- handle = ext4_journal_start(inode,
+ handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
EXT4_QUOTA_TRANS_BLOCKS(dquot->dq_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);
@@ -4784,7 +4784,7 @@ static int ext4_acquire_dquot(struct dquot *dquot)
int ret, err;
handle_t *handle;

- handle = ext4_journal_start(dquot_to_inode(dquot),
+ handle = ext4_journal_start(dquot_to_inode(dquot), EXT4_HT_QUOTA,
EXT4_QUOTA_INIT_BLOCKS(dquot->dq_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);
@@ -4800,7 +4800,7 @@ static int ext4_release_dquot(struct dquot *dquot)
int ret, err;
handle_t *handle;

- handle = ext4_journal_start(dquot_to_inode(dquot),
+ handle = ext4_journal_start(dquot_to_inode(dquot), EXT4_HT_QUOTA,
EXT4_QUOTA_DEL_BLOCKS(dquot->dq_sb));
if (IS_ERR(handle)) {
/* Release dquot anyway to avoid endless cycle in dqput() */
@@ -4832,7 +4832,7 @@ static int ext4_write_info(struct super_block *sb, int type)
handle_t *handle;

/* Data block + inode block */
- handle = ext4_journal_start(sb->s_root->d_inode, 2);
+ handle = ext4_journal_start(sb->s_root->d_inode, EXT4_HT_QUOTA, 2);
if (IS_ERR(handle))
return PTR_ERR(handle);
ret = dquot_commit_info(sb, type);
@@ -4978,7 +4978,7 @@ static int ext4_quota_off(struct super_block *sb, int type)

/* Update modification times of quota files when userspace can
* start looking at them */
- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode, EXT4_HT_QUOTA, 1);
if (IS_ERR(handle))
goto out;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index c68990c..2efc560 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1175,7 +1175,7 @@ retry:
if (ext4_has_inline_data(inode))
credits += ext4_writepage_trans_blocks(inode) + 1;

- handle = ext4_journal_start(inode, credits);
+ handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
} else {
--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 07/12] ext4: start handle at the last possible moment in ext4_unlink()

Don't start the jbd2 transaction handle until after the directory
entry has been found, to minimize the amount of time that a handle is
held active.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/namei.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5184103..4a27069 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2787,7 +2787,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
struct inode *inode;
struct buffer_head *bh;
struct ext4_dir_entry_2 *de;
- handle_t *handle;
+ handle_t *handle = NULL;

trace_ext4_unlink_enter(dir, dentry);
/* Initialize quotas before so that eventual writes go
@@ -2795,14 +2795,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
dquot_initialize(dir);
dquot_initialize(dentry->d_inode);

- handle = ext4_journal_start(dir, EXT4_HT_DIR,
- EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
-
- if (IS_DIRSYNC(dir))
- ext4_handle_sync(handle);
-
retval = -ENOENT;
bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
if (!bh)
@@ -2814,6 +2806,17 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
if (le32_to_cpu(de->inode) != inode->i_ino)
goto end_unlink;

+ handle = ext4_journal_start(dir, EXT4_HT_DIR,
+ EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
+ if (IS_ERR(handle)) {
+ retval = PTR_ERR(handle);
+ handle = NULL;
+ goto end_unlink;
+ }
+
+ if (IS_DIRSYNC(dir))
+ ext4_handle_sync(handle);
+
if (!inode->i_nlink) {
ext4_warning(inode->i_sb,
"Deleting nonexistent file (%lu), %d",
@@ -2834,8 +2837,9 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
retval = 0;

end_unlink:
- ext4_journal_stop(handle);
brelse(bh);
+ if (handle)
+ ext4_journal_stop(handle);
trace_ext4_unlink_exit(dentry, retval);
return retval;
}
--
1.7.12.rc0.22.gcdd159b


2013-02-09 21:53:59

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 03/12] jbd2: add tracepoints which provide per-handle statistics

Handles which stay open a long time are problematic when it comes time
to close down a transaction so it can be committed. These tracepoints
will help us determine which ones are the problematic ones, and to
validate whether changes makes things better or worse.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/jbd2/transaction.c | 28 ++++++++++++-
include/linux/jbd2.h | 8 +++-
include/trace/events/jbd2.h | 98 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 735609e..b7e2385 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -30,6 +30,8 @@
#include <linux/bug.h>
#include <linux/module.h>

+#include <trace/events/jbd2.h>
+
static void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh);
static void __jbd2_journal_unfile_buffer(struct journal_head *jh);

@@ -307,6 +309,8 @@ repeat:
*/
update_t_max_wait(transaction, ts);
handle->h_transaction = transaction;
+ handle->h_requested_credits = nblocks;
+ handle->h_start_jiffies = jiffies;
atomic_inc(&transaction->t_updates);
atomic_inc(&transaction->t_handle_count);
jbd_debug(4, "Handle %p given %d credits (total %d, free %d)\n",
@@ -353,7 +357,8 @@ static handle_t *new_handle(int nblocks)
* Return a pointer to a newly allocated handle, or an ERR_PTR() value
* on failure.
*/
-handle_t *jbd2__journal_start(journal_t *journal, int nblocks, gfp_t gfp_mask)
+handle_t *jbd2__journal_start(journal_t *journal, int nblocks, gfp_t gfp_mask,
+ unsigned int type, unsigned int line_no)
{
handle_t *handle = journal_current_handle();
int err;
@@ -379,6 +384,11 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, gfp_t gfp_mask)
current->journal_info = NULL;
handle = ERR_PTR(err);
}
+ handle->h_type = type;
+ handle->h_line_no = line_no;
+ trace_jbd2_handle_start(journal->j_fs_dev->bd_dev,
+ handle->h_transaction->t_tid, type,
+ line_no, nblocks);
return handle;
}
EXPORT_SYMBOL(jbd2__journal_start);
@@ -386,7 +396,7 @@ EXPORT_SYMBOL(jbd2__journal_start);

handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
{
- return jbd2__journal_start(journal, nblocks, GFP_NOFS);
+ return jbd2__journal_start(journal, nblocks, GFP_NOFS, 0, 0);
}
EXPORT_SYMBOL(jbd2_journal_start);

@@ -448,7 +458,14 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
goto unlock;
}

+ trace_jbd2_handle_extend(journal->j_fs_dev->bd_dev,
+ handle->h_transaction->t_tid,
+ handle->h_type, handle->h_line_no,
+ handle->h_buffer_credits,
+ nblocks);
+
handle->h_buffer_credits += nblocks;
+ handle->h_requested_credits += nblocks;
atomic_add(nblocks, &transaction->t_outstanding_credits);
result = 0;

@@ -1377,6 +1394,13 @@ int jbd2_journal_stop(handle_t *handle)
}

jbd_debug(4, "Handle %p going down\n", handle);
+ trace_jbd2_handle_stats(journal->j_fs_dev->bd_dev,
+ handle->h_transaction->t_tid,
+ handle->h_type, handle->h_line_no,
+ jiffies - handle->h_start_jiffies,
+ handle->h_sync, handle->h_requested_credits,
+ (handle->h_requested_credits -
+ handle->h_buffer_credits));

/*
* Implement synchronous transaction batching. If the handle
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 24db725..fa5fea1 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -400,6 +400,11 @@ struct jbd2_journal_handle
unsigned int h_sync: 1; /* sync-on-close */
unsigned int h_jdata: 1; /* force data journaling */
unsigned int h_aborted: 1; /* fatal error on handle */
+ unsigned int h_type: 8; /* for handle statistics */
+ unsigned int h_line_no: 16; /* for handle statistics */
+
+ unsigned long h_start_jiffies;
+ unsigned int h_requested_credits;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map h_lockdep_map;
@@ -1071,7 +1076,8 @@ static inline handle_t *journal_current_handle(void)
*/

extern handle_t *jbd2_journal_start(journal_t *, int nblocks);
-extern handle_t *jbd2__journal_start(journal_t *, int nblocks, gfp_t gfp_mask);
+extern handle_t *jbd2__journal_start(journal_t *, int nblocks, gfp_t gfp_mask,
+ unsigned int type, unsigned int line_no);
extern int jbd2_journal_restart(handle_t *, int nblocks);
extern int jbd2__journal_restart(handle_t *, int nblocks, gfp_t gfp_mask);
extern int jbd2_journal_extend (handle_t *, int nblocks);
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 5419f57..070df49 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -132,6 +132,104 @@ TRACE_EVENT(jbd2_submit_inode_data,
(unsigned long) __entry->ino)
);

+TRACE_EVENT(jbd2_handle_start,
+ TP_PROTO(dev_t dev, unsigned long tid, unsigned int type,
+ unsigned int line_no, int requested_blocks),
+
+ TP_ARGS(dev, tid, type, line_no, requested_blocks),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( unsigned long, tid )
+ __field( unsigned int, type )
+ __field( unsigned int, line_no )
+ __field( int, requested_blocks)
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->tid = tid;
+ __entry->type = type;
+ __entry->line_no = line_no;
+ __entry->requested_blocks = requested_blocks;
+ ),
+
+ TP_printk("dev %d,%d tid %lu type %u line_no %u "
+ "requested_blocks %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid,
+ __entry->type, __entry->line_no, __entry->requested_blocks)
+);
+
+TRACE_EVENT(jbd2_handle_extend,
+ TP_PROTO(dev_t dev, unsigned long tid, unsigned int type,
+ unsigned int line_no, int buffer_credits,
+ int requested_blocks),
+
+ TP_ARGS(dev, tid, type, line_no, buffer_credits, requested_blocks),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( unsigned long, tid )
+ __field( unsigned int, type )
+ __field( unsigned int, line_no )
+ __field( int, buffer_credits )
+ __field( int, requested_blocks)
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->tid = tid;
+ __entry->type = type;
+ __entry->line_no = line_no;
+ __entry->buffer_credits = buffer_credits;
+ __entry->requested_blocks = requested_blocks;
+ ),
+
+ TP_printk("dev %d,%d tid %lu type %u line_no %u "
+ "buffer_credits %d requested_blocks %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid,
+ __entry->type, __entry->line_no, __entry->buffer_credits,
+ __entry->requested_blocks)
+);
+
+TRACE_EVENT(jbd2_handle_stats,
+ TP_PROTO(dev_t dev, unsigned long tid, unsigned int type,
+ unsigned int line_no, int interval, int sync,
+ int requested_blocks, int dirtied_blocks),
+
+ TP_ARGS(dev, tid, type, line_no, interval, sync,
+ requested_blocks, dirtied_blocks),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( unsigned long, tid )
+ __field( unsigned int, type )
+ __field( unsigned int, line_no )
+ __field( int, interval )
+ __field( int, sync )
+ __field( int, requested_blocks)
+ __field( int, dirtied_blocks )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->tid = tid;
+ __entry->type = type;
+ __entry->line_no = line_no;
+ __entry->interval = interval;
+ __entry->sync = sync;
+ __entry->requested_blocks = requested_blocks;
+ __entry->dirtied_blocks = dirtied_blocks;
+ ),
+
+ TP_printk("dev %d,%d tid %lu type %u line_no %u interval %d "
+ "sync %d requested_blocks %d dirtied_blocks %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid,
+ __entry->type, __entry->line_no, __entry->interval,
+ __entry->sync, __entry->requested_blocks,
+ __entry->dirtied_blocks)
+);
+
TRACE_EVENT(jbd2_run_stats,
TP_PROTO(dev_t dev, unsigned long tid,
struct transaction_run_stats_s *stats),
--
1.7.12.rc0.22.gcdd159b


2013-02-10 13:49:43

by Tao Ma

[permalink] [raw]
Subject: Re: [PATCH 11/12] ext4: fix the number of credits needed for acl ops with inline data

On 02/10/2013 05:53 AM, Theodore Ts'o wrote:
> Operations which modify extended attributes may need extra journal
> credits if inline data is used, since there is a chance that some
> extended attributes may need to get pushed to an external attribute
> block.
>
> Changes to reflect this was made in xattr.c, but they were missed in
> fs/ext4/acl.c. To fix this, abstract the calculation of the number of
> credits needed for xattr operations to an inline function defined in
> ext4_jbd2.h, and use it in acl.c and xattr.c.
Oh, yes, I forgot about the acl.

Reviewed-by: Tao Ma <[email protected]>

Thanks,
Tao
>
> Also move the function declarations used in inline.c from xattr.h
> (where they are non-obviously hidden, and caused problems since
> ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
> them to ext4.h.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Cc: Tao Ma <[email protected]>
> ---
> fs/ext4/acl.c | 4 +--
> fs/ext4/ext4.h | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/ext4_jbd2.h | 14 +++++++++++
> fs/ext4/xattr.c | 9 +------
> fs/ext4/xattr.h | 68 ---------------------------------------------------
> 5 files changed, 87 insertions(+), 78 deletions(-)
>
> diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
> index 406cf8b..39a54a0 100644
> --- a/fs/ext4/acl.c
> +++ b/fs/ext4/acl.c
> @@ -325,7 +325,7 @@ ext4_acl_chmod(struct inode *inode)
> return error;
> retry:
> handle = ext4_journal_start(inode, EXT4_HT_XATTR,
> - EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
> + ext4_jbd2_credits_xattr(inode));
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> ext4_std_error(inode->i_sb, error);
> @@ -423,7 +423,7 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
>
> retry:
> handle = ext4_journal_start(inode, EXT4_HT_XATTR,
> - EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
> + ext4_jbd2_credits_xattr(inode));
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> goto release_and_out;
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index c2c64da..9897cdf 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2443,6 +2443,75 @@ extern const struct file_operations ext4_file_operations;
> extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
> extern void ext4_unwritten_wait(struct inode *inode);
>
> +/* inline.c */
> +extern int ext4_has_inline_data(struct inode *inode);
> +extern int ext4_get_inline_size(struct inode *inode);
> +extern int ext4_get_max_inline_size(struct inode *inode);
> +extern int ext4_find_inline_data_nolock(struct inode *inode);
> +extern void ext4_write_inline_data(struct inode *inode,
> + struct ext4_iloc *iloc,
> + void *buffer, loff_t pos,
> + unsigned int len);
> +extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
> + unsigned int len);
> +extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
> + unsigned int len);
> +extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
> +
> +extern int ext4_readpage_inline(struct inode *inode, struct page *page);
> +extern int ext4_try_to_write_inline_data(struct address_space *mapping,
> + struct inode *inode,
> + loff_t pos, unsigned len,
> + unsigned flags,
> + struct page **pagep);
> +extern int ext4_write_inline_data_end(struct inode *inode,
> + loff_t pos, unsigned len,
> + unsigned copied,
> + struct page *page);
> +extern struct buffer_head *
> +ext4_journalled_write_inline_data(struct inode *inode,
> + unsigned len,
> + struct page *page);
> +extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
> + struct inode *inode,
> + loff_t pos, unsigned len,
> + unsigned flags,
> + struct page **pagep,
> + void **fsdata);
> +extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
> + unsigned len, unsigned copied,
> + struct page *page);
> +extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
> + struct inode *inode);
> +extern int ext4_try_create_inline_dir(handle_t *handle,
> + struct inode *parent,
> + struct inode *inode);
> +extern int ext4_read_inline_dir(struct file *filp,
> + void *dirent, filldir_t filldir,
> + int *has_inline_data);
> +extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
> + const struct qstr *d_name,
> + struct ext4_dir_entry_2 **res_dir,
> + int *has_inline_data);
> +extern int ext4_delete_inline_entry(handle_t *handle,
> + struct inode *dir,
> + struct ext4_dir_entry_2 *de_del,
> + struct buffer_head *bh,
> + int *has_inline_data);
> +extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
> +extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
> + struct ext4_dir_entry_2 **parent_de,
> + int *retval);
> +extern int ext4_inline_data_fiemap(struct inode *inode,
> + struct fiemap_extent_info *fieinfo,
> + int *has_inline);
> +extern int ext4_try_to_evict_inline_data(handle_t *handle,
> + struct inode *inode,
> + int needed);
> +extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
> +
> +extern int ext4_convert_inline_data(struct inode *inode);
> +
> /* namei.c */
> extern const struct inode_operations ext4_dir_inode_operations;
> extern const struct inode_operations ext4_special_inode_operations;
> @@ -2539,6 +2608,7 @@ extern void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp);
> extern int ext4_mmp_csum_verify(struct super_block *sb,
> struct mmp_struct *mmp);
>
> +
> /* BH_Uninit flag: blocks are allocated but uninitialized on disk */
> enum ext4_state_bits {
> BH_Uninit /* blocks are allocated but uninitialized on disk */
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index c1fc2dc..4c216b1 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -104,6 +104,20 @@
> #define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
> #define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))
>
> +static inline int ext4_jbd2_credits_xattr(struct inode *inode)
> +{
> + int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> +
> + /*
> + * In case of inline data, we may push out the data to a block,
> + * so we need to reserve credits for this eventuality
> + */
> + if (ext4_has_inline_data(inode))
> + credits += ext4_writepage_trans_blocks(inode) + 1;
> + return credits;
> +}
> +
> +
> /*
> * Ext4 handle operation types -- for logging purposes
> */
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 2efc560..cc31da0 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1165,16 +1165,9 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
> {
> handle_t *handle;
> int error, retries = 0;
> - int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> + int credits = ext4_jbd2_credits_xattr(inode);
>
> retry:
> - /*
> - * In case of inline data, we may push out the data to a block,
> - * So reserve the journal space first.
> - */
> - if (ext4_has_inline_data(inode))
> - credits += ext4_writepage_trans_blocks(inode) + 1;
> -
> handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
> index 69eda78..aa25deb 100644
> --- a/fs/ext4/xattr.h
> +++ b/fs/ext4/xattr.h
> @@ -125,74 +125,6 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode,
> struct ext4_xattr_info *i,
> struct ext4_xattr_ibody_find *is);
>
> -extern int ext4_has_inline_data(struct inode *inode);
> -extern int ext4_get_inline_size(struct inode *inode);
> -extern int ext4_get_max_inline_size(struct inode *inode);
> -extern int ext4_find_inline_data_nolock(struct inode *inode);
> -extern void ext4_write_inline_data(struct inode *inode,
> - struct ext4_iloc *iloc,
> - void *buffer, loff_t pos,
> - unsigned int len);
> -extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
> - unsigned int len);
> -extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
> - unsigned int len);
> -extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
> -
> -extern int ext4_readpage_inline(struct inode *inode, struct page *page);
> -extern int ext4_try_to_write_inline_data(struct address_space *mapping,
> - struct inode *inode,
> - loff_t pos, unsigned len,
> - unsigned flags,
> - struct page **pagep);
> -extern int ext4_write_inline_data_end(struct inode *inode,
> - loff_t pos, unsigned len,
> - unsigned copied,
> - struct page *page);
> -extern struct buffer_head *
> -ext4_journalled_write_inline_data(struct inode *inode,
> - unsigned len,
> - struct page *page);
> -extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
> - struct inode *inode,
> - loff_t pos, unsigned len,
> - unsigned flags,
> - struct page **pagep,
> - void **fsdata);
> -extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
> - unsigned len, unsigned copied,
> - struct page *page);
> -extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
> - struct inode *inode);
> -extern int ext4_try_create_inline_dir(handle_t *handle,
> - struct inode *parent,
> - struct inode *inode);
> -extern int ext4_read_inline_dir(struct file *filp,
> - void *dirent, filldir_t filldir,
> - int *has_inline_data);
> -extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
> - const struct qstr *d_name,
> - struct ext4_dir_entry_2 **res_dir,
> - int *has_inline_data);
> -extern int ext4_delete_inline_entry(handle_t *handle,
> - struct inode *dir,
> - struct ext4_dir_entry_2 *de_del,
> - struct buffer_head *bh,
> - int *has_inline_data);
> -extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
> -extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
> - struct ext4_dir_entry_2 **parent_de,
> - int *retval);
> -extern int ext4_inline_data_fiemap(struct inode *inode,
> - struct fiemap_extent_info *fieinfo,
> - int *has_inline);
> -extern int ext4_try_to_evict_inline_data(handle_t *handle,
> - struct inode *inode,
> - int needed);
> -extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
> -
> -extern int ext4_convert_inline_data(struct inode *inode);
> -
> #ifdef CONFIG_EXT4_FS_SECURITY
> extern int ext4_init_security(handle_t *handle, struct inode *inode,
> struct inode *dir, const struct qstr *qstr);
>


2013-02-10 18:15:55

by Raymond Jennings

[permalink] [raw]
Subject: Re: [PATCH 11/12] ext4: fix the number of credits needed for acl ops with inline data

Quick question from a n00b.

What are "credits"?

I imagine I'm not the only onlooker curious about ti either.

On Sun, Feb 10, 2013 at 5:42 AM, Tao Ma <[email protected]> wrote:
> On 02/10/2013 05:53 AM, Theodore Ts'o wrote:
>> Operations which modify extended attributes may need extra journal
>> credits if inline data is used, since there is a chance that some
>> extended attributes may need to get pushed to an external attribute
>> block.
>>
>> Changes to reflect this was made in xattr.c, but they were missed in
>> fs/ext4/acl.c. To fix this, abstract the calculation of the number of
>> credits needed for xattr operations to an inline function defined in
>> ext4_jbd2.h, and use it in acl.c and xattr.c.
> Oh, yes, I forgot about the acl.
>
> Reviewed-by: Tao Ma <[email protected]>
>
> Thanks,
> Tao
>>
>> Also move the function declarations used in inline.c from xattr.h
>> (where they are non-obviously hidden, and caused problems since
>> ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
>> them to ext4.h.
>>
>> Signed-off-by: "Theodore Ts'o" <[email protected]>
>> Cc: Tao Ma <[email protected]>
>> ---
>> fs/ext4/acl.c | 4 +--
>> fs/ext4/ext4.h | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> fs/ext4/ext4_jbd2.h | 14 +++++++++++
>> fs/ext4/xattr.c | 9 +------
>> fs/ext4/xattr.h | 68 ---------------------------------------------------
>> 5 files changed, 87 insertions(+), 78 deletions(-)
>>
>> diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
>> index 406cf8b..39a54a0 100644
>> --- a/fs/ext4/acl.c
>> +++ b/fs/ext4/acl.c
>> @@ -325,7 +325,7 @@ ext4_acl_chmod(struct inode *inode)
>> return error;
>> retry:
>> handle = ext4_journal_start(inode, EXT4_HT_XATTR,
>> - EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
>> + ext4_jbd2_credits_xattr(inode));
>> if (IS_ERR(handle)) {
>> error = PTR_ERR(handle);
>> ext4_std_error(inode->i_sb, error);
>> @@ -423,7 +423,7 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
>>
>> retry:
>> handle = ext4_journal_start(inode, EXT4_HT_XATTR,
>> - EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
>> + ext4_jbd2_credits_xattr(inode));
>> if (IS_ERR(handle)) {
>> error = PTR_ERR(handle);
>> goto release_and_out;
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index c2c64da..9897cdf 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -2443,6 +2443,75 @@ extern const struct file_operations ext4_file_operations;
>> extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
>> extern void ext4_unwritten_wait(struct inode *inode);
>>
>> +/* inline.c */
>> +extern int ext4_has_inline_data(struct inode *inode);
>> +extern int ext4_get_inline_size(struct inode *inode);
>> +extern int ext4_get_max_inline_size(struct inode *inode);
>> +extern int ext4_find_inline_data_nolock(struct inode *inode);
>> +extern void ext4_write_inline_data(struct inode *inode,
>> + struct ext4_iloc *iloc,
>> + void *buffer, loff_t pos,
>> + unsigned int len);
>> +extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
>> + unsigned int len);
>> +extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
>> + unsigned int len);
>> +extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
>> +
>> +extern int ext4_readpage_inline(struct inode *inode, struct page *page);
>> +extern int ext4_try_to_write_inline_data(struct address_space *mapping,
>> + struct inode *inode,
>> + loff_t pos, unsigned len,
>> + unsigned flags,
>> + struct page **pagep);
>> +extern int ext4_write_inline_data_end(struct inode *inode,
>> + loff_t pos, unsigned len,
>> + unsigned copied,
>> + struct page *page);
>> +extern struct buffer_head *
>> +ext4_journalled_write_inline_data(struct inode *inode,
>> + unsigned len,
>> + struct page *page);
>> +extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
>> + struct inode *inode,
>> + loff_t pos, unsigned len,
>> + unsigned flags,
>> + struct page **pagep,
>> + void **fsdata);
>> +extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
>> + unsigned len, unsigned copied,
>> + struct page *page);
>> +extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
>> + struct inode *inode);
>> +extern int ext4_try_create_inline_dir(handle_t *handle,
>> + struct inode *parent,
>> + struct inode *inode);
>> +extern int ext4_read_inline_dir(struct file *filp,
>> + void *dirent, filldir_t filldir,
>> + int *has_inline_data);
>> +extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
>> + const struct qstr *d_name,
>> + struct ext4_dir_entry_2 **res_dir,
>> + int *has_inline_data);
>> +extern int ext4_delete_inline_entry(handle_t *handle,
>> + struct inode *dir,
>> + struct ext4_dir_entry_2 *de_del,
>> + struct buffer_head *bh,
>> + int *has_inline_data);
>> +extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
>> +extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
>> + struct ext4_dir_entry_2 **parent_de,
>> + int *retval);
>> +extern int ext4_inline_data_fiemap(struct inode *inode,
>> + struct fiemap_extent_info *fieinfo,
>> + int *has_inline);
>> +extern int ext4_try_to_evict_inline_data(handle_t *handle,
>> + struct inode *inode,
>> + int needed);
>> +extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
>> +
>> +extern int ext4_convert_inline_data(struct inode *inode);
>> +
>> /* namei.c */
>> extern const struct inode_operations ext4_dir_inode_operations;
>> extern const struct inode_operations ext4_special_inode_operations;
>> @@ -2539,6 +2608,7 @@ extern void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp);
>> extern int ext4_mmp_csum_verify(struct super_block *sb,
>> struct mmp_struct *mmp);
>>
>> +
>> /* BH_Uninit flag: blocks are allocated but uninitialized on disk */
>> enum ext4_state_bits {
>> BH_Uninit /* blocks are allocated but uninitialized on disk */
>> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
>> index c1fc2dc..4c216b1 100644
>> --- a/fs/ext4/ext4_jbd2.h
>> +++ b/fs/ext4/ext4_jbd2.h
>> @@ -104,6 +104,20 @@
>> #define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
>> #define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))
>>
>> +static inline int ext4_jbd2_credits_xattr(struct inode *inode)
>> +{
>> + int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
>> +
>> + /*
>> + * In case of inline data, we may push out the data to a block,
>> + * so we need to reserve credits for this eventuality
>> + */
>> + if (ext4_has_inline_data(inode))
>> + credits += ext4_writepage_trans_blocks(inode) + 1;
>> + return credits;
>> +}
>> +
>> +
>> /*
>> * Ext4 handle operation types -- for logging purposes
>> */
>> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
>> index 2efc560..cc31da0 100644
>> --- a/fs/ext4/xattr.c
>> +++ b/fs/ext4/xattr.c
>> @@ -1165,16 +1165,9 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
>> {
>> handle_t *handle;
>> int error, retries = 0;
>> - int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
>> + int credits = ext4_jbd2_credits_xattr(inode);
>>
>> retry:
>> - /*
>> - * In case of inline data, we may push out the data to a block,
>> - * So reserve the journal space first.
>> - */
>> - if (ext4_has_inline_data(inode))
>> - credits += ext4_writepage_trans_blocks(inode) + 1;
>> -
>> handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
>> if (IS_ERR(handle)) {
>> error = PTR_ERR(handle);
>> diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
>> index 69eda78..aa25deb 100644
>> --- a/fs/ext4/xattr.h
>> +++ b/fs/ext4/xattr.h
>> @@ -125,74 +125,6 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode,
>> struct ext4_xattr_info *i,
>> struct ext4_xattr_ibody_find *is);
>>
>> -extern int ext4_has_inline_data(struct inode *inode);
>> -extern int ext4_get_inline_size(struct inode *inode);
>> -extern int ext4_get_max_inline_size(struct inode *inode);
>> -extern int ext4_find_inline_data_nolock(struct inode *inode);
>> -extern void ext4_write_inline_data(struct inode *inode,
>> - struct ext4_iloc *iloc,
>> - void *buffer, loff_t pos,
>> - unsigned int len);
>> -extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
>> - unsigned int len);
>> -extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
>> - unsigned int len);
>> -extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
>> -
>> -extern int ext4_readpage_inline(struct inode *inode, struct page *page);
>> -extern int ext4_try_to_write_inline_data(struct address_space *mapping,
>> - struct inode *inode,
>> - loff_t pos, unsigned len,
>> - unsigned flags,
>> - struct page **pagep);
>> -extern int ext4_write_inline_data_end(struct inode *inode,
>> - loff_t pos, unsigned len,
>> - unsigned copied,
>> - struct page *page);
>> -extern struct buffer_head *
>> -ext4_journalled_write_inline_data(struct inode *inode,
>> - unsigned len,
>> - struct page *page);
>> -extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
>> - struct inode *inode,
>> - loff_t pos, unsigned len,
>> - unsigned flags,
>> - struct page **pagep,
>> - void **fsdata);
>> -extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
>> - unsigned len, unsigned copied,
>> - struct page *page);
>> -extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
>> - struct inode *inode);
>> -extern int ext4_try_create_inline_dir(handle_t *handle,
>> - struct inode *parent,
>> - struct inode *inode);
>> -extern int ext4_read_inline_dir(struct file *filp,
>> - void *dirent, filldir_t filldir,
>> - int *has_inline_data);
>> -extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
>> - const struct qstr *d_name,
>> - struct ext4_dir_entry_2 **res_dir,
>> - int *has_inline_data);
>> -extern int ext4_delete_inline_entry(handle_t *handle,
>> - struct inode *dir,
>> - struct ext4_dir_entry_2 *de_del,
>> - struct buffer_head *bh,
>> - int *has_inline_data);
>> -extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
>> -extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
>> - struct ext4_dir_entry_2 **parent_de,
>> - int *retval);
>> -extern int ext4_inline_data_fiemap(struct inode *inode,
>> - struct fiemap_extent_info *fieinfo,
>> - int *has_inline);
>> -extern int ext4_try_to_evict_inline_data(handle_t *handle,
>> - struct inode *inode,
>> - int needed);
>> -extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
>> -
>> -extern int ext4_convert_inline_data(struct inode *inode);
>> -
>> #ifdef CONFIG_EXT4_FS_SECURITY
>> extern int ext4_init_security(handle_t *handle, struct inode *inode,
>> struct inode *dir, const struct qstr *qstr);
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-02-10 19:43:18

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 11/12] ext4: fix the number of credits needed for acl ops with inline data

On Sun, Feb 10, 2013 at 10:15:14AM -0800, Shentino wrote:
> Quick question from a n00b.
>
> What are "credits"?
>
> I imagine I'm not the only onlooker curious about ti either.

A group of operations which need to be applied to the file system as
an atomic set of block updates are grouped by handles. This is very
to "BEGIN TRANSACTION" and "END TRANSACTION" pairs in SQL. For example:

BEGIN TRANSACTION
INSERT INTO INODE_TABLE (ino, i_blocks, i_size, i_uid) VALUES
(11, 2, 1024, 12);
UPDATE INODE_ALLOCATION_BITMAP SET in_use=TRUE where ino=11;
END TRANSACTION

The reason why we use the terminology "starting a handle" instead of
"starting a transaction" is because a large number of handles are
bundled together to form a jbd2 transaction. This is because commits
are expensive, and for performance reasons we batch a large number of
handles, or perhaps what you might call "micro-transactions" into a
one large jbd2 transaction, which in general gets commited once every
five seconds or so, unless a commit is explicitly requested, or we
start running out of space in the journal.

In order to make sure that we don't run out of space, before we start
a handle, we have to declare how the maximum number of blocks we are
planning on modifying --- this reservation of space in the journal is
also referred as journal credits, since each time we call
ext4_journal_get_write_access(), if that block is not already part of
the transaction, the number of credits associated with the handle will
be decremented by one.

When we open a handle, if there is not sufficient space (credits) left
in the current transaction, then start_this_handle() will request that
the current transaction be committed, and then block until the a new
transaction can be started.

When we are done, we close the handle. If it turns out that we didn't
need to modify as many blocks as we had declared when we opened the
handle, or if some of the blocks that we modified were already part of
the transaction, those excess credits are returned to the transaction.

So as an example, suppose we need to modify two metadata blocks
associated with an inode. That sequence might look like this:

/* start a handle with two credits */
/* note: this can trigger a journal commit if there
is not enough space left in the journal */
handle = ext4_journal_start(inode, 2);

ext4_journal_get_write_access(handle, bh);
<modify the metadata block found in bh>
ext4_handle_dirty_metadata(handle, inode, bh)

ext4_journal_get_write_access(handle, bh2);
<modify the metadata block found in bh2>
ext4_handle_dirty_metadata(handle, inode, bh2)

/* note: this can trigger a journal commit if the handle
is marked synchronous, or if enough time has elapsed --- i.e.,
the 5 second commit timer. */
ext4_journal_stop(handle);

In the case where we are running in no journal mode,
ext4_journal_start(), ext4_journal_get_write_access(), and
ext4_journal_stop() are effectively no-ops, and
ext4_handle_dirty_metadata() is mapped to mark_buffer_dirty_inode(bh,
inode), or mark_buffer_dirty(bh) if inode is NULL. (inode is NULL if
we are modifying metadata blocks belonging to a specific inode, such
as a block group descriptor block or an inode table block.)

Hope this helps / is interesting to those who are following along at
home. :-)

- Ted

2013-02-11 01:47:52

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 12/12] ext4: start handle at the last possible moment when creating inodes

On Sat, Feb 09, 2013 at 04:53:52PM -0500, Theodore Ts'o wrote:
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 10bd6fe..943665b 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -725,6 +727,15 @@ repeat_in_this_group:
> "inode=%lu", ino + 1);
> continue;
> }
> + if (!handle) {
> + BUG_ON(nblocks == 0);

I've changed this to be:

BUG_ON(nblocks <= 0);

... since nblocks should (obviously) never be negative.

- Ted

2013-02-11 15:53:01

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 00/12] jbd2 optimization and bug fixes

On Sat 09-02-13 16:53:40, Ted Tso wrote:
> I've been recently looking at journalling code in ext4, and it's clear
> that no one has really been through this code in a while. It's easy to
> find some easy optimization.
>
> Some general rules of thumb that developers should keep in mind when
> making changes to ext4:
>
> 1) It's important to minimize the amount of time that a handle is held
> active, since a journal commit can't be closed until all handles have
> been stopped. In the ideal world, disk reads and memory allocations
> should be done *before* calling ext4_journal_start().
>
> 2) It's important that number of credits needed for a particular handle
> be large enough; otherwise it's possible that we run out of space in the
> journal without being able to finish the handle.
>
> 3) It's also important that we not overestimate then number of credits
> needed, since otherwise we might close a transaction too early --- and
> in particular, the process which starts the transaction commit process
> will tend to suffer the greatest amount of latency.
True but it's not too bad - we give back the credits we didn't use once
the current handle is dropped. But if there are many handles open in
parallel or if the overestimate is really huge, it will have visible
effects. So I agree it's good to keep the estimate as close as resonably
possible to reality.

> Currently enabling quotas cause the number of credits required to
> balloon significantly. Most of the time we need nowhere the number of
> credits which we reserve, so there's opportunity to optimize things.
> (For example, we could make sure create the quota record is created in a
> seprate transaction.)
So write(2) and similar calls are already optimized. You are right that
create(2) or unlink(2) still overestimate needed credits only due to
remotely possible worst case.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 15:57:48

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 01/12] jbd2: track request delay statistics

On Sat 09-02-13 16:53:41, Ted Tso wrote:
> Track the delay between when we first request that the commit begin
> and when it actually begins, so we can see how much of a gap exists.
> In theory, this should just be the remaining scheduling quantuum of
> the thread which requested the commit (assuming it was not a
> synchronous operation which triggered the commit request) plus
> scheduling overhead; however, it's possible that real time processes
> might get in the way of letting the kjournald thread from executing.
Looks good. I just wonder whether jiffies really have enough precision to
get you meaningful numbers. Did you try that?

Honza
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/jbd2/commit.c | 8 ++++++++
> fs/jbd2/journal.c | 12 +++++++++---
> fs/jbd2/transaction.c | 1 +
> include/linux/jbd2.h | 7 +++++++
> include/trace/events/jbd2.h | 8 ++++++--
> 5 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 3091d42..750c701 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -435,7 +435,12 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>
> trace_jbd2_commit_locking(journal, commit_transaction);
> stats.run.rs_wait = commit_transaction->t_max_wait;
> + stats.run.rs_request_delay = 0;
> stats.run.rs_locked = jiffies;
> + if (commit_transaction->t_requested)
> + stats.run.rs_request_delay =
> + jbd2_time_diff(commit_transaction->t_requested,
> + stats.run.rs_locked);
> stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
> stats.run.rs_locked);
>
> @@ -1116,7 +1121,10 @@ restart_loop:
> */
> spin_lock(&journal->j_history_lock);
> journal->j_stats.ts_tid++;
> + if (commit_transaction->t_requested)
> + journal->j_stats.ts_requested++;
> journal->j_stats.run.rs_wait += stats.run.rs_wait;
> + journal->j_stats.run.rs_request_delay += stats.run.rs_request_delay;
> journal->j_stats.run.rs_running += stats.run.rs_running;
> journal->j_stats.run.rs_locked += stats.run.rs_locked;
> journal->j_stats.run.rs_flushing += stats.run.rs_flushing;
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 1a80e31..4ba2e81 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -533,6 +533,7 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t target)
> jbd_debug(1, "JBD2: requesting commit %d/%d\n",
> journal->j_commit_request,
> journal->j_commit_sequence);
> + journal->j_running_transaction->t_requested = jiffies;
> wake_up(&journal->j_wait_commit);
> return 1;
> } else if (!tid_geq(journal->j_commit_request, target))
> @@ -898,13 +899,18 @@ static int jbd2_seq_info_show(struct seq_file *seq, void *v)
>
> if (v != SEQ_START_TOKEN)
> return 0;
> - seq_printf(seq, "%lu transaction, each up to %u blocks\n",
> - s->stats->ts_tid,
> - s->journal->j_max_transaction_buffers);
> + seq_printf(seq, "%lu transactions (%lu requested), "
> + "each up to %u blocks\n",
> + s->stats->ts_tid, s->stats->ts_requested,
> + s->journal->j_max_transaction_buffers);
> if (s->stats->ts_tid == 0)
> return 0;
> seq_printf(seq, "average: \n %ums waiting for transaction\n",
> jiffies_to_msecs(s->stats->run.rs_wait / s->stats->ts_tid));
> + seq_printf(seq, " %ums request delay\n",
> + (s->stats->ts_requested == 0) ? 0 :
> + jiffies_to_msecs(s->stats->run.rs_request_delay /
> + s->stats->ts_requested));
> seq_printf(seq, " %ums running transaction\n",
> jiffies_to_msecs(s->stats->run.rs_running / s->stats->ts_tid));
> seq_printf(seq, " %ums transaction was being locked\n",
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index df9f297..735609e 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -100,6 +100,7 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
> journal->j_running_transaction = transaction;
> transaction->t_max_wait = 0;
> transaction->t_start = jiffies;
> + transaction->t_requested = 0;
>
> return transaction;
> }
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index e30b663..e0aafc4 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -581,6 +581,11 @@ struct transaction_s
> unsigned long t_start;
>
> /*
> + * When commit was requested
> + */
> + unsigned long t_requested;
> +
> + /*
> * Checkpointing stats [j_checkpoint_sem]
> */
> struct transaction_chp_stats_s t_chp_stats;
> @@ -637,6 +642,7 @@ struct transaction_s
>
> struct transaction_run_stats_s {
> unsigned long rs_wait;
> + unsigned long rs_request_delay;
> unsigned long rs_running;
> unsigned long rs_locked;
> unsigned long rs_flushing;
> @@ -649,6 +655,7 @@ struct transaction_run_stats_s {
>
> struct transaction_stats_s {
> unsigned long ts_tid;
> + unsigned long ts_requested;
> struct transaction_run_stats_s run;
> };
>
> diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
> index 127993d..5419f57 100644
> --- a/include/trace/events/jbd2.h
> +++ b/include/trace/events/jbd2.h
> @@ -142,6 +142,7 @@ TRACE_EVENT(jbd2_run_stats,
> __field( dev_t, dev )
> __field( unsigned long, tid )
> __field( unsigned long, wait )
> + __field( unsigned long, request_delay )
> __field( unsigned long, running )
> __field( unsigned long, locked )
> __field( unsigned long, flushing )
> @@ -155,6 +156,7 @@ TRACE_EVENT(jbd2_run_stats,
> __entry->dev = dev;
> __entry->tid = tid;
> __entry->wait = stats->rs_wait;
> + __entry->request_delay = stats->rs_request_delay;
> __entry->running = stats->rs_running;
> __entry->locked = stats->rs_locked;
> __entry->flushing = stats->rs_flushing;
> @@ -164,10 +166,12 @@ TRACE_EVENT(jbd2_run_stats,
> __entry->blocks_logged = stats->rs_blocks_logged;
> ),
>
> - TP_printk("dev %d,%d tid %lu wait %u running %u locked %u flushing %u "
> - "logging %u handle_count %u blocks %u blocks_logged %u",
> + TP_printk("dev %d,%d tid %lu wait %u request_delay %u running %u "
> + "locked %u flushing %u logging %u handle_count %u "
> + "blocks %u blocks_logged %u",
> MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid,
> jiffies_to_msecs(__entry->wait),
> + jiffies_to_msecs(__entry->request_delay),
> jiffies_to_msecs(__entry->running),
> jiffies_to_msecs(__entry->locked),
> jiffies_to_msecs(__entry->flushing),
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 15:58:39

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 02/12] jbd2: revert "jbd2: add COW fields to struct jbd2_journal_handle"

On Sat 09-02-13 16:53:42, Ted Tso wrote:
> This reverts commit 93737456d68ddcb86232f669b83da673dd12e351.
>
> The cow-snapshots effort is no longer active, so remove these extra
> fields to shrink down the handle structure.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> include/linux/jbd2.h | 28 +++-------------------------
> 1 file changed, 3 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index e0aafc4..24db725 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -397,35 +397,13 @@ struct jbd2_journal_handle
> int h_err;
>
> /* Flags [no locking] */
> - unsigned int h_sync:1; /* sync-on-close */
> - unsigned int h_jdata:1; /* force data journaling */
> - unsigned int h_aborted:1; /* fatal error on handle */
> - unsigned int h_cowing:1; /* COWing block to snapshot */
> -
> - /* Number of buffers requested by user:
> - * (before adding the COW credits factor) */
> - unsigned int h_base_credits:14;
> -
> - /* Number of buffers the user is allowed to dirty:
> - * (counts only buffers dirtied when !h_cowing) */
> - unsigned int h_user_credits:14;
> -
> + unsigned int h_sync: 1; /* sync-on-close */
> + unsigned int h_jdata: 1; /* force data journaling */
> + unsigned int h_aborted: 1; /* fatal error on handle */
>
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> struct lockdep_map h_lockdep_map;
> #endif
> -
> -#ifdef CONFIG_JBD2_DEBUG
> - /* COW debugging counters: */
> - unsigned int h_cow_moved; /* blocks moved to snapshot */
> - unsigned int h_cow_copied; /* blocks copied to snapshot */
> - unsigned int h_cow_ok_jh; /* blocks already COWed during current
> - transaction */
> - unsigned int h_cow_ok_bitmap; /* blocks not set in COW bitmap */
> - unsigned int h_cow_ok_mapped;/* blocks already mapped in snapshot */
> - unsigned int h_cow_bitmaps; /* COW bitmaps created */
> - unsigned int h_cow_excluded; /* blocks set in exclude bitmap */
> -#endif
> };
>
>
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 16:01:01

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 12/12] ext4: start handle at the last possible moment when creating inodes

On 2013-02-09, at 14:53, Theodore Ts'o <[email protected]> wrote:

> In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
> until the inode has been succesfully allocated. In order to do this,
> we need to start the handle in the ext4_new_inode(). So create a new
> variant of this function, ext4_new_inode_start_handle(), so the handle
> can be created at the last possible minute, before we need to modify
> the inode allocation bitmap block.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/ext4.h | 17 ++++++++--
> fs/ext4/ialloc.c | 15 +++++++--
> fs/ext4/namei.c | 94 ++++++++++++++++++++++++++------------------------------
> 3 files changed, 71 insertions(+), 55 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 9897cdf..05940ce 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1991,9 +1991,20 @@ extern int ext4fs_dirhash(const char *name, int len, struct
> dx_hash_info *hinfo);
>
> /* ialloc.c */
> -extern struct inode *ext4_new_inode(handle_t *, struct inode *, umode_t,
> - const struct qstr *qstr, __u32 goal,
> - uid_t *owner);
> +extern struct inode *__ext4_new_inode(handle_t *, struct inode *, umode_t,
> + const struct qstr *qstr, __u32 goal,
> + uid_t *owner, int handle_type,
> + unsigned int line_no, int nblocks);
> +
> +#define ext4_new_inode(handle, dir, mode, qstr, goal, owner) \
> + __ext4_new_inode((handle), (dir), (mode), (qstr), (goal), (owner), \
> + 0, 0, 0)
> +#define ext4_new_inode_start_handle(dir, mode, qstr, goal, owner, \
> + type, nblocks) \
> + __ext4_new_inode(0, (dir), (mode), (qstr), (goal), (owner), \

Please use "NULL" instead of "0" for pointers.

Cheers, Andreas

2013-02-11 16:16:34

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 11/12] ext4: fix the number of credits needed for acl ops with inline data

On 2013-02-10, at 12:43, Theodore Ts'o <[email protected]> wrote:

> On Sun, Feb 10, 2013 at 10:15:14AM -0800, Shentino wrote:
>> Quick question from a n00b.
>>
>> What are "credits"?
>>
>> I imagine I'm not the only onlooker curious about it either.

It would be great to add this description to the ext4 wiki, since it explains transaction handles quite clearly.

Cheers, Andreas

> A group of operations which need to be applied to the file system as
> an atomic set of block updates are grouped by handles. This is very
> to "BEGIN TRANSACTION" and "END TRANSACTION" pairs in SQL. For example:
>
> BEGIN TRANSACTION
> INSERT INTO INODE_TABLE (ino, i_blocks, i_size, i_uid) VALUES
> (11, 2, 1024, 12);
> UPDATE INODE_ALLOCATION_BITMAP SET in_use=TRUE where ino=11;
> END TRANSACTION
>
> The reason why we use the terminology "starting a handle" instead of
> "starting a transaction" is because a large number of handles are
> bundled together to form a jbd2 transaction. This is because commits
> are expensive, and for performance reasons we batch a large number of
> handles, or perhaps what you might call "micro-transactions" into a
> one large jbd2 transaction, which in general gets commited once every
> five seconds or so, unless a commit is explicitly requested, or we
> start running out of space in the journal.
>
> In order to make sure that we don't run out of space, before we start
> a handle, we have to declare how the maximum number of blocks we are
> planning on modifying --- this reservation of space in the journal is
> also referred as journal credits, since each time we call
> ext4_journal_get_write_access(), if that block is not already part of
> the transaction, the number of credits associated with the handle will
> be decremented by one.
>
> When we open a handle, if there is not sufficient space (credits) left
> in the current transaction, then start_this_handle() will request that
> the current transaction be committed, and then block until the a new
> transaction can be started.
>
> When we are done, we close the handle. If it turns out that we didn't
> need to modify as many blocks as we had declared when we opened the
> handle, or if some of the blocks that we modified were already part of
> the transaction, those excess credits are returned to the transaction.
>
> So as an example, suppose we need to modify two metadata blocks
> associated with an inode. That sequence might look like this:
>
> /* start a handle with two credits */
> /* note: this can trigger a journal commit if there
> is not enough space left in the journal */
> handle = ext4_journal_start(inode, 2);
>
> ext4_journal_get_write_access(handle, bh);
> <modify the metadata block found in bh>
> ext4_handle_dirty_metadata(handle, inode, bh)
>
> ext4_journal_get_write_access(handle, bh2);
> <modify the metadata block found in bh2>
> ext4_handle_dirty_metadata(handle, inode, bh2)
>
> /* note: this can trigger a journal commit if the handle
> is marked synchronous, or if enough time has elapsed --- i.e.,
> the 5 second commit timer. */
> ext4_journal_stop(handle);
>
> In the case where we are running in no journal mode,
> ext4_journal_start(), ext4_journal_get_write_access(), and
> ext4_journal_stop() are effectively no-ops, and
> ext4_handle_dirty_metadata() is mapped to mark_buffer_dirty_inode(bh,
> inode), or mark_buffer_dirty(bh) if inode is NULL. (inode is NULL if
> we are modifying metadata blocks belonging to a specific inode, such
> as a block group descriptor block or an inode table block.)
>
> Hope this helps / is interesting to those who are following along at
> home. :-)
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-02-11 16:16:38

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 05/12] ext4: pass context information to jbd2__journal_start()

On Sat 09-02-13 16:53:45, Ted Tso wrote:
> So we can better understand what bits of ext4 are responsible for
> long-running jbd2 handles, use jbd2__journal_start() so we can pass
> context information for logging purposes.
>
> The recommended way for finding the longer-running handles is:
>
> T=/sys/kernel/debug/tracing
> EVENT=$T/events/jbd2/jbd2_handle_stats
> echo "interval > 5" > $EVENT/filter
> echo 1 > $EVENT/enable
>
> ./run-my-fs-benchmark
>
> cat $T/trace > /tmp/problem-handles
>
> This will list handles that were active for longer than 20ms. Having
> longer-running handles is bad, because a commit started at the wrong
> time could stall for those 20+ milliseconds, which could delay an
> fsync() or an O_SYNC operation. Here is an example line from the
> trace file describing a handle which lived on for 311 jiffies, or over
> 1.2 seconds:
>
> postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
> tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
> dirtied_blocks 0
Nice! I'm just wondering - won't it be easier if we just stored a pointer
to __FILE__ in the handle? We wouldn't have to define those transaction
types and think which one to use when coding. Also checking the trace point
output would be simpler. I guess using __FILE__ will increase kernel size
due to additional string constants as gcc isn't clever enough to merge
identical strings. But we could workaround that defining in every ext4
file:

static char *JBD2_FILE = __FILE__;

and then use JBD2_FILE in the macros instead of __FILE__.

Honza

> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/acl.c | 5 +++--
> fs/ext4/ext4_jbd2.c | 5 +++--
> fs/ext4/ext4_jbd2.h | 31 ++++++++++++++++++++++++++++---
> fs/ext4/extents.c | 11 ++++++-----
> fs/ext4/file.c | 2 +-
> fs/ext4/ialloc.c | 2 +-
> fs/ext4/indirect.c | 7 ++++---
> fs/ext4/inline.c | 10 +++++-----
> fs/ext4/inode.c | 33 ++++++++++++++++++++-------------
> fs/ext4/ioctl.c | 4 ++--
> fs/ext4/migrate.c | 4 ++--
> fs/ext4/move_extent.c | 2 +-
> fs/ext4/namei.c | 42 ++++++++++++++++++++++++------------------
> fs/ext4/resize.c | 10 +++++-----
> fs/ext4/super.c | 10 +++++-----
> fs/ext4/xattr.c | 2 +-
> 16 files changed, 111 insertions(+), 69 deletions(-)
>
> diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
> index e6e0d98..406cf8b 100644
> --- a/fs/ext4/acl.c
> +++ b/fs/ext4/acl.c
> @@ -324,7 +324,7 @@ ext4_acl_chmod(struct inode *inode)
> if (error)
> return error;
> retry:
> - handle = ext4_journal_start(inode,
> + handle = ext4_journal_start(inode, EXT4_HT_XATTR,
> EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> @@ -422,7 +422,8 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
> acl = NULL;
>
> retry:
> - handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
> + handle = ext4_journal_start(inode, EXT4_HT_XATTR,
> + EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> goto release_and_out;
> diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
> index 6f61145..7058975 100644
> --- a/fs/ext4/ext4_jbd2.c
> +++ b/fs/ext4/ext4_jbd2.c
> @@ -38,7 +38,8 @@ static void ext4_put_nojournal(handle_t *handle)
> /*
> * Wrappers for jbd2_journal_start/end.
> */
> -handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
> +handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
> + int type, int nblocks)
> {
> journal_t *journal;
>
> @@ -59,7 +60,7 @@ handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
> ext4_abort(sb, "Detected aborted journal");
> return ERR_PTR(-EROFS);
> }
> - return jbd2_journal_start(journal, nblocks);
> + return jbd2__journal_start(journal, nblocks, GFP_NOFS, type, line);
> }
>
> int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 7177f9b..302814b 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -110,6 +110,22 @@
> #define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
> #define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))
>
> +/*
> + * Ext4 handle operation types -- for logging purposes
> + */
> +#define EXT4_HT_MISC 0
> +#define EXT4_HT_INODE 1
> +#define EXT4_HT_WRITE_PAGE 2
> +#define EXT4_HT_MAP_BLOCKS 3
> +#define EXT4_HT_DIR 4
> +#define EXT4_HT_TRUNCATE 5
> +#define EXT4_HT_QUOTA 6
> +#define EXT4_HT_RESIZE 7
> +#define EXT4_HT_MIGRATE 8
> +#define EXT4_HT_MOVE_EXTENTS 9
> +#define EXT4_HT_XATTR 10
> +#define EXT4_HT_MAX 11
> +
> /**
> * struct ext4_journal_cb_entry - Base structure for callback information.
> *
> @@ -234,7 +250,8 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
> #define ext4_handle_dirty_super(handle, sb) \
> __ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb))
>
> -handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks);
> +handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
> + int type, int nblocks);
> int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle);
>
> #define EXT4_NOJOURNAL_MAX_REF_COUNT ((unsigned long) 4096)
> @@ -268,9 +285,17 @@ static inline int ext4_handle_has_enough_credits(handle_t *handle, int needed)
> return 1;
> }
>
> -static inline handle_t *ext4_journal_start(struct inode *inode, int nblocks)
> +#define ext4_journal_start_sb(sb, type, nblocks) \
> + __ext4_journal_start_sb((sb), __LINE__, (type), (nblocks))
> +
> +#define ext4_journal_start(inode, type, nblocks) \
> + __ext4_journal_start((inode), __LINE__, (type), (nblocks))
> +
> +static inline handle_t *__ext4_journal_start(struct inode *inode,
> + unsigned int line, int type,
> + int nblocks)
> {
> - return ext4_journal_start_sb(inode->i_sb, nblocks);
> + return __ext4_journal_start_sb(inode->i_sb, line, type, nblocks);
> }
>
> #define ext4_journal_stop(handle) \
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index e95acb0..d3a7e1c 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2545,7 +2545,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
> ext_debug("truncate since %u to %u\n", start, end);
>
> /* probably first extent we're gonna free will be last in block */
> - handle = ext4_journal_start(inode, depth + 1);
> + handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, depth + 1);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -4129,7 +4129,7 @@ void ext4_ext_truncate(struct inode *inode)
> * probably first extent we're gonna free will be last in block
> */
> err = ext4_writepage_trans_blocks(inode);
> - handle = ext4_journal_start(inode, err);
> + handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, err);
> if (IS_ERR(handle))
> return;
>
> @@ -4295,7 +4295,8 @@ retry:
> while (ret >= 0 && ret < max_blocks) {
> map.m_lblk = map.m_lblk + ret;
> map.m_len = max_blocks = max_blocks - ret;
> - handle = ext4_journal_start(inode, credits);
> + handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
> + credits);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> break;
> @@ -4373,7 +4374,7 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
> while (ret >= 0 && ret < max_blocks) {
> map.m_lblk += ret;
> map.m_len = (max_blocks -= ret);
> - handle = ext4_journal_start(inode, credits);
> + handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, credits);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> break;
> @@ -4554,7 +4555,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> inode_dio_wait(inode);
>
> credits = ext4_writepage_trans_blocks(inode);
> - handle = ext4_journal_start(inode, credits);
> + handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
> if (IS_ERR(handle)) {
> err = PTR_ERR(handle);
> goto out_dio;
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index afaf9f15..b41d92c 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -240,7 +240,7 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> handle_t *handle;
> int err;
>
> - handle = ext4_journal_start_sb(sb, 1);
> + handle = ext4_journal_start_sb(sb, EXT4_HT_MISC, 1);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
> err = ext4_journal_get_write_access(handle, sbi->s_sbh);
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 3f32c80..10bd6fe 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -1137,7 +1137,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
> if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
> goto out;
>
> - handle = ext4_journal_start_sb(sb, 1);
> + handle = ext4_journal_start_sb(sb, EXT4_HT_MISC, 1);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> goto out;
> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index 1932810..c541ab8 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -791,7 +791,7 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
>
> if (final_size > inode->i_size) {
> /* Credits for sb + inode write */
> - handle = ext4_journal_start(inode, 2);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> goto out;
> @@ -851,7 +851,7 @@ locked:
> int err;
>
> /* Credits for sb + inode write */
> - handle = ext4_journal_start(inode, 2);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
> if (IS_ERR(handle)) {
> /* This is really bad luck. We've written the data
> * but cannot extend i_size. Bail out and pretend
> @@ -950,7 +950,8 @@ static handle_t *start_transaction(struct inode *inode)
> {
> handle_t *result;
>
> - result = ext4_journal_start(inode, ext4_blocks_for_truncate(inode));
> + result = ext4_journal_start(inode, EXT4_HT_TRUNCATE,
> + ext4_blocks_for_truncate(inode));
> if (!IS_ERR(result))
> return result;
>
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 93a3408..bc5f871 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -545,7 +545,7 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
> return ret;
>
> retry:
> - handle = ext4_journal_start(inode, needed_blocks);
> + handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> handle = NULL;
> @@ -657,7 +657,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
> * The possible write could happen in the inode,
> * so try to reserve the space in inode first.
> */
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> handle = NULL;
> @@ -853,7 +853,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
> if (ret)
> return ret;
>
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> handle = NULL;
> @@ -1770,7 +1770,7 @@ void ext4_inline_data_truncate(struct inode *inode, int *has_inline)
>
>
> needed_blocks = ext4_writepage_trans_blocks(inode);
> - handle = ext4_journal_start(inode, needed_blocks);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, needed_blocks);
> if (IS_ERR(handle))
> return;
>
> @@ -1862,7 +1862,7 @@ int ext4_convert_inline_data(struct inode *inode)
> if (error)
> return error;
>
> - handle = ext4_journal_start(inode, needed_blocks);
> + handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> goto out_free;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index fb1907d..c3c47e2 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -234,7 +234,8 @@ void ext4_evict_inode(struct inode *inode)
> * protection against it
> */
> sb_start_intwrite(inode->i_sb);
> - handle = ext4_journal_start(inode, ext4_blocks_for_truncate(inode)+3);
> + handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE,
> + ext4_blocks_for_truncate(inode)+3);
> if (IS_ERR(handle)) {
> ext4_std_error(inode->i_sb, PTR_ERR(handle));
> /*
> @@ -687,7 +688,8 @@ static int _ext4_get_block(struct inode *inode, sector_t iblock,
> if (map.m_len > DIO_MAX_BLOCKS)
> map.m_len = DIO_MAX_BLOCKS;
> dio_credits = ext4_chunk_trans_blocks(inode, map.m_len);
> - handle = ext4_journal_start(inode, dio_credits);
> + handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
> + dio_credits);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> return ret;
> @@ -912,7 +914,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
> }
>
> retry:
> - handle = ext4_journal_start(inode, needed_blocks);
> + handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> goto out;
> @@ -1947,7 +1949,8 @@ static int __ext4_journalled_writepage(struct page *page,
> * references to buffers so we are safe */
> unlock_page(page);
>
> - handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
> + handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
> + ext4_writepage_trans_blocks(inode));
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> goto out;
> @@ -2378,7 +2381,8 @@ retry:
> needed_blocks = ext4_da_writepages_trans_blocks(inode);
>
> /* start a new transaction*/
> - handle = ext4_journal_start(inode, needed_blocks);
> + handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
> + needed_blocks);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> ext4_msg(inode->i_sb, KERN_CRIT, "%s: jbd2_start: "
> @@ -2534,7 +2538,7 @@ retry:
> * to journalling the i_disksize update if writes to the end
> * of file which has an already mapped buffer.
> */
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> goto out;
> @@ -4281,8 +4285,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>
> /* (user+group)*(old+new) structure, inode write (sb,
> * inode block, ? - but truncate inode update has it) */
> - handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
> - EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb))+3);
> + handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
> + (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb) +
> + EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb)) + 3);
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> goto err_out;
> @@ -4317,7 +4322,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> (attr->ia_size < inode->i_size)) {
> handle_t *handle;
>
> - handle = ext4_journal_start(inode, 3);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 3);
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> goto err_out;
> @@ -4337,7 +4342,8 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> attr->ia_size);
> if (error) {
> /* Do as much error cleanup as possible */
> - handle = ext4_journal_start(inode, 3);
> + handle = ext4_journal_start(inode,
> + EXT4_HT_INODE, 3);
> if (IS_ERR(handle)) {
> ext4_orphan_del(NULL, inode);
> goto err_out;
> @@ -4678,7 +4684,7 @@ void ext4_dirty_inode(struct inode *inode, int flags)
> {
> handle_t *handle;
>
> - handle = ext4_journal_start(inode, 2);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
> if (IS_ERR(handle))
> goto out;
>
> @@ -4779,7 +4785,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>
> /* Finally we can mark the inode as dirty. */
>
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -4857,7 +4863,8 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> else
> get_block = ext4_get_block;
> retry_alloc:
> - handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
> + handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
> + ext4_writepage_trans_blocks(inode));
> if (IS_ERR(handle)) {
> ret = VM_FAULT_SIGBUS;
> goto out;
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 4784ac2..31f4f56 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -104,7 +104,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> } else if (oldflags & EXT4_EOFBLOCKS_FL)
> ext4_truncate(inode);
>
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> if (IS_ERR(handle)) {
> err = PTR_ERR(handle);
> goto flags_out;
> @@ -173,7 +173,7 @@ flags_out:
> }
>
> mutex_lock(&inode->i_mutex);
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
> if (IS_ERR(handle)) {
> err = PTR_ERR(handle);
> goto unlock_out;
> diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
> index db8226d..4e4fcfd 100644
> --- a/fs/ext4/migrate.c
> +++ b/fs/ext4/migrate.c
> @@ -456,7 +456,7 @@ int ext4_ext_migrate(struct inode *inode)
> */
> return retval;
>
> - handle = ext4_journal_start(inode,
> + handle = ext4_journal_start(inode, EXT4_HT_MIGRATE,
> EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
> EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)
> @@ -507,7 +507,7 @@ int ext4_ext_migrate(struct inode *inode)
> ext4_set_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
> up_read((&EXT4_I(inode)->i_data_sem));
>
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
> if (IS_ERR(handle)) {
> /*
> * It is impossible to update on-disk structures without
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index cd90b7e..d78c33e 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -920,7 +920,7 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
> again:
> *err = 0;
> jblocks = ext4_writepage_trans_blocks(orig_inode) * 2;
> - handle = ext4_journal_start(orig_inode, jblocks);
> + handle = ext4_journal_start(orig_inode, EXT4_HT_MOVE_EXTENTS, jblocks);
> if (IS_ERR(handle)) {
> *err = PTR_ERR(handle);
> return 0;
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 34ed624..5184103 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2262,9 +2262,10 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
> dquot_initialize(dir);
>
> retry:
> - handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> - EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -2298,9 +2299,10 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
> dquot_initialize(dir);
>
> retry:
> - handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> - EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -2414,9 +2416,10 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> dquot_initialize(dir);
>
> retry:
> - handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> - EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -2729,7 +2732,8 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
> dquot_initialize(dir);
> dquot_initialize(dentry->d_inode);
>
> - handle = ext4_journal_start(dir, EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -2791,7 +2795,8 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> dquot_initialize(dir);
> dquot_initialize(dentry->d_inode);
>
> - handle = ext4_journal_start(dir, EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -2870,7 +2875,7 @@ static int ext4_symlink(struct inode *dir,
> EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb);
> }
> retry:
> - handle = ext4_journal_start(dir, credits);
> + handle = ext4_journal_start(dir, EXT4_HT_DIR, credits);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -2908,7 +2913,7 @@ retry:
> * Now inode is being linked into dir (EXT4_DATA_TRANS_BLOCKS
> * + EXT4_INDEX_EXTRA_TRANS_BLOCKS), inode is also modified
> */
> - handle = ext4_journal_start(dir,
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> EXT4_INDEX_EXTRA_TRANS_BLOCKS + 1);
> if (IS_ERR(handle)) {
> @@ -2955,8 +2960,9 @@ static int ext4_link(struct dentry *old_dentry,
> dquot_initialize(dir);
>
> retry:
> - handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> - EXT4_INDEX_EXTRA_TRANS_BLOCKS);
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
> + EXT4_INDEX_EXTRA_TRANS_BLOCKS));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -3039,9 +3045,9 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> * in separate transaction */
> if (new_dentry->d_inode)
> dquot_initialize(new_dentry->d_inode);
> - handle = ext4_journal_start(old_dir, 2 *
> - EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2);
> + handle = ext4_journal_start(old_dir, EXT4_HT_DIR,
> + (2 * EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 8eefb63..c7f4d75 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -466,7 +466,7 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
> meta_bg = EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_META_BG);
>
> /* This transaction may be extended/restarted along the way */
> - handle = ext4_journal_start_sb(sb, EXT4_MAX_TRANS_DATA);
> + handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, EXT4_MAX_TRANS_DATA);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> @@ -1031,7 +1031,7 @@ static void update_backups(struct super_block *sb, int blk_off, char *data,
> handle_t *handle;
> int err = 0, err2;
>
> - handle = ext4_journal_start_sb(sb, EXT4_MAX_TRANS_DATA);
> + handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, EXT4_MAX_TRANS_DATA);
> if (IS_ERR(handle)) {
> group = 1;
> err = PTR_ERR(handle);
> @@ -1412,7 +1412,7 @@ static int ext4_flex_group_add(struct super_block *sb,
> * modify each of the reserved GDT dindirect blocks.
> */
> credit = flex_gd->count * 4 + reserved_gdb;
> - handle = ext4_journal_start_sb(sb, credit);
> + handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, credit);
> if (IS_ERR(handle)) {
> err = PTR_ERR(handle);
> goto exit;
> @@ -1624,7 +1624,7 @@ static int ext4_group_extend_no_check(struct super_block *sb,
> /* We will update the superblock, one block bitmap, and
> * one group descriptor via ext4_group_add_blocks().
> */
> - handle = ext4_journal_start_sb(sb, 3);
> + handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, 3);
> if (IS_ERR(handle)) {
> err = PTR_ERR(handle);
> ext4_warning(sb, "error %d on journal start", err);
> @@ -1788,7 +1788,7 @@ static int ext4_convert_meta_bg(struct super_block *sb, struct inode *inode)
> credits += 3; /* block bitmap, bg descriptor, resize inode */
> }
>
> - handle = ext4_journal_start_sb(sb, credits);
> + handle = ext4_journal_start_sb(sb, EXT4_HT_RESIZE, credits);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 5812f29..373d46c 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4768,7 +4768,7 @@ static int ext4_write_dquot(struct dquot *dquot)
> struct inode *inode;
>
> inode = dquot_to_inode(dquot);
> - handle = ext4_journal_start(inode,
> + handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
> EXT4_QUOTA_TRANS_BLOCKS(dquot->dq_sb));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
> @@ -4784,7 +4784,7 @@ static int ext4_acquire_dquot(struct dquot *dquot)
> int ret, err;
> handle_t *handle;
>
> - handle = ext4_journal_start(dquot_to_inode(dquot),
> + handle = ext4_journal_start(dquot_to_inode(dquot), EXT4_HT_QUOTA,
> EXT4_QUOTA_INIT_BLOCKS(dquot->dq_sb));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
> @@ -4800,7 +4800,7 @@ static int ext4_release_dquot(struct dquot *dquot)
> int ret, err;
> handle_t *handle;
>
> - handle = ext4_journal_start(dquot_to_inode(dquot),
> + handle = ext4_journal_start(dquot_to_inode(dquot), EXT4_HT_QUOTA,
> EXT4_QUOTA_DEL_BLOCKS(dquot->dq_sb));
> if (IS_ERR(handle)) {
> /* Release dquot anyway to avoid endless cycle in dqput() */
> @@ -4832,7 +4832,7 @@ static int ext4_write_info(struct super_block *sb, int type)
> handle_t *handle;
>
> /* Data block + inode block */
> - handle = ext4_journal_start(sb->s_root->d_inode, 2);
> + handle = ext4_journal_start(sb->s_root->d_inode, EXT4_HT_QUOTA, 2);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
> ret = dquot_commit_info(sb, type);
> @@ -4978,7 +4978,7 @@ static int ext4_quota_off(struct super_block *sb, int type)
>
> /* Update modification times of quota files when userspace can
> * start looking at them */
> - handle = ext4_journal_start(inode, 1);
> + handle = ext4_journal_start(inode, EXT4_HT_QUOTA, 1);
> if (IS_ERR(handle))
> goto out;
> inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index c68990c..2efc560 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1175,7 +1175,7 @@ retry:
> if (ext4_has_inline_data(inode))
> credits += ext4_writepage_trans_blocks(inode) + 1;
>
> - handle = ext4_journal_start(inode, credits);
> + handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> } else {
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 16:21:36

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 07/12] ext4: start handle at the last possible moment in ext4_unlink()

On Sat 09-02-13 16:53:47, Ted Tso wrote:
> Don't start the jbd2 transaction handle until after the directory
> entry has been found, to minimize the amount of time that a handle is
> held active.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/namei.c | 24 ++++++++++++++----------
> 1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 5184103..4a27069 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2787,7 +2787,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> struct inode *inode;
> struct buffer_head *bh;
> struct ext4_dir_entry_2 *de;
> - handle_t *handle;
> + handle_t *handle = NULL;
>
> trace_ext4_unlink_enter(dir, dentry);
> /* Initialize quotas before so that eventual writes go
> @@ -2795,14 +2795,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> dquot_initialize(dir);
> dquot_initialize(dentry->d_inode);
>
> - handle = ext4_journal_start(dir, EXT4_HT_DIR,
> - EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> - if (IS_ERR(handle))
> - return PTR_ERR(handle);
> -
> - if (IS_DIRSYNC(dir))
> - ext4_handle_sync(handle);
> -
> retval = -ENOENT;
> bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
> if (!bh)
> @@ -2814,6 +2806,17 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> if (le32_to_cpu(de->inode) != inode->i_ino)
> goto end_unlink;
>
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> + if (IS_ERR(handle)) {
> + retval = PTR_ERR(handle);
> + handle = NULL;
> + goto end_unlink;
> + }
> +
> + if (IS_DIRSYNC(dir))
> + ext4_handle_sync(handle);
> +
> if (!inode->i_nlink) {
> ext4_warning(inode->i_sb,
> "Deleting nonexistent file (%lu), %d",
> @@ -2834,8 +2837,9 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> retval = 0;
>
> end_unlink:
> - ext4_journal_stop(handle);
> brelse(bh);
> + if (handle)
> + ext4_journal_stop(handle);
> trace_ext4_unlink_exit(dentry, retval);
> return retval;
> }
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 16:22:57

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 08/12] ext4: start handle at the last possible moment in ext4_rmdir()

On Sat 09-02-13 16:53:48, Ted Tso wrote:
> Don't start the jbd2 transaction handle until after the directory
> entry has been found, to minimize the amount of time that a handle is
> held active.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/namei.c | 24 ++++++++++++++----------
> 1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 4a27069..36a4afd 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2725,26 +2725,18 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
> struct inode *inode;
> struct buffer_head *bh;
> struct ext4_dir_entry_2 *de;
> - handle_t *handle;
> + handle_t *handle = NULL;
>
> /* Initialize quotas before so that eventual writes go in
> * separate transaction */
> dquot_initialize(dir);
> dquot_initialize(dentry->d_inode);
>
> - handle = ext4_journal_start(dir, EXT4_HT_DIR,
> - EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> - if (IS_ERR(handle))
> - return PTR_ERR(handle);
> -
> retval = -ENOENT;
> bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
> if (!bh)
> goto end_rmdir;
>
> - if (IS_DIRSYNC(dir))
> - ext4_handle_sync(handle);
> -
> inode = dentry->d_inode;
>
> retval = -EIO;
> @@ -2755,6 +2747,17 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
> if (!empty_dir(inode))
> goto end_rmdir;
>
> + handle = ext4_journal_start(dir, EXT4_HT_DIR,
> + EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> + if (IS_ERR(handle)) {
> + retval = PTR_ERR(handle);
> + handle = NULL;
> + goto end_rmdir;
> + }
> +
> + if (IS_DIRSYNC(dir))
> + ext4_handle_sync(handle);
> +
> retval = ext4_delete_entry(handle, dir, de, bh);
> if (retval)
> goto end_rmdir;
> @@ -2776,8 +2779,9 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
> ext4_mark_inode_dirty(handle, dir);
>
> end_rmdir:
> - ext4_journal_stop(handle);
> brelse(bh);
> + if (handle)
> + ext4_journal_stop(handle);
> return retval;
> }
>
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 16:26:40

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 09/12] ext4: fix the number of credits needed for ext4_ext_migrate()

On Sat 09-02-13 16:53:49, Ted Tso wrote:
> The migration ioctl creates a temporary inode. Since this inode is
> never linked to a directory, we don't need to reserve journal credits
> required for modifying the directory.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/migrate.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
> index 4e4fcfd..480acf4 100644
> --- a/fs/ext4/migrate.c
> +++ b/fs/ext4/migrate.c
> @@ -456,11 +456,14 @@ int ext4_ext_migrate(struct inode *inode)
> */
> return retval;
>
> + /*
> + * Worst case we can touch the allocation bitmaps, a bgd
> + * block, and a block to link in the orphan list. We do need
> + * need to worry about credits for modifying the quota inode.
> + */
> handle = ext4_journal_start(inode, EXT4_HT_MIGRATE,
> - EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> - EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)
> - + 1);
> + 4 + EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb));
> +
> if (IS_ERR(handle)) {
> retval = PTR_ERR(handle);
> return retval;
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 16:28:32

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 10/12] ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir()

On Sat 09-02-13 16:53:50, Ted Tso wrote:
> The ext4_unlink() and ext4_rmdir() don't actually release the blocks
> associated with the file/directory. This gets done in a separate jbd2
> handle called via ext4_evict_inode(). Thus, we don't need to reserve
> lots of journal credits for the truncate.
>
> Note that using too many journal credits is non-optimal because it can
> leading to the journal transmit getting closed too early, before it is
> strictly necessary.
Nice spotting. The patch looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/ext4_jbd2.h | 6 ------
> fs/ext4/namei.c | 4 ++--
> 2 files changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 302814b..c1fc2dc 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -59,12 +59,6 @@
> #define EXT4_META_TRANS_BLOCKS(sb) (EXT4_XATTR_TRANS_BLOCKS + \
> EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))
>
> -/* Delete operations potentially hit one directory's namespace plus an
> - * entire inode, plus arbitrary amounts of bitmap/indirection data. Be
> - * generous. We can grow the delete transaction later if necessary. */
> -
> -#define EXT4_DELETE_TRANS_BLOCKS(sb) (2 * EXT4_DATA_TRANS_BLOCKS(sb) + 64)
> -
> /* Define an arbitrary limit for the amount of data we will anticipate
> * writing to any given transaction. For unbounded transactions such as
> * write(2) and truncate(2) we can write more than this, but we always
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 36a4afd..5f3d2b5 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2748,7 +2748,7 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
> goto end_rmdir;
>
> handle = ext4_journal_start(dir, EXT4_HT_DIR,
> - EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> + EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
> if (IS_ERR(handle)) {
> retval = PTR_ERR(handle);
> handle = NULL;
> @@ -2811,7 +2811,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
> goto end_unlink;
>
> handle = ext4_journal_start(dir, EXT4_HT_DIR,
> - EXT4_DELETE_TRANS_BLOCKS(dir->i_sb));
> + EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
> if (IS_ERR(handle)) {
> retval = PTR_ERR(handle);
> handle = NULL;
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 16:30:49

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 11/12] ext4: fix the number of credits needed for acl ops with inline data

On Sat 09-02-13 16:53:51, Ted Tso wrote:
> Operations which modify extended attributes may need extra journal
> credits if inline data is used, since there is a chance that some
> extended attributes may need to get pushed to an external attribute
> block.
>
> Changes to reflect this was made in xattr.c, but they were missed in
> fs/ext4/acl.c. To fix this, abstract the calculation of the number of
> credits needed for xattr operations to an inline function defined in
> ext4_jbd2.h, and use it in acl.c and xattr.c.
>
> Also move the function declarations used in inline.c from xattr.h
> (where they are non-obviously hidden, and caused problems since
> ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
> them to ext4.h.
Looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Cc: Tao Ma <[email protected]>
> ---
> fs/ext4/acl.c | 4 +--
> fs/ext4/ext4.h | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/ext4_jbd2.h | 14 +++++++++++
> fs/ext4/xattr.c | 9 +------
> fs/ext4/xattr.h | 68 ---------------------------------------------------
> 5 files changed, 87 insertions(+), 78 deletions(-)
>
> diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
> index 406cf8b..39a54a0 100644
> --- a/fs/ext4/acl.c
> +++ b/fs/ext4/acl.c
> @@ -325,7 +325,7 @@ ext4_acl_chmod(struct inode *inode)
> return error;
> retry:
> handle = ext4_journal_start(inode, EXT4_HT_XATTR,
> - EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
> + ext4_jbd2_credits_xattr(inode));
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> ext4_std_error(inode->i_sb, error);
> @@ -423,7 +423,7 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value,
>
> retry:
> handle = ext4_journal_start(inode, EXT4_HT_XATTR,
> - EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
> + ext4_jbd2_credits_xattr(inode));
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> goto release_and_out;
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index c2c64da..9897cdf 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2443,6 +2443,75 @@ extern const struct file_operations ext4_file_operations;
> extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
> extern void ext4_unwritten_wait(struct inode *inode);
>
> +/* inline.c */
> +extern int ext4_has_inline_data(struct inode *inode);
> +extern int ext4_get_inline_size(struct inode *inode);
> +extern int ext4_get_max_inline_size(struct inode *inode);
> +extern int ext4_find_inline_data_nolock(struct inode *inode);
> +extern void ext4_write_inline_data(struct inode *inode,
> + struct ext4_iloc *iloc,
> + void *buffer, loff_t pos,
> + unsigned int len);
> +extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
> + unsigned int len);
> +extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
> + unsigned int len);
> +extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
> +
> +extern int ext4_readpage_inline(struct inode *inode, struct page *page);
> +extern int ext4_try_to_write_inline_data(struct address_space *mapping,
> + struct inode *inode,
> + loff_t pos, unsigned len,
> + unsigned flags,
> + struct page **pagep);
> +extern int ext4_write_inline_data_end(struct inode *inode,
> + loff_t pos, unsigned len,
> + unsigned copied,
> + struct page *page);
> +extern struct buffer_head *
> +ext4_journalled_write_inline_data(struct inode *inode,
> + unsigned len,
> + struct page *page);
> +extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
> + struct inode *inode,
> + loff_t pos, unsigned len,
> + unsigned flags,
> + struct page **pagep,
> + void **fsdata);
> +extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
> + unsigned len, unsigned copied,
> + struct page *page);
> +extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
> + struct inode *inode);
> +extern int ext4_try_create_inline_dir(handle_t *handle,
> + struct inode *parent,
> + struct inode *inode);
> +extern int ext4_read_inline_dir(struct file *filp,
> + void *dirent, filldir_t filldir,
> + int *has_inline_data);
> +extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
> + const struct qstr *d_name,
> + struct ext4_dir_entry_2 **res_dir,
> + int *has_inline_data);
> +extern int ext4_delete_inline_entry(handle_t *handle,
> + struct inode *dir,
> + struct ext4_dir_entry_2 *de_del,
> + struct buffer_head *bh,
> + int *has_inline_data);
> +extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
> +extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
> + struct ext4_dir_entry_2 **parent_de,
> + int *retval);
> +extern int ext4_inline_data_fiemap(struct inode *inode,
> + struct fiemap_extent_info *fieinfo,
> + int *has_inline);
> +extern int ext4_try_to_evict_inline_data(handle_t *handle,
> + struct inode *inode,
> + int needed);
> +extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
> +
> +extern int ext4_convert_inline_data(struct inode *inode);
> +
> /* namei.c */
> extern const struct inode_operations ext4_dir_inode_operations;
> extern const struct inode_operations ext4_special_inode_operations;
> @@ -2539,6 +2608,7 @@ extern void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp);
> extern int ext4_mmp_csum_verify(struct super_block *sb,
> struct mmp_struct *mmp);
>
> +
> /* BH_Uninit flag: blocks are allocated but uninitialized on disk */
> enum ext4_state_bits {
> BH_Uninit /* blocks are allocated but uninitialized on disk */
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index c1fc2dc..4c216b1 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -104,6 +104,20 @@
> #define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
> #define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))
>
> +static inline int ext4_jbd2_credits_xattr(struct inode *inode)
> +{
> + int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> +
> + /*
> + * In case of inline data, we may push out the data to a block,
> + * so we need to reserve credits for this eventuality
> + */
> + if (ext4_has_inline_data(inode))
> + credits += ext4_writepage_trans_blocks(inode) + 1;
> + return credits;
> +}
> +
> +
> /*
> * Ext4 handle operation types -- for logging purposes
> */
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 2efc560..cc31da0 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1165,16 +1165,9 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
> {
> handle_t *handle;
> int error, retries = 0;
> - int credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> + int credits = ext4_jbd2_credits_xattr(inode);
>
> retry:
> - /*
> - * In case of inline data, we may push out the data to a block,
> - * So reserve the journal space first.
> - */
> - if (ext4_has_inline_data(inode))
> - credits += ext4_writepage_trans_blocks(inode) + 1;
> -
> handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
> if (IS_ERR(handle)) {
> error = PTR_ERR(handle);
> diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
> index 69eda78..aa25deb 100644
> --- a/fs/ext4/xattr.h
> +++ b/fs/ext4/xattr.h
> @@ -125,74 +125,6 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode,
> struct ext4_xattr_info *i,
> struct ext4_xattr_ibody_find *is);
>
> -extern int ext4_has_inline_data(struct inode *inode);
> -extern int ext4_get_inline_size(struct inode *inode);
> -extern int ext4_get_max_inline_size(struct inode *inode);
> -extern int ext4_find_inline_data_nolock(struct inode *inode);
> -extern void ext4_write_inline_data(struct inode *inode,
> - struct ext4_iloc *iloc,
> - void *buffer, loff_t pos,
> - unsigned int len);
> -extern int ext4_prepare_inline_data(handle_t *handle, struct inode *inode,
> - unsigned int len);
> -extern int ext4_init_inline_data(handle_t *handle, struct inode *inode,
> - unsigned int len);
> -extern int ext4_destroy_inline_data(handle_t *handle, struct inode *inode);
> -
> -extern int ext4_readpage_inline(struct inode *inode, struct page *page);
> -extern int ext4_try_to_write_inline_data(struct address_space *mapping,
> - struct inode *inode,
> - loff_t pos, unsigned len,
> - unsigned flags,
> - struct page **pagep);
> -extern int ext4_write_inline_data_end(struct inode *inode,
> - loff_t pos, unsigned len,
> - unsigned copied,
> - struct page *page);
> -extern struct buffer_head *
> -ext4_journalled_write_inline_data(struct inode *inode,
> - unsigned len,
> - struct page *page);
> -extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
> - struct inode *inode,
> - loff_t pos, unsigned len,
> - unsigned flags,
> - struct page **pagep,
> - void **fsdata);
> -extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
> - unsigned len, unsigned copied,
> - struct page *page);
> -extern int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
> - struct inode *inode);
> -extern int ext4_try_create_inline_dir(handle_t *handle,
> - struct inode *parent,
> - struct inode *inode);
> -extern int ext4_read_inline_dir(struct file *filp,
> - void *dirent, filldir_t filldir,
> - int *has_inline_data);
> -extern struct buffer_head *ext4_find_inline_entry(struct inode *dir,
> - const struct qstr *d_name,
> - struct ext4_dir_entry_2 **res_dir,
> - int *has_inline_data);
> -extern int ext4_delete_inline_entry(handle_t *handle,
> - struct inode *dir,
> - struct ext4_dir_entry_2 *de_del,
> - struct buffer_head *bh,
> - int *has_inline_data);
> -extern int empty_inline_dir(struct inode *dir, int *has_inline_data);
> -extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
> - struct ext4_dir_entry_2 **parent_de,
> - int *retval);
> -extern int ext4_inline_data_fiemap(struct inode *inode,
> - struct fiemap_extent_info *fieinfo,
> - int *has_inline);
> -extern int ext4_try_to_evict_inline_data(handle_t *handle,
> - struct inode *inode,
> - int needed);
> -extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline);
> -
> -extern int ext4_convert_inline_data(struct inode *inode);
> -
> #ifdef CONFIG_EXT4_FS_SECURITY
> extern int ext4_init_security(handle_t *handle, struct inode *inode,
> struct inode *dir, const struct qstr *qstr);
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 16:35:51

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 06/12] ext4: grab page before starting transaction handle in write_begin()

On Sat 09-02-13 16:53:46, Ted Tso wrote:
> The grab_cache_page_write_begin() function can potentially sleep for a
> long time, since it may need to do memory allocation which can block
> if the system is under significant memory pressure, and because it may
> be blocked on page writeback. If it does take a long time to grab the
> page, it's better that we not hold an active jbd2 handle.
>
> So grab a handle on the page first, and _then_ start the transaction
> handle.
>
> This commit fixes the following long transaction handle hold time:
>
> postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
> tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
> dirtied_blocks 0
OK, the patch looks correct. ext4_write_begin() becomes a bit complex but
it's still well contained so I guess it's worth the improvement. You can
add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/inode.c | 111 ++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 68 insertions(+), 43 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c3c47e2..38164a8 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -906,32 +906,40 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
> ret = ext4_try_to_write_inline_data(mapping, inode, pos, len,
> flags, pagep);
> if (ret < 0)
> - goto out;
> - if (ret == 1) {
> - ret = 0;
> - goto out;
> - }
> + return ret;
> + if (ret == 1)
> + return 0;
> }
>
> -retry:
> + /*
> + * grab_cache_page_write_begin() can take a long time if the
> + * system is thrashing due to memory pressure, or if the page
> + * is being written back. So grab it first before we start
> + * the transaction handle. This also allows us to allocate
> + * the page (if needed) without using GFP_NOFS.
> + */
> +retry_grab:
> + page = grab_cache_page_write_begin(mapping, index, flags);
> + if (!page)
> + return -ENOMEM;
> + unlock_page(page);
> +
> +retry_journal:
> handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks);
> if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - goto out;
> + page_cache_release(page);
> + return PTR_ERR(handle);
> }
>
> - /* We cannot recurse into the filesystem as the transaction is already
> - * started */
> - flags |= AOP_FLAG_NOFS;
> -
> - page = grab_cache_page_write_begin(mapping, index, flags);
> - if (!page) {
> + lock_page(page);
> + if (page->mapping != mapping) {
> + /* The page got truncated from under us */
> + unlock_page(page);
> + page_cache_release(page);
> ext4_journal_stop(handle);
> - ret = -ENOMEM;
> - goto out;
> + goto retry_grab;
> }
> -
> - *pagep = page;
> + wait_on_page_writeback(page);
>
> if (ext4_should_dioread_nolock(inode))
> ret = __block_write_begin(page, pos, len, ext4_get_block_write);
> @@ -946,7 +954,6 @@ retry:
>
> if (ret) {
> unlock_page(page);
> - page_cache_release(page);
> /*
> * __block_write_begin may have instantiated a few blocks
> * outside i_size. Trim these off again. Don't need
> @@ -970,11 +977,14 @@ retry:
> if (inode->i_nlink)
> ext4_orphan_del(NULL, inode);
> }
> - }
>
> - if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> - goto retry;
> -out:
> + if (ret == -ENOSPC &&
> + ext4_should_retry_alloc(inode->i_sb, &retries))
> + goto retry_journal;
> + page_cache_release(page);
> + return ret;
> + }
> + *pagep = page;
> return ret;
> }
>
> @@ -2524,42 +2534,52 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
> pos, len, flags,
> pagep, fsdata);
> if (ret < 0)
> - goto out;
> - if (ret == 1) {
> - ret = 0;
> - goto out;
> - }
> + return ret;
> + if (ret == 1)
> + return 0;
> }
>
> -retry:
> + /*
> + * grab_cache_page_write_begin() can take a long time if the
> + * system is thrashing due to memory pressure, or if the page
> + * is being written back. So grab it first before we start
> + * the transaction handle. This also allows us to allocate
> + * the page (if needed) without using GFP_NOFS.
> + */
> +retry_grab:
> + page = grab_cache_page_write_begin(mapping, index, flags);
> + if (!page)
> + return -ENOMEM;
> + unlock_page(page);
> +
> /*
> * With delayed allocation, we don't log the i_disksize update
> * if there is delayed block allocation. But we still need
> * to journalling the i_disksize update if writes to the end
> * of file which has an already mapped buffer.
> */
> +retry_journal:
> handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
> if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - goto out;
> + page_cache_release(page);
> + return PTR_ERR(handle);
> }
> - /* We cannot recurse into the filesystem as the transaction is already
> - * started */
> - flags |= AOP_FLAG_NOFS;
>
> - page = grab_cache_page_write_begin(mapping, index, flags);
> - if (!page) {
> + lock_page(page);
> + if (page->mapping != mapping) {
> + /* The page got truncated from under us */
> + unlock_page(page);
> + page_cache_release(page);
> ext4_journal_stop(handle);
> - ret = -ENOMEM;
> - goto out;
> + goto retry_grab;
> }
> - *pagep = page;
> + /* In case writeback began while the page was unlocked */
> + wait_on_page_writeback(page);
>
> ret = __block_write_begin(page, pos, len, ext4_da_get_block_prep);
> if (ret < 0) {
> unlock_page(page);
> ext4_journal_stop(handle);
> - page_cache_release(page);
> /*
> * block_write_begin may have instantiated a few blocks
> * outside i_size. Trim these off again. Don't need
> @@ -2567,11 +2587,16 @@ retry:
> */
> if (pos + len > inode->i_size)
> ext4_truncate_failed_write(inode);
> +
> + if (ret == -ENOSPC &&
> + ext4_should_retry_alloc(inode->i_sb, &retries))
> + goto retry_journal;
> +
> + page_cache_release(page);
> + return ret;
> }
>
> - if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> - goto retry;
> -out:
> + *pagep = page;
> return ret;
> }
>
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 18:13:39

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 05/12] ext4: pass context information to jbd2__journal_start()

On Mon, Feb 11, 2013 at 05:16:34PM +0100, Jan Kara wrote:
> > postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
> > tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
> > dirtied_blocks 0
> Nice! I'm just wondering - won't it be easier if we just stored a pointer
> to __FILE__ in the handle? We wouldn't have to define those transaction
> types and think which one to use when coding. Also checking the trace point
> output would be simpler. I guess using __FILE__ will increase kernel size
> due to additional string constants as gcc isn't clever enough to merge
> identical strings. But we could workaround that defining in every ext4
> file:

One of the reasons why I originally started with using an integer type
was that I was originally thinking about calculating handle-level
statistics (max hold time, average hold time, etc., and keeping those
statistics on a per-class basis). So having an integer type number
was useful for that purpose.

Once I figured out that you could do things specify constraints such
as "interval > 5", the desire to do kernel-level statistics became
less urgent for me, since it's really finding the "long pole" handle
hold times which are most interesting. I still might want to some
kernel-level statistics which are broken down by type, since once we
eliminate the long pole, knowing what the average handle hold times
are could still be useful/interesting.

Another downside of using a pointer to __FILE__, or some static
character pointer, is that it doesn't work for the applications such
as perf (and I think ftrace, although I'm not certain about the ftrace
tool, since I never use it) can't handle printing the char *, since
they use the ring buffer syscall directly instead of using the
TP_printk statement. So all they get is a kernel address, and
something which is useful to decode.

But for users who are using the /sys/kernel/debug/tracing interface
via echo and cat, I do agree that using a static char *JBD_FILE so
that /sys/kernel/debug/trace has real file names would be a bit more
convenient for them.

- Ted

2013-02-11 18:30:39

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 10/12] ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir()

On Mon, Feb 11, 2013 at 05:28:30PM +0100, Jan Kara wrote:
> On Sat 09-02-13 16:53:50, Ted Tso wrote:
> > The ext4_unlink() and ext4_rmdir() don't actually release the blocks
> > associated with the file/directory. This gets done in a separate jbd2
> > handle called via ext4_evict_inode(). Thus, we don't need to reserve
> > lots of journal credits for the truncate.
> >
> > Note that using too many journal credits is non-optimal because it can
> > leading to the journal transmit getting closed too early, before it is
> > strictly necessary.
> Nice spotting. The patch looks good. You can add:
> Reviewed-by: Jan Kara <[email protected]>

The change in how ext4_unlink() works goes back years and years, so
this patch is applicable to ext3 as well. I imagine a some of these
patches, such as the one where we now can start the handle in
ext4_new_inode(), or changing the order in which we call
grab_cache_page() and journal_start(), will probably be too
complicated for you to feel comfortable backporting them to ext3.
This one is really simple though, so maybe it's worth it.

The other ones you might want to consider is the patches do the
directory lookup before before start the journal, since they are
pretty easy to validate by inspection.

Cheers,

- Ted

2013-02-11 19:30:22

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 10/12] ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir()

On Mon 11-02-13 13:30:36, Ted Tso wrote:
> On Mon, Feb 11, 2013 at 05:28:30PM +0100, Jan Kara wrote:
> > On Sat 09-02-13 16:53:50, Ted Tso wrote:
> > > The ext4_unlink() and ext4_rmdir() don't actually release the blocks
> > > associated with the file/directory. This gets done in a separate jbd2
> > > handle called via ext4_evict_inode(). Thus, we don't need to reserve
> > > lots of journal credits for the truncate.
> > >
> > > Note that using too many journal credits is non-optimal because it can
> > > leading to the journal transmit getting closed too early, before it is
> > > strictly necessary.
> > Nice spotting. The patch looks good. You can add:
> > Reviewed-by: Jan Kara <[email protected]>
>
> The change in how ext4_unlink() works goes back years and years, so
> this patch is applicable to ext3 as well. I imagine a some of these
> patches, such as the one where we now can start the handle in
> ext4_new_inode(), or changing the order in which we call
> grab_cache_page() and journal_start(), will probably be too
> complicated for you to feel comfortable backporting them to ext3.
> This one is really simple though, so maybe it's worth it.
>
> The other ones you might want to consider is the patches do the
> directory lookup before before start the journal, since they are
> pretty easy to validate by inspection.
Yeah, I will likely take the easy ones into ext3 but I definitely don't
plan to port the change in ext4_new_inode() into ext3.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 19:58:57

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 05/12] ext4: pass context information to jbd2__journal_start()

On Mon 11-02-13 13:13:35, Ted Tso wrote:
> On Mon, Feb 11, 2013 at 05:16:34PM +0100, Jan Kara wrote:
> > > postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
> > > tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
> > > dirtied_blocks 0
> > Nice! I'm just wondering - won't it be easier if we just stored a pointer
> > to __FILE__ in the handle? We wouldn't have to define those transaction
> > types and think which one to use when coding. Also checking the trace point
> > output would be simpler. I guess using __FILE__ will increase kernel size
> > due to additional string constants as gcc isn't clever enough to merge
> > identical strings. But we could workaround that defining in every ext4
> > file:
>
> One of the reasons why I originally started with using an integer type
> was that I was originally thinking about calculating handle-level
> statistics (max hold time, average hold time, etc., and keeping those
> statistics on a per-class basis). So having an integer type number
> was useful for that purpose.
>
> Once I figured out that you could do things specify constraints such
> as "interval > 5", the desire to do kernel-level statistics became
> less urgent for me, since it's really finding the "long pole" handle
> hold times which are most interesting. I still might want to some
> kernel-level statistics which are broken down by type, since once we
> eliminate the long pole, knowing what the average handle hold times
> are could still be useful/interesting.
Agreed. Although for debugging purposes one can compute this info from
stats output by tracepoints. But if you want to ask users to give you
stats, computing them in kernel is superior (less overhead than full
tracing). We could still compute per call site stats with __FILE__, lineno
pair but it would require some hashing and is probably an overkill.

> Another downside of using a pointer to __FILE__, or some static
> character pointer, is that it doesn't work for the applications such
> as perf (and I think ftrace, although I'm not certain about the ftrace
> tool, since I never use it) can't handle printing the char *, since
> they use the ring buffer syscall directly instead of using the
> TP_printk statement. So all they get is a kernel address, and
> something which is useful to decode.
One can overcome this relatively easily by defining the trace
event to have __array(char, file, 32) and then strcpy() the name to is in
TP_fast_assign(). We do this in lot of other trace points as well (with
device names etc.).

> But for users who are using the /sys/kernel/debug/tracing interface
> via echo and cat, I do agree that using a static char *JBD_FILE so
> that /sys/kernel/debug/trace has real file names would be a bit more
> convenient for them.
Convenience of use is one thing, having programmer properly assign new
handle types (or use old ones) is another. And you also have questions like
- there are three call sites with this handle type but I cannot figure out
which of them is causing problems.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-11 20:14:44

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 05/12] ext4: pass context information to jbd2__journal_start()

On Mon, Feb 11, 2013 at 08:58:55PM +0100, Jan Kara wrote:
> > Another downside of using a pointer to __FILE__, or some static
> > character pointer, is that it doesn't work for the applications such
> > as perf (and I think ftrace, although I'm not certain about the ftrace
> > tool, since I never use it) can't handle printing the char *, since
> > they use the ring buffer syscall directly instead of using the
> > TP_printk statement. So all they get is a kernel address, and
> > something which is useful to decode.
>
> One can overcome this relatively easily by defining the trace
> event to have __array(char, file, 32) and then strcpy() the name to is in
> TP_fast_assign(). We do this in lot of other trace points as well (with
> device names etc.).

True, but then the tradeoff is that we've bloated the size of the
tracepoint in the ring buffer. Given how many handles can be created
and closed per second, keeping the tracepoint size small is also
desireable. Of course one thing we could do is use a small integer,
and then have a sysfs interface which maps the integer to the file
name. That may be overkill, though.

>
> > But for users who are using the /sys/kernel/debug/tracing interface
> > via echo and cat, I do agree that using a static char *JBD_FILE so
> > that /sys/kernel/debug/trace has real file names would be a bit more
> > convenient for them.
> Convenience of use is one thing, having programmer properly assign new
> handle types (or use old ones) is another. And you also have questions like
> - there are three call sites with this handle type but I cannot figure out
> which of them is causing problems.

Well, that's what the line number is for (so you can distinguish the
call site). The other reason why using a type separate from the file
name is sometimes the best way to classify different types of handle
operations isn't necessarily by file name. As far as the effort of
programmers to assign new handle types, I don't really anticipate that
this will be a common thing; it's pretty rare that we introduce code
which requires new calls to ext4_journal_start/ext4_journal_stop. And
when programmers do need to do this, the question of which handle type
to use or whether to create a new handle type is a pretty minor issue.
The much greater difficulty is figuring out how many journal credits
to reserve. :-)

- Ted