2012-03-20 14:41:10

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 0/9] do not use s_dirt in ext4

This patch-set makes ext4 independent of the VFS superblock management
services. Namely, ext4 does not require to register the 'write_super()' VFS
call-back.

The reason of this exercises is to get rid of the 'sync_supers()' kernel thread
which wakes up every 5 seconds (by default) even if all superblocks are clean.
This is wasteful from power management POW (unnecessary wake-ups).

Note, I tried to optimize 'sync_supers()' instead in 2010, but Al wanted me
to get rid of it instead. See https://lkml.org/lkml/2010/6/6/87
And I think this is right because many file-systems do not need this, for
example btrfs does not use VFS superblock management services at all, so on a
btrfs-based system we currently end-up useless periodic wake-ups source.

Changes for other file-systems are coming later.

The patch-set structure.
1. patches 1,2,3 are independent ext4 cleanups and I ask Ted to merge them as
soon/long as they are OK. I sent them also independently in order to get
early comments, but did not get so far, so re-sending.
2. patch 4 exports 'dirty_writeback_interval' and it would be very useful to
have it merged ASAP to simplify further work
3. patch 5 is also and independent VFS clean-up
4. patches 6-9 actually make ext4 independent on the 'sync_supers()' thread.

Thanks,
Artem.


2012-03-20 14:41:34

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 1/9] ext4: do not mark superblock as dirty unnecessarily

From: Artem Bityutskiy <[email protected]>

Commit a0375156ca1041574b5d47cc7e32f10b891151b0 cleaned up superblock dirtying
handling, but missed one place. This patch does what was intended: if we have
the journal, then we update the superblock through the journal rather than
doing this directly.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/inode.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..e040403 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3924,10 +3924,8 @@ static int ext4_do_update_inode(handle_t *handle,
ext4_update_dynamic_rev(sb);
EXT4_SET_RO_COMPAT_FEATURE(sb,
EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
- sb->s_dirt = 1;
ext4_handle_sync(handle);
- err = ext4_handle_dirty_metadata(handle, NULL,
- EXT4_SB(sb)->s_sbh);
+ err = ext4_handle_dirty_super(handle, sb);
}
}
raw_inode->i_generation = cpu_to_le32(inode->i_generation);
--
1.7.7.6


2012-03-20 14:41:22

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 2/9] ext4: write superblock only once on unmount

From: Artem Bityutskiy <[email protected]>

In some rather rare cases it is possible that ext4 may the superblock to the
media twice. This patch makes sure this does not happen. This should speed up
unmounting in those cases.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/super.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 502c61f..c1f5111 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -808,9 +808,6 @@ static void ext4_put_super(struct super_block *sb)
destroy_workqueue(sbi->dio_unwritten_wq);

lock_super(sb);
- if (sb->s_dirt)
- ext4_commit_super(sb, 1);
-
if (sbi->s_journal) {
err = jbd2_journal_destroy(sbi->s_journal);
sbi->s_journal = NULL;
@@ -827,8 +824,10 @@ static void ext4_put_super(struct super_block *sb)
if (!(sb->s_flags & MS_RDONLY)) {
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
es->s_state = cpu_to_le16(sbi->s_mount_state);
- ext4_commit_super(sb, 1);
}
+ if (sb->s_dirt || !(sb->s_flags & MS_RDONLY))
+ ext4_commit_super(sb, 1);
+
if (sbi->s_proc) {
remove_proc_entry(sb->s_id, ext4_proc_root);
}
--
1.7.7.6

2012-03-20 14:41:28

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 8/9] ext4: small cleanup in ext4_commit_super

From: Artem Bityutskiy <[email protected]>

We have many 'EXT4_SB(sb)' references in 'ext4_commit_super()' and I am going
to add another one, so I think the function will be a bit tidier if I introduce
a separate 'sbi' pointer for the ext4 superblock.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/super.c | 18 +++++++++---------
1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7d453f7..64f3a6a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4132,8 +4132,9 @@ static int ext4_load_journal(struct super_block *sb,

static int ext4_commit_super(struct super_block *sb, int sync)
{
- struct ext4_super_block *es = EXT4_SB(sb)->s_es;
- struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ struct buffer_head *sbh = sbi->s_sbh;
int error = 0;

if (!sbh || block_device_ejected(sb))
@@ -4174,18 +4175,17 @@ static int ext4_commit_super(struct super_block *sb, int sync)
es->s_wtime = cpu_to_le32(get_seconds());
if (sb->s_bdev->bd_part)
es->s_kbytes_written =
- cpu_to_le64(EXT4_SB(sb)->s_kbytes_written +
+ cpu_to_le64(sbi->s_kbytes_written +
((part_stat_read(sb->s_bdev->bd_part, sectors[1]) -
- EXT4_SB(sb)->s_sectors_written_start) >> 1));
+ sbi->s_sectors_written_start) >> 1));
else
- es->s_kbytes_written =
- cpu_to_le64(EXT4_SB(sb)->s_kbytes_written);
+ es->s_kbytes_written = cpu_to_le64(sbi->s_kbytes_written);
ext4_free_blocks_count_set(es,
- EXT4_C2B(EXT4_SB(sb), percpu_counter_sum_positive(
- &EXT4_SB(sb)->s_freeclusters_counter)));
+ EXT4_C2B(sbi, percpu_counter_sum_positive(
+ &sbi->s_freeclusters_counter)));
es->s_free_inodes_count =
cpu_to_le32(percpu_counter_sum_positive(
- &EXT4_SB(sb)->s_freeinodes_counter));
+ &sbi->s_freeinodes_counter));
BUFFER_TRACE(sbh, "marking dirty");
mark_buffer_dirty(sbh);
if (sync) {
--
1.7.7.6

2012-03-20 14:41:29

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 9/9] ext4: introduce own superblock dirty flag

From: Artem Bityutskiy <[email protected]>

We finally do not need VFS's 's_dirt' flag in ext4 - introduce our own
's_dirty' flag instead.

Note: the final goal is to get rid of the 'sync_supers()' kernel thread which
wakes up every 5 seconds and even if there is nothing to do. Thus, we are
pushing superblock management from VFS down to file-systems.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/ext4.h | 1 +
fs/ext4/super.c | 12 ++++++------
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 87e8376..be15b2d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1158,6 +1158,7 @@ struct ext4_sb_info {
u32 s_hash_seed[4];
int s_def_hash_version;
int s_hash_unsigned; /* 3 if hash should be signed, 0 if not */
+ int s_dirty;
struct percpu_counter s_freeclusters_counter;
struct percpu_counter s_freeinodes_counter;
struct percpu_counter s_dirs_counter;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 64f3a6a..e24dec2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -824,7 +824,7 @@ static void ext4_put_super(struct super_block *sb)
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
es->s_state = cpu_to_le16(sbi->s_mount_state);
}
- if (sb->s_dirt || !(sb->s_flags & MS_RDONLY))
+ if (sbi->s_dirty || !(sb->s_flags & MS_RDONLY))
ext4_commit_super(sb, 1);

if (sbi->s_proc) {
@@ -4140,7 +4140,7 @@ static int ext4_commit_super(struct super_block *sb, int sync)
if (!sbh || block_device_ejected(sb))
return error;

- sb->s_dirt = 0;
+ sbi->s_dirty = 0;
/*
* Make sure we first mark the superblock as clean and then start
* writing it out.
@@ -4225,7 +4225,7 @@ static void write_super(struct work_struct *work)
kfree(sbwork);

smp_rmb();
- if (!sb->s_dirt)
+ if (!EXT4_SB(sb)->s_dirty)
return;

lock_super(sb);
@@ -4241,9 +4241,9 @@ void __ext4_mark_super_dirty(struct super_block *sb)

/* Make sure we see 's_dirt' changes ASAP */
smp_rmb();
- if (sb->s_dirt == 1)
+ if (sbi->s_dirty == 1)
return;
- sb->s_dirt = 1;
+ sbi->s_dirty = 1;
/* Make other CPUs see the 's_dirt' change as soon as possible */
smp_wmb();

@@ -4254,7 +4254,7 @@ void __ext4_mark_super_dirty(struct super_block *sb)
* trouble anyway, and the SB will be written out on unmount or
* we may be luckier next time it is marked as dirty.
*/
- sb->s_dirt = 2;
+ sbi->s_dirty = 2;
return;
}

--
1.7.7.6

2012-03-20 14:41:25

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 5/9] VFS: remove unused superblock helpers

From: Artem Bityutskiy <[email protected]>

Remove the 'sb_mark_dirty()', 'sb_mark_clean()' and 'sb_is_dirty()' helpers
which are not used. I introduced them 2 years and the intention was to make
all file-systems use them in order to be able to optimize 'sync_supers()'.
However, Al Viro vetoed my patches at the end and asked me to push superblock
management down to file-systems and get rid of the 's_dirt' flag completely,
as well as kill 'sync_supers()' altogether. Thus, remove the helpers.

Signed-off-by: Artem Bityutskiy <[email protected]>
Cc: Al Viro <[email protected]>
---
include/linux/fs.h | 13 -------------
1 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 69cd5bb..68387e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1870,19 +1870,6 @@ extern struct dentry *mount_pseudo(struct file_system_type *, char *,
const struct dentry_operations *dops,
unsigned long);

-static inline void sb_mark_dirty(struct super_block *sb)
-{
- sb->s_dirt = 1;
-}
-static inline void sb_mark_clean(struct super_block *sb)
-{
- sb->s_dirt = 0;
-}
-static inline int sb_is_dirty(struct super_block *sb)
-{
- return sb->s_dirt;
-}
-
/* Alas, no aliases. Too much hassle with bringing module.h everywhere */
#define fops_get(fops) \
(((fops) && try_module_get((fops)->owner) ? (fops) : NULL))
--
1.7.7.6

2012-03-20 14:41:43

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 7/9] ext4: stop using VFS for dirty superblock management

From: Artem Bityutskiy <[email protected]>

Do not use VFS for managing dirty superblock but do it inside ext4. Remove the
'ext4_write_super()' VFS callback and instead, schedule a delayed work when the
superblock is marked as dirty. Use the 'dio_unwritten_wq' which we already have
for these purposes.

We add memory barriers to make sure the 's_dirt' flag changes are visible to
other CPUs as soon as possible to avoid queuing unnecessary works.

The big changes comparing to the previous behavior:
1. We use 'dio_unwritten' queue, so the superblock will be written by
per-filesystem 'ext4-dio-unwrit' thread, instead of 'sync_supers' thread.
2. We allocate memory in 'ext4_mark_super_dirty()'.

Note: the final goal is to get rid of the 'sync_supers()' kernel thread which
wakes up every 5 seconds and even if there is nothing to do. Thus, we are
pushing superblock management from VFS down to file-systems.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/super.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++-------
1 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d9543f3..7d453f7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -73,7 +73,6 @@ static const char *ext4_decode_error(struct super_block *sb, int errno,
static int ext4_remount(struct super_block *sb, int *flags, char *data);
static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
static int ext4_unfreeze(struct super_block *sb);
-static void ext4_write_super(struct super_block *sb);
static int ext4_freeze(struct super_block *sb);
static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
const char *dev_name, void *data);
@@ -1294,7 +1293,6 @@ static const struct super_operations ext4_nojournal_sops = {
.dirty_inode = ext4_dirty_inode,
.drop_inode = ext4_drop_inode,
.evict_inode = ext4_evict_inode,
- .write_super = ext4_write_super,
.put_super = ext4_put_super,
.statfs = ext4_statfs,
.remount_fs = ext4_remount,
@@ -4140,6 +4138,14 @@ static int ext4_commit_super(struct super_block *sb, int sync)

if (!sbh || block_device_ejected(sb))
return error;
+
+ sb->s_dirt = 0;
+ /*
+ * Make sure we first mark the superblock as clean and then start
+ * writing it out.
+ */
+ smp_wmb();
+
if (buffer_write_io_error(sbh)) {
/*
* Oh, dear. A previous attempt to write the
@@ -4180,7 +4186,6 @@ static int ext4_commit_super(struct super_block *sb, int sync)
es->s_free_inodes_count =
cpu_to_le32(percpu_counter_sum_positive(
&EXT4_SB(sb)->s_freeinodes_counter));
- sb->s_dirt = 0;
BUFFER_TRACE(sbh, "marking dirty");
mark_buffer_dirty(sbh);
if (sync) {
@@ -4199,9 +4204,64 @@ static int ext4_commit_super(struct super_block *sb, int sync)
return error;
}

+struct sb_delayed_work {
+ struct delayed_work dwork;
+ struct super_block *sb;
+};
+
+static struct sb_delayed_work *work_to_sbwork(struct work_struct *work)
+{
+ struct delayed_work *dwork;
+
+ dwork = container_of(work, struct delayed_work, work);
+ return container_of(dwork, struct sb_delayed_work, dwork);
+}
+
+static void write_super(struct work_struct *work)
+{
+ struct sb_delayed_work *sbwork = work_to_sbwork(work);
+ struct super_block *sb = sbwork->sb;
+
+ kfree(sbwork);
+
+ smp_rmb();
+ if (!sb->s_dirt)
+ return;
+
+ lock_super(sb);
+ ext4_commit_super(sb, 1);
+ unlock_super(sb);
+}
+
void __ext4_mark_super_dirty(struct super_block *sb)
{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct sb_delayed_work *sbwork;
+ unsigned long delay;
+
+ /* Make sure we see 's_dirt' changes ASAP */
+ smp_rmb();
+ if (sb->s_dirt == 1)
+ return;
sb->s_dirt = 1;
+ /* Make other CPUs see the 's_dirt' change as soon as possible */
+ smp_wmb();
+
+ sbwork = kmalloc(sizeof(struct sb_delayed_work), GFP_NOFS);
+ if (!sbwork) {
+ /*
+ * Well, should not be a big deal - the system must be in
+ * trouble anyway, and the SB will be written out on unmount or
+ * we may be luckier next time it is marked as dirty.
+ */
+ sb->s_dirt = 2;
+ return;
+ }
+
+ INIT_DELAYED_WORK(&sbwork->dwork, write_super);
+ sbwork->sb = sb;
+ delay = msecs_to_jiffies(dirty_writeback_interval * 10);
+ queue_delayed_work(sbi->dio_unwritten_wq, &sbwork->dwork, delay);
}

/*
@@ -4291,13 +4351,6 @@ int ext4_force_commit(struct super_block *sb)
return ret;
}

-static void ext4_write_super(struct super_block *sb)
-{
- lock_super(sb);
- ext4_commit_super(sb, 1);
- unlock_super(sb);
-}

2012-03-20 14:41:26

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 6/9] ext4: introduce __ext4_mark_super_dirty

From: Artem Bityutskiy <[email protected]>

Introduce another superblock dirtying helper: '__ext4_mark_super_dirty()' which
marks the superblock as dirty unconditionally, without checking whether we have
the superblock or not. Use this function from '__ext4_handle_dirty_super()'.

This patch is a preparation for further work. We need to have a single place
where we mark the superblock as dirty, and this place is now the
'__ext4_mark_super_dirty()' function. We intentionally make it to be non-inline
because one of the next patches will add more code there.

Note: the final goal is to get rid of the 'sync_supers()' kernel thread which
wakes up every 5 seconds and even if there is nothing to do. Thus, we are
pushing superblock management from VFS down to file-systems.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/ext4.h | 3 ++-
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/super.c | 5 +++++
3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 513004f..87e8376 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1931,6 +1931,7 @@ extern int ext4_group_extend(struct super_block *sb,
extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count);

/* super.c */
+extern void __ext4_mark_super_dirty(struct super_block *sb);
extern void *ext4_kvmalloc(size_t size, gfp_t flags);
extern void *ext4_kvzalloc(size_t size, gfp_t flags);
extern void ext4_kvfree(void *ptr);
@@ -2207,7 +2208,7 @@ static inline void ext4_unlock_group(struct super_block *sb,
static inline void ext4_mark_super_dirty(struct super_block *sb)
{
if (EXT4_SB(sb)->s_journal == NULL)
- sb->s_dirt =1;
+ __ext4_mark_super_dirty(sb);
}

/*
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index aca1790..57e5d5c 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -149,6 +149,6 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
ext4_journal_abort_handle(where, line, __func__,
bh, handle, err);
} else
- sb->s_dirt = 1;
+ __ext4_mark_super_dirty(sb);
return err;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d280bb6..d9543f3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4199,6 +4199,11 @@ static int ext4_commit_super(struct super_block *sb, int sync)
return error;
}

+void __ext4_mark_super_dirty(struct super_block *sb)
+{
+ sb->s_dirt = 1;
+}
+
/*
* Have we just finished recovery? If so, and if we are mounting (or
* remounting) the filesystem readonly, then we will end up with a
--
1.7.7.6

2012-03-20 14:41:24

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 4/9] mm: export dirty_writeback_interval

From: Artem Bityutskiy <[email protected]>

Export 'dirty_writeback_interval' to make it visible to file-systems. We are
going to push superblock management down to file-systems and get rid of the
'sync_supers' kernel thread completly.

Signed-off-by: Artem Bityutskiy <[email protected]>
Cc: Al Viro <[email protected]>
---
mm/page-writeback.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 363ba70..5e39858 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -95,6 +95,8 @@ unsigned long vm_dirty_bytes;
*/
unsigned int dirty_writeback_interval = 5 * 100; /* centiseconds */

+EXPORT_SYMBOL_GPL(dirty_writeback_interval);
+
/*
* The longest time for which data is allowed to remain dirty
*/
--
1.7.7.6

2012-03-20 14:41:23

by Artem Bityutskiy

[permalink] [raw]
Subject: [PATCH v1 3/9] ext4: remove useless s_dirt assignment

From: Artem Bityutskiy <[email protected]>

Clean-up ext4 a tiny bit by removing useless s_dirt assignment in
'ext4_fill_super()' because a bit later we anyway call 'ext4_setup_super()'
which writes the superblock to the media unconditionally.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/super.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c1f5111..d280bb6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3415,7 +3415,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
#else
es->s_flags |= cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
#endif
- sb->s_dirt = 1;
}

/* Handle clustersize */
--
1.7.7.6

2012-03-21 08:24:12

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 7/9] ext4: stop using VFS for dirty superblock management

On Tue, 2012-03-20 at 16:41 +0200, Artem Bityutskiy wrote:
> + INIT_DELAYED_WORK(&sbwork->dwork, write_super);
> + sbwork->sb = sb;
> + delay = msecs_to_jiffies(dirty_writeback_interval * 10);
> + queue_delayed_work(sbi->dio_unwritten_wq, &sbwork->dwork, delay);

I've just realized that the side-effect of using DIO workqueue is that
'syncfs()' will also synchronize the superblock because it flushes the
workqueue:

static int ext4_sync_fs(struct super_block *sb, int wait)
{
...
flush_workqueue(sbi->dio_unwritten_wq);
...
}

But before my change, it seems the superblock was not flushed on
'syncfs()', at least I do no see how this would be done. However, I
think it is OK because I think it is correct to write the dirty
superblock out on 'syncfs()', right?

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-22 09:53:42

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Tue 20-03-12 16:41:20, Artem Bityutskiy wrote:
> This patch-set makes ext4 independent of the VFS superblock management
> services. Namely, ext4 does not require to register the 'write_super()' VFS
> call-back.
>
> The reason of this exercises is to get rid of the 'sync_supers()' kernel thread
> which wakes up every 5 seconds (by default) even if all superblocks are clean.
> This is wasteful from power management POW (unnecessary wake-ups).
>
> Note, I tried to optimize 'sync_supers()' instead in 2010, but Al wanted me
> to get rid of it instead. See https://lkml.org/lkml/2010/6/6/87
> And I think this is right because many file-systems do not need this, for
> example btrfs does not use VFS superblock management services at all, so on a
> btrfs-based system we currently end-up useless periodic wake-ups source.
>
> Changes for other file-systems are coming later.
>
> The patch-set structure.
> 1. patches 1,2,3 are independent ext4 cleanups and I ask Ted to merge them as
> soon/long as they are OK. I sent them also independently in order to get
> early comments, but did not get so far, so re-sending.
> 2. patch 4 exports 'dirty_writeback_interval' and it would be very useful to
> have it merged ASAP to simplify further work
> 3. patch 5 is also and independent VFS clean-up
> 4. patches 6-9 actually make ext4 independent on the 'sync_supers()' thread.
Artem, if you look at places where ext4 sets s_dirt you will notice they
are rather rare events and all of them actually take care of writing
superblock themselves (at least if my memory serves well). So ext4
shouldn't need sync_supers() at all...

Honza

2012-03-22 09:58:21

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 1/9] ext4: do not mark superblock as dirty unnecessarily

On Tue 20-03-12 16:41:21, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <[email protected]>
>
> Commit a0375156ca1041574b5d47cc7e32f10b891151b0 cleaned up superblock dirtying
> handling, but missed one place. This patch does what was intended: if we have
> the journal, then we update the superblock through the journal rather than
> doing this directly.
>
> Signed-off-by: Artem Bityutskiy <[email protected]>
Looks OK.
Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/ext4/inode.c | 4 +---
> 1 files changed, 1 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index feaa82f..e040403 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3924,10 +3924,8 @@ static int ext4_do_update_inode(handle_t *handle,
> ext4_update_dynamic_rev(sb);
> EXT4_SET_RO_COMPAT_FEATURE(sb,
> EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
> - sb->s_dirt = 1;
> ext4_handle_sync(handle);
> - err = ext4_handle_dirty_metadata(handle, NULL,
> - EXT4_SB(sb)->s_sbh);
> + err = ext4_handle_dirty_super(handle, sb);
> }
> }
> raw_inode->i_generation = cpu_to_le32(inode->i_generation);
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-03-22 09:59:57

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 2/9] ext4: write superblock only once on unmount

On Tue 20-03-12 16:41:22, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <[email protected]>
>
> In some rather rare cases it is possible that ext4 may the superblock to the
> media twice. This patch makes sure this does not happen. This should speed up
> unmounting in those cases.
This cleanup looks OK.
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: Artem Bityutskiy <[email protected]>
> ---
> fs/ext4/super.c | 7 +++----
> 1 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 502c61f..c1f5111 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -808,9 +808,6 @@ static void ext4_put_super(struct super_block *sb)
> destroy_workqueue(sbi->dio_unwritten_wq);
>
> lock_super(sb);
> - if (sb->s_dirt)
> - ext4_commit_super(sb, 1);
> -
> if (sbi->s_journal) {
> err = jbd2_journal_destroy(sbi->s_journal);
> sbi->s_journal = NULL;
> @@ -827,8 +824,10 @@ static void ext4_put_super(struct super_block *sb)
> if (!(sb->s_flags & MS_RDONLY)) {
> EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
> es->s_state = cpu_to_le16(sbi->s_mount_state);
> - ext4_commit_super(sb, 1);
> }
> + if (sb->s_dirt || !(sb->s_flags & MS_RDONLY))
> + ext4_commit_super(sb, 1);
> +
> if (sbi->s_proc) {
> remove_proc_entry(sb->s_id, ext4_proc_root);
> }
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-03-22 10:02:08

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 3/9] ext4: remove useless s_dirt assignment

On Tue 20-03-12 16:41:23, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <[email protected]>
>
> Clean-up ext4 a tiny bit by removing useless s_dirt assignment in
> 'ext4_fill_super()' because a bit later we anyway call 'ext4_setup_super()'
> which writes the superblock to the media unconditionally.
>
> Signed-off-by: Artem Bityutskiy <[email protected]>
Looks good.
Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/ext4/super.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index c1f5111..d280bb6 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3415,7 +3415,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> #else
> es->s_flags |= cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
> #endif
> - sb->s_dirt = 1;
> }
>
> /* Handle clustersize */
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-03-22 10:02:59

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu, 2012-03-22 at 10:53 +0100, Jan Kara wrote:
> On Tue 20-03-12 16:41:20, Artem Bityutskiy wrote:
> > This patch-set makes ext4 independent of the VFS superblock management
> > services. Namely, ext4 does not require to register the 'write_super()' VFS
> > call-back.
> >
> > The reason of this exercises is to get rid of the 'sync_supers()' kernel thread
> > which wakes up every 5 seconds (by default) even if all superblocks are clean.
> > This is wasteful from power management POW (unnecessary wake-ups).
> >
> > Note, I tried to optimize 'sync_supers()' instead in 2010, but Al wanted me
> > to get rid of it instead. See https://lkml.org/lkml/2010/6/6/87
> > And I think this is right because many file-systems do not need this, for
> > example btrfs does not use VFS superblock management services at all, so on a
> > btrfs-based system we currently end-up useless periodic wake-ups source.
> >
> > Changes for other file-systems are coming later.
> >
> > The patch-set structure.
> > 1. patches 1,2,3 are independent ext4 cleanups and I ask Ted to merge them as
> > soon/long as they are OK. I sent them also independently in order to get
> > early comments, but did not get so far, so re-sending.
> > 2. patch 4 exports 'dirty_writeback_interval' and it would be very useful to
> > have it merged ASAP to simplify further work
> > 3. patch 5 is also and independent VFS clean-up
> > 4. patches 6-9 actually make ext4 independent on the 'sync_supers()' thread.
> Artem, if you look at places where ext4 sets s_dirt you will notice they
> are rather rare events and all of them actually take care of writing
> superblock themselves (at least if my memory serves well). So ext4
> shouldn't need sync_supers() at all...

Hmm, if there is journal, then ext4 does not initialize the
'->write_super()' VFS call-back and indeed takes care of the SB itself.
Indeed. So 'sync_supers()' wakes up every 5 seconds for nothing.

However, if there is _no_ journal, the 'write_super' is initialized, and
in many places the 's_dirt' flag is set, and thus VFS services seem to
be actively used.

I do not ext4 well enough, but the SB dirtying looks a bit messy, and I
am happy to do some clean-ups, just need a bit more of direction :-)

Thanks!

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-22 10:11:33

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 8/9] ext4: small cleanup in ext4_commit_super

On Tue 20-03-12 16:41:28, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <[email protected]>
>
> We have many 'EXT4_SB(sb)' references in 'ext4_commit_super()' and I am going
> to add another one, so I think the function will be a bit tidier if I introduce
> a separate 'sbi' pointer for the ext4 superblock.
>
> Signed-off-by: Artem Bityutskiy <[email protected]>
Good cleanup.
Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/ext4/super.c | 18 +++++++++---------
> 1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 7d453f7..64f3a6a 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4132,8 +4132,9 @@ static int ext4_load_journal(struct super_block *sb,
>
> static int ext4_commit_super(struct super_block *sb, int sync)
> {
> - struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> - struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + struct ext4_super_block *es = sbi->s_es;
> + struct buffer_head *sbh = sbi->s_sbh;
> int error = 0;
>
> if (!sbh || block_device_ejected(sb))
> @@ -4174,18 +4175,17 @@ static int ext4_commit_super(struct super_block *sb, int sync)
> es->s_wtime = cpu_to_le32(get_seconds());
> if (sb->s_bdev->bd_part)
> es->s_kbytes_written =
> - cpu_to_le64(EXT4_SB(sb)->s_kbytes_written +
> + cpu_to_le64(sbi->s_kbytes_written +
> ((part_stat_read(sb->s_bdev->bd_part, sectors[1]) -
> - EXT4_SB(sb)->s_sectors_written_start) >> 1));
> + sbi->s_sectors_written_start) >> 1));
> else
> - es->s_kbytes_written =
> - cpu_to_le64(EXT4_SB(sb)->s_kbytes_written);
> + es->s_kbytes_written = cpu_to_le64(sbi->s_kbytes_written);
> ext4_free_blocks_count_set(es,
> - EXT4_C2B(EXT4_SB(sb), percpu_counter_sum_positive(
> - &EXT4_SB(sb)->s_freeclusters_counter)));
> + EXT4_C2B(sbi, percpu_counter_sum_positive(
> + &sbi->s_freeclusters_counter)));
> es->s_free_inodes_count =
> cpu_to_le32(percpu_counter_sum_positive(
> - &EXT4_SB(sb)->s_freeinodes_counter));
> + &sbi->s_freeinodes_counter));
> BUFFER_TRACE(sbh, "marking dirty");
> mark_buffer_dirty(sbh);
> if (sync) {
> --
> 1.7.7.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-03-22 10:33:11

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu 22-03-12 12:05:47, Artem Bityutskiy wrote:
> On Thu, 2012-03-22 at 10:53 +0100, Jan Kara wrote:
> > On Tue 20-03-12 16:41:20, Artem Bityutskiy wrote:
> > > This patch-set makes ext4 independent of the VFS superblock management
> > > services. Namely, ext4 does not require to register the 'write_super()' VFS
> > > call-back.
> > >
> > > The reason of this exercises is to get rid of the 'sync_supers()' kernel thread
> > > which wakes up every 5 seconds (by default) even if all superblocks are clean.
> > > This is wasteful from power management POW (unnecessary wake-ups).
> > >
> > > Note, I tried to optimize 'sync_supers()' instead in 2010, but Al wanted me
> > > to get rid of it instead. See https://lkml.org/lkml/2010/6/6/87
> > > And I think this is right because many file-systems do not need this, for
> > > example btrfs does not use VFS superblock management services at all, so on a
> > > btrfs-based system we currently end-up useless periodic wake-ups source.
> > >
> > > Changes for other file-systems are coming later.
> > >
> > > The patch-set structure.
> > > 1. patches 1,2,3 are independent ext4 cleanups and I ask Ted to merge them as
> > > soon/long as they are OK. I sent them also independently in order to get
> > > early comments, but did not get so far, so re-sending.
> > > 2. patch 4 exports 'dirty_writeback_interval' and it would be very useful to
> > > have it merged ASAP to simplify further work
> > > 3. patch 5 is also and independent VFS clean-up
> > > 4. patches 6-9 actually make ext4 independent on the 'sync_supers()' thread.
> > Artem, if you look at places where ext4 sets s_dirt you will notice they
> > are rather rare events and all of them actually take care of writing
> > superblock themselves (at least if my memory serves well). So ext4
> > shouldn't need sync_supers() at all...
>
> Hmm, if there is journal, then ext4 does not initialize the
> '->write_super()' VFS call-back and indeed takes care of the SB itself.
> Indeed. So 'sync_supers()' wakes up every 5 seconds for nothing.
Yes.

> However, if there is _no_ journal, the 'write_super' is initialized, and
> in many places the 's_dirt' flag is set, and thus VFS services seem to
> be actively used.
Which many places are you speaking about? Grep shows 4 places with
sb->s_dirt = 1;
You remove two of those in your cleanups so only
__ext4_handle_dirty_super() remains. That is called from 3 (4 after your
cleanups) places and they happen so rarely (during filesystem resize or
when we start using some feature on the filesystem) that if you use
sync_buffer() from all of them, it should be fine.

Then we have ext4_mark_super_dirty() call from 4 places - I forgot about
these originally... I kind of miss their purpose. Originally they were used
so that we write total number of free blocks and inodes in the superblock
but when we do not maintain them in the journal mode I don't see a reason
to periodically sync them in no-journal mode. Ted, what is the purpose of
these calls?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-03-22 11:23:09

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu, 2012-03-22 at 11:33 +0100, Jan Kara wrote:
> > However, if there is _no_ journal, the 'write_super' is initialized, and
> > in many places the 's_dirt' flag is set, and thus VFS services seem to
> > be actively used.
> Which many places are you speaking about? Grep shows 4 places with
> sb->s_dirt = 1;

Well, with 'ext4_mark_super_dirty()' there are still 6 or something
places.

> You remove two of those in your cleanups so only
> __ext4_handle_dirty_super() remains. That is called from 3 (4 after your
> cleanups) places and they happen so rarely (during filesystem resize or
> when we start using some feature on the filesystem) that if you use
> sync_buffer() from all of them, it should be fine.

But AFAIKC, the whole '__ext4_handle_dirty_super()' also falls-back to
marking the superblock as dirty if the file-system has no journal for
some reasons, right?. But I do not really understand what
'ext4_handle_valid()' does. If I grep for 'ext4_handle_dirty_super()' -
there are many places places where it is used, and a few are obviously
for the superblocks.


--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-22 13:35:24

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

Hi Artem,

Just as a quick FYI, I tried applying your patch series on top of my
development tree, and ran into problems when I ran the regression test
(using xfstests). When I backed out your changes and reran, the tests
completed without any problems.

I'm rerunning the tests since the first failure looks like it might
not be related to your patch series (and yet it went away once I
backed out your patch). The second failure however looks definitely
related to your changes. It looks like you don't wait to make sure
the workqueue is flushed out before the file system gets unmounted,
and that can lead to a panic.

Since we're already in the 3.3 has already been released, I suspect
this patch series will probably need to wait until the next merge
window. We might be able to pull in some of the obviously safe
patches, however.

Regards,

- Ted


BEGIN TEST: Ext4 4k block Wed Mar 21 22:47:17 EDT 2012
Device: /dev/vdb
mke2fs options: -q
mount options: -o block_validity
000 - unknown test, ignored
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.3.0-rc2-00592-gc56a0b2
MKFS_OPTIONS -- -q /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
...

075 [ 808.872903]
[ 808.873567] ======================================================
[ 808.875933] [ INFO: possible circular locking dependency detected ]
[ 808.875933] 3.3.0-rc2-00592-gc56a0b2 #32 Not tainted
[ 808.875933] -------------------------------------------------------
[ 808.875933] fsx/13769 is trying to acquire lock:
[ 808.875933] (&sb->s_type->i_mutex_key#3){+.+.+.}, at: [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
[ 808.875933]
[ 808.875933] but task is already holding lock:
[ 808.875933] (jbd2_handle){+.+...}, at: [<c02a5995>] start_this_handle+0x4e4/0x51a
[ 808.875933]
[ 808.875933] which lock already depends on the new lock.
[ 808.875933]
[ 808.875933]
[ 808.875933] the existing dependency chain (in reverse order) is:
[ 808.875933]
[ 808.875933] -> #1 (jbd2_handle){+.+...}:
[ 808.875933] [<c019789d>] lock_acquire+0x99/0xbd
[ 808.875933] [<c02a59b7>] start_this_handle+0x506/0x51a
[ 808.875933] [<c02a5ba6>] jbd2__journal_start+0xae/0xda
[ 808.875933] [<c02a5be4>] jbd2_journal_start+0x12/0x14
[ 808.875933] [<c0284fb8>] ext4_journal_start_sb+0x11e/0x126
[ 808.875933] [<c0277661>] ext4_unlink+0x82/0x1e5
[ 808.875933] [<c02127e1>] vfs_unlink+0x61/0xaf
[ 808.875933] [<c02147b5>] do_unlinkat+0xa0/0x112
[ 808.875933] [<c0214946>] sys_unlinkat+0x30/0x37
[ 808.875933] [<c06d8c5d>] syscall_call+0x7/0xb
[ 808.875933]
[ 808.875933] -> #0 (&sb->s_type->i_mutex_key#3){+.+.+.}:
[ 808.875933] [<c0197598>] __lock_acquire+0x989/0xbf5
[ 808.875933] [<c019789d>] lock_acquire+0x99/0xbd
[ 808.875933] [<c06d65f4>] __mutex_lock_common+0x30/0x316
[ 808.875933] [<c06d6988>] mutex_lock_nested+0x26/0x2f
[ 808.875933] [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
[ 808.875933] [<c026e316>] ext4_punch_hole+0x5f/0x70
[ 808.875933] [<c028fbce>] ext4_fallocate+0x63/0x469
[ 808.875933] [<c0208974>] do_fallocate+0xe7/0x105
[ 808.875933] [<c02089c3>] sys_fallocate+0x31/0x46
[ 808.875933] [<c06d8c5d>] syscall_call+0x7/0xb
[ 808.875933]
[ 808.875933] other info that might help us debug this:
[ 808.875933]
[ 808.875933] Possible unsafe locking scenario:
[ 808.875933]
[ 808.875933] CPU0 CPU1
[ 808.875933] ---- ----
[ 808.875933] lock(jbd2_handle);
[ 808.875933] lock(&sb->s_type->i_mutex_key#3);
[ 808.875933] lock(jbd2_handle);
[ 808.875933] lock(&sb->s_type->i_mutex_key#3);
[ 808.875933]
[ 808.875933] *** DEADLOCK ***
[ 808.875933]
[ 808.875933] 1 lock held by fsx/13769:
[ 808.875933] #0: (jbd2_handle){+.+...}, at: [<c02a5995>] start_this_handle+0x4e4/0x51a
[ 808.875933]
[ 808.875933] stack backtrace:
[ 808.875933] Pid: 13769, comm: fsx Not tainted 3.3.0-rc2-00592-gc56a0b2 #32
[ 808.875933] Call Trace:
[ 808.875933] [<c01954fb>] print_circular_bug+0x194/0x1a1
[ 808.875933] [<c0197598>] __lock_acquire+0x989/0xbf5
[ 808.875933] [<c019789d>] lock_acquire+0x99/0xbd
[ 808.875933] [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
[ 808.875933] [<c06d65f4>] __mutex_lock_common+0x30/0x316
[ 808.875933] [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
[ 808.875933] [<c017d53a>] ? local_clock+0x3d/0x55
[ 808.875933] [<c01942de>] ? lock_release_holdtime+0x2b/0xcd
[ 808.875933] [<c028d8d9>] ? ext4_ext_punch_hole+0x291/0x382
[ 808.875933] [<c06d6988>] mutex_lock_nested+0x26/0x2f
[ 808.875933] [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
[ 808.875933] [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
[ 808.875933] [<c026e316>] ext4_punch_hole+0x5f/0x70
[ 808.875933] [<c028fbce>] ext4_fallocate+0x63/0x469
[ 808.875933] [<c017d4ed>] ? sched_clock_cpu+0x134/0x144
[ 808.875933] [<c023473e>] ? fsnotify+0x1e8/0x202
[ 808.875933] [<c01940d5>] ? trace_hardirqs_off+0xb/0xd
[ 808.875933] [<c017d53a>] ? local_clock+0x3d/0x55
[ 808.875933] [<c020a873>] ? fget+0x57/0x71
[ 808.875933] [<c0208974>] do_fallocate+0xe7/0x105
[ 808.875933] [<c02089c3>] sys_fallocate+0x31/0x46
[ 808.875933] [<c06d8c5d>] syscall_call+0x7/0xb
[ 808.875933] [<c06d0000>] ? init_intel+0x1aa/0x370

I also saw the following in nojournal mode:

BEGIN TEST: Ext4 4k block w/ no journal Wed Mar 21 23:26:48 EDT 2012
Device: /dev/vdb
mke2fs options: -q -O ^has_journal
mount options: -o block_validity,noload
000 - unknown test, ignored
[ 2409.975802] EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts: (null)
FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.3.0-rc2-00592-gc56a0b2
MKFS_OPTIONS -- -q -O ^has_journal /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc

[ 2410.270742] EXT4-fs (vdc): mounted filesystem without journal. Opts: acl,user_xattr,block_validity,noload
[ 2443.856170] EXT4-fs (vdb): mounted filesystem without journal. Opts: acl,user_xattr,block_validity,noload
001 6s ... 6s
[ 2454.241371] BUG: unable to handle kernel paging request at 0063e408
[ 2454.244611] IP: [<c016bdd4>] __queue_work+0x1f8/0x2a8
[ 2454.244611] *pdpt = 0000000001247001 *pde = 0000000000000000
[ 2454.244611] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2454.244611] Modules linked in:
[ 2454.244611]
[ 2454.244611] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc2-00592-gc56a0b2 #32 Bochs Bochs
[ 2454.244611] EIP: 0060:[<c016bdd4>] EFLAGS: 00010046 CPU: 0
[ 2454.244611] EIP is at __queue_work+0x1f8/0x2a8
[ 2454.244611] EAX: ef022404 EBX: 0063e400 ECX: ef022400 EDX: c016bd5a
[ 2454.244611] ESI: f6c02a40 EDI: ca524d00 EBP: f680df68 ESP: f680df44
[ 2454.244611] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 2454.244611] Process swapper/0 (pid: 0, ti=f680c000 task=c09cd020 task.ti=c09c6000)
[ 2454.244611] Stack:
[ 2454.244611] 00000000 c01630bb 00000000 00000000 ef022400 00000246 ccc37e00 ef022474
[ 2454.244611] f680dfbc f680df74 c016beaa ef022428 f680dfd0 c016312f f680dfbc f680dfa4
[ 2454.244611] ef022400 c0c84774 c0c84574 c0c84374 c0c84174 c016be84 00000100 c0c83940
[ 2454.244611] Call Trace:
[ 2454.244611] [<c01630bb>] ? run_timer_softirq+0x11b/0x24a
[ 2454.244611] [<c016beaa>] delayed_work_timer_fn+0x26/0x29
[ 2454.244611] [<c016312f>] run_timer_softirq+0x18f/0x24a
[ 2454.244611] [<c016be84>] ? __queue_work+0x2a8/0x2a8
[ 2454.244611] [<c015cbb3>] __do_softirq+0xb1/0x17a
[ 2454.244611] [<c015cb02>] ? irq_enter+0x5e/0x5e
[ 2454.244611] <IRQ>
[ 2454.244611] [<c015c969>] ? irq_exit+0x43/0xa3
[ 2454.244611] [<c0146f4f>] ? smp_apic_timer_interrupt+0x71/0x7f
[ 2454.244611] [<c06d9107>] ? apic_timer_interrupt+0x2f/0x34
[ 2454.244611] [<c013783e>] ? default_idle+0x5c/0xa5
[ 2454.244611] [<c01b007b>] ? audit_log_exit+0x5d4/0xbff
[ 2454.244611] [<c014d1f3>] ? native_safe_halt+0x5/0x7
[ 2454.244611] [<c0137843>] ? default_idle+0x61/0xa5
[ 2454.244611] [<c0130eba>] ? cpu_idle+0x61/0x83
[ 2454.244611] [<c06bcdaa>] ? rest_init+0x92/0x97
[ 2454.244611] [<c0a467c5>] ? start_kernel+0x303/0x308
[ 2454.244611] [<c0a460a0>] ? i386_start_kernel+0xa0/0xa7
[ 2454.244611] Code: ff 74 16 8b 47 04 89 d9 ff 75 ec 8b 55 e8 ff 17 83 c7 08 83 3f 00 58 eb e8 8b 45 ec 8b 4d ec 83 c0 04 39
41 04 74 04 0f 0b eb fe <8b> 43 08 ff 44 83 10 8b 4b 08 8b 43 4c c1 e1 04 3b 43 50 7d 75
[ 2454.244611] EIP: [<c016bdd4>] __queue_work+0x1f8/0x2a8 SS:ESP 0068:f680df44
[ 2454.244611] CR2: 000000000063e408
[ 2454.244611] ---[ end trace 59443e013535cdda ]---
[ 2454.244611] Kernel panic - not syncing: Fatal exception in interrupt
[ 2455.000233] BUG: spinlock lockup on CPU#0, swapper/0/0
[ 2455.000233] lock: f6c02a40, .magic: dead4ead, .owner: swapper/0/0, .owner_cpu: 0
[ 2455.000233] Pid: 0, comm: swapper/0 Tainted: G D 3.3.0-rc2-00592-gc56a0b2 #32
[ 2455.000233] Call Trace:
[ 2455.000233] [<c038eb17>] spin_dump+0x70/0x7a
[ 2455.000233] [<c038ec89>] do_raw_spin_lock+0xd5/0xf3
[ 2455.000233] [<c016bd5a>] ? __queue_work+0x17e/0x2a8
[ 2455.000233] [<c06d8520>] _raw_spin_lock_irqsave+0x56/0x6a
[ 2455.000233] [<c016bd5a>] __queue_work+0x17e/0x2a8
[ 2455.000233] [<c01940d5>] ? trace_hardirqs_off+0xb/0xd
[ 2455.000233] [<c016bec9>] queue_work_on+0x1c/0x24
[ 2455.000233] [<c016bfe2>] queue_work+0x1a/0x1d
[ 2455.000233] [<c036d2e5>] kblockd_schedule_work+0x12/0x14
[ 2455.000233] [<c037e175>] cfq_schedule_dispatch+0x35/0x3a
[ 2455.000233] [<c037f300>] cfq_completed_request+0x44f/0x46c
[ 2455.000233] [<c036d556>] ? blk_add_request_payload+0x52/0x52
[ 2455.000233] [<c0369ab2>] elv_completed_request+0x46/0x49
[ 2455.000233] [<c036dc26>] __blk_put_request+0x25/0x91
[ 2455.000233] [<c036de51>] blk_finish_request+0x1bf/0x1c7
[ 2455.000233] [<c036e9b8>] __blk_end_bidi_request+0x28/0x31
[ 2455.000233] [<c036e9e2>] __blk_end_request_all+0x21/0x2f
[ 2455.000233] [<c0476c58>] blk_done+0x80/0xd4
[ 2455.000233] [<c03f0aed>] vring_interrupt+0x6a/0x76
[ 2455.000233] [<c01b360b>] handle_irq_event_percpu+0x4e/0x167
[ 2455.000233] [<c01b3755>] handle_irq_event+0x31/0x48
[ 2455.000233] [<c01b5590>] ? handle_percpu_irq+0x40/0x40
[ 2455.000233] [<c01b561f>] handle_edge_irq+0x8f/0xb1
[ 2455.000233] <IRQ> [<c0132530>] ? do_IRQ+0x3c/0x95
[ 2455.000233] [<c06de82e>] ? common_interrupt+0x2e/0x34
[ 2455.000233] [<c06d8b2a>] ? _raw_spin_unlock_irq+0x27/0x30
[ 2455.000233] [<c06d8b2c>] ? _raw_spin_unlock_irq+0x29/0x30
[ 2455.000233] [<c06d76bb>] ? __schedule+0x53f/0x56f
[ 2455.000233] [<c0157eaa>] ? console_unlock+0x188/0x1de
[ 2455.000233] [<c01a1dec>] ? crash_kexec+0x1a/0xac
[ 2455.000233] [<c0195541>] ? print_lock_contention_bug+0x11/0xc1
[ 2455.000233] [<c06d777d>] ? _cond_resched+0x30/0x49
[ 2455.000233] [<c02053cb>] ? slab_pre_alloc_hook+0x1d/0x22
[ 2455.000233] [<c0206a70>] ? kmem_cache_alloc_trace+0x25/0xb6
[ 2455.000233] [<c0133a7f>] ? register_nmi_handler+0x3f/0xf4
[ 2455.000233] [<c01497e7>] ? default_get_apic_id+0x17/0x37
[ 2455.000233] [<c0145c2b>] ? native_nmi_stop_other_cpus+0xdc/0xdc
[ 2455.000233] [<c0133a7f>] ? register_nmi_handler+0x3f/0xf4
[ 2455.000233] [<c0145ba7>] ? native_nmi_stop_other_cpus+0x58/0xdc
[ 2455.000233] [<c06d60ce>] ? panic+0x8d/0x175
[ 2455.000233] [<c06d99db>] ? oops_end+0x97/0xa6
[ 2455.000233] [<c014e2e7>] ? no_context+0x18b/0x195
[ 2455.000233] [<c014e3ef>] ? __bad_area_nosemaphore+0xfe/0x108
[ 2455.000233] [<c06db4fe>] ? spurious_fault+0x104/0x104
[ 2455.000233] [<c014e40b>] ? bad_area_nosemaphore+0x12/0x15
[ 2455.000233] [<c06db6b8>] ? do_page_fault+0x1ba/0x36a
[ 2455.000233] [<c06db4fe>] ? spurious_fault+0x104/0x104
[ 2455.000233] [<c06d9317>] ? error_code+0x5f/0x64
[ 2455.000233] [<c016bd5a>] ? __queue_work+0x17e/0x2a8
[ 2455.000233] [<c06d00d8>] ? init_intel+0x282/0x370
[ 2455.000233] [<c06db4fe>] ? spurious_fault+0x104/0x104
[ 2455.000233] [<c016bdd4>] ? __queue_work+0x1f8/0x2a8
[ 2455.000233] [<c01630bb>] ? run_timer_softirq+0x11b/0x24a
[ 2455.000233] [<c016beaa>] ? delayed_work_timer_fn+0x26/0x29
[ 2455.000233] [<c016312f>] ? run_timer_softirq+0x18f/0x24a
[ 2455.000233] [<c016be84>] ? __queue_work+0x2a8/0x2a8
[ 2455.000233] [<c015cbb3>] ? __do_softirq+0xb1/0x17a
[ 2455.000233] [<c015cb02>] ? irq_enter+0x5e/0x5e
[ 2455.000233] <IRQ> [<c015c969>] ? irq_exit+0x43/0xa3
[ 2455.000233] [<c0146f4f>] ? smp_apic_timer_interrupt+0x71/0x7f
[ 2455.000233] [<c06d9107>] ? apic_timer_interrupt+0x2f/0x34
[ 2455.000233] [<c013783e>] ? default_idle+0x5c/0xa5
[ 2455.000233] [<c01b007b>] ? audit_log_exit+0x5d4/0xbff
[ 2455.000233] [<c014d1f3>] ? native_safe_halt+0x5/0x7
[ 2455.000233] [<c0137843>] ? default_idle+0x61/0xa5
[ 2455.000233] [<c0130eba>] ? cpu_idle+0x61/0x83
[ 2455.000233] [<c06bcdaa>] ? rest_init+0x92/0x97
[ 2455.000233] [<c0a467c5>] ? start_kernel+0x303/0x308
[ 2455.000233] [<c0a460a0>] ? i386_start_kernel+0xa0/0xa7

On Tue, Mar 20, 2012 at 04:41:20PM +0200, Artem Bityutskiy wrote:
> This patch-set makes ext4 independent of the VFS superblock management
> services. Namely, ext4 does not require to register the 'write_super()' VFS
> call-back.
>
> The reason of this exercises is to get rid of the 'sync_supers()' kernel thread
> which wakes up every 5 seconds (by default) even if all superblocks are clean.
> This is wasteful from power management POW (unnecessary wake-ups).
>
> Note, I tried to optimize 'sync_supers()' instead in 2010, but Al wanted me
> to get rid of it instead. See https://lkml.org/lkml/2010/6/6/87
> And I think this is right because many file-systems do not need this, for
> example btrfs does not use VFS superblock management services at all, so on a
> btrfs-based system we currently end-up useless periodic wake-ups source.
>
> Changes for other file-systems are coming later.
>
> The patch-set structure.
> 1. patches 1,2,3 are independent ext4 cleanups and I ask Ted to merge them as
> soon/long as they are OK. I sent them also independently in order to get
> early comments, but did not get so far, so re-sending.
> 2. patch 4 exports 'dirty_writeback_interval' and it would be very useful to
> have it merged ASAP to simplify further work
> 3. patch 5 is also and independent VFS clean-up
> 4. patches 6-9 actually make ext4 independent on the 'sync_supers()' thread.
>
> Thanks,
> Artem.

2012-03-22 13:42:56

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu 22-03-12 13:25:37, Artem Bityutskiy wrote:
> On Thu, 2012-03-22 at 11:33 +0100, Jan Kara wrote:
> > > However, if there is _no_ journal, the 'write_super' is initialized, and
> > > in many places the 's_dirt' flag is set, and thus VFS services seem to
> > > be actively used.
> > Which many places are you speaking about? Grep shows 4 places with
> > sb->s_dirt = 1;
>
> Well, with 'ext4_mark_super_dirty()' there are still 6 or something
> places.
>
> > You remove two of those in your cleanups so only
> > __ext4_handle_dirty_super() remains. That is called from 3 (4 after your
> > cleanups) places and they happen so rarely (during filesystem resize or
> > when we start using some feature on the filesystem) that if you use
> > sync_buffer() from all of them, it should be fine.
>
> But AFAIKC, the whole '__ext4_handle_dirty_super()' also falls-back to
> marking the superblock as dirty if the file-system has no journal for
> some reasons, right?
Yes. And I wrote that if you do sync_buffer(EXT4_SB(sb)->s_sbh) instead
of marking superblock dirty, it would be fine.

> But I do not really understand what
> 'ext4_handle_valid()' does. If I grep for 'ext4_handle_dirty_super()' -
> there are many places places where it is used, and a few are obviously
> for the superblocks.
ext4_handle_valid() is false if and only if ext4 is in no-journal mode.
If I grep for ext4_handle_dirty_super() I see:
jack@quack:~/source/linux-fs/fs/ext4> grep "ext4_handle_dirty_super" *.c
ext4_jbd2.c:int __ext4_handle_dirty_super(const char *where, unsigned int line,
resize.c: err = ext4_handle_dirty_super(handle, sb);
resize.c: ext4_handle_dirty_super(handle, sb);
xattr.c: ext4_handle_dirty_super(handle, sb);

So not really many cases...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-03-22 13:53:34

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu, 2012-03-22 at 09:35 -0400, Ted Ts'o wrote:
> Just as a quick FYI, I tried applying your patch series on top of my
> development tree, and ran into problems when I ran the regression test
> (using xfstests). When I backed out your changes and reran, the tests
> completed without any problems.

Thanks Ted, I'll take a look at xfstests and run them next time before
sending out v2.

> I'm rerunning the tests since the first failure looks like it might
> not be related to your patch series (and yet it went away once I
> backed out your patch). The second failure however looks definitely
> related to your changes. It looks like you don't wait to make sure
> the workqueue is flushed out before the file system gets unmounted,
> and that can lead to a panic.

Hmm, I thought the whole DIO workqueue would be flushed so I do not have
to do anything. I'll take a look.

> Since we're already in the 3.3 has already been released, I suspect
> this patch series will probably need to wait until the next merge
> window. We might be able to pull in some of the obviously safe
> patches, however.

Sure, that's fine.

But I wonder, since this is cross-FS story, where I need to first do
small VFS change (export the variable), then change all file-systems,
and then remove whole 's_dirt'/'write_supers()' stuff from VFS, how this
would be handled?

IMO, the best way would be to make everything go in via one single tree,
granted I could get all the acks, do you feel like ext4 tree could be
the one?

Also, I am working on top of vanilla 3.3, do you prefer me to work with
your tree instead? I guess this tree:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git

but which branch?

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-22 13:56:41

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu, 2012-03-22 at 14:42 +0100, Jan Kara wrote:
> > But AFAIKC, the whole '__ext4_handle_dirty_super()' also falls-back to
> > marking the superblock as dirty if the file-system has no journal for
> > some reasons, right?
> Yes. And I wrote that if you do sync_buffer(EXT4_SB(sb)->s_sbh) instead
> of marking superblock dirty, it would be fine.

OK, sorry for not reading carefully, I will take a look at this and all
the places where we use 's_dirt' and try to think how to just eliminate
them. But since I have 0 experience with ext4, I'll probably need some
help, but let's see.

Thanks for reviewing and the feed-back!

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-22 15:06:02

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu, Mar 22, 2012 at 03:56:25PM +0200, Artem Bityutskiy wrote:
>
> But I wonder, since this is cross-FS story, where I need to first do
> small VFS change (export the variable), then change all file-systems,
> and then remove whole 's_dirt'/'write_supers()' stuff from VFS, how this
> would be handled?

What I'm thinking about doing is pulling the first five patches into
my tree, which includes the VFS change to export the
dirty_writeback_interval variable. Those patches should be very low
risk, and acceptable at this point (although I'm still going to test
them really hard).

Assuming there are no objections with this, then you'd be able to send
the rest of the patches for each file system to get rid of the super
writeback thread separately; those really should go via each file
system maintainer's separate trees, though. Ext2 and ext3 changes I'm
willing to carry in the ext4 tree (although Jan does have his own ext3
tree). However, it would really be inappropriate for me to carry
patches against jfs, xfs, or other non-ext 2/3/4 file systems in my
tree. It will go much faster if you negotiate with each of the file
system maintainers/development teams directly, instead of trying to
use my tree and me as a middleman.

The one thing that will require some coordination is the patch to
completely remove the super writeback thread altogether, which
obviously can't be applied until we know all of the file systems have
removed their dependency on s_dirt. That's a pretty trivial patch
that could go in late in the next merge window, or even in -rc2 (with
prior warning given to Linus), once you're sure all of the file
systems have gotten their s_dirt removal patches merged in.

Does that seem reasonable to you?

Best regards,

- Ted

2012-03-23 08:52:23

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu, 2012-03-22 at 11:06 -0400, Ted Ts'o wrote:
> Does that seem reasonable to you?

Yes, thank you.

One note, just FYI, xfs and jfs do not use s_dirt. Here are the FSes
which make use of it:

fs/affs
fs/exofs
fs/ext4
fs/fat
fs/hfs
fs/hfsplus
fs/jffs2
fs/reiserfs
fs/sysv
fs/udf
fs/udf
fs/ufs

(mostly old citizens).

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-23 14:23:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Fri, Mar 23, 2012 at 10:55:15AM +0200, Artem Bityutskiy wrote:
> On Thu, 2012-03-22 at 11:06 -0400, Ted Ts'o wrote:
> > Does that seem reasonable to you?
>
> Yes, thank you.

Ok, i've included the first five patches in your patch series in the
ext4 tree.

We can deal with fixing up the rest of the patch series after the
merge window.

- Ted

2012-03-27 13:26:58

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Thu, 2012-03-22 at 11:33 +0100, Jan Kara wrote:
> Then we have ext4_mark_super_dirty() call from 4 places - I forgot about
> these originally... I kind of miss their purpose. Originally they were used
> so that we write total number of free blocks and inodes in the superblock
> but when we do not maintain them in the journal mode I don't see a reason
> to periodically sync them in no-journal mode. Ted, what is the purpose of
> these calls?

I do not understand what's the fundamental difference between journal
and non-journal mode. Why when we do have the journal we do not mark the
super-block as dirty in many places (e.g., in 'ext4_file_open()' - if we
do have the journal, when do we make sure we save the mount point path
change?).

May be it has something to do with behaving like the ext2 driver when
mounting ext2-formatted media with the the ext4 driver?

Jan, since Ted did not answer, may be you can figure out the reasons
from this commit message, which actually introduced the
'ext4_mark_super_dirty()' function?


commit a0375156ca1041574b5d47cc7e32f10b891151b0
Author: Theodore Ts'o <[email protected]>
Date: Fri Jun 11 23:14:04 2010 -0400

ext4: Clean up s_dirt handling

We don't need to set s_dirt in most of the ext4 code when journaling
is enabled. In ext3/4 some of the summary statistics for # of free
inodes, blocks, and directories are calculated from the per-block
group statistics when the file system is mounted or unmounted. As a
result the superblock doesn't have to be updated, either via the
journal or by setting s_dirt. There are a few exceptions, most
notably when resizing the file system, where the superblock needs to
be modified --- and in that case it should be done as a journalled
operation if possible, and s_dirt set only in no-journal mode.

This patch will optimize out some unneeded disk writes when using ext4
with a journal.

Signed-off-by: "Theodore Ts'o" <[email protected]>

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-27 20:14:01

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Tue 27-03-12 16:29:58, Artem Bityutskiy wrote:
> On Thu, 2012-03-22 at 11:33 +0100, Jan Kara wrote:
> > Then we have ext4_mark_super_dirty() call from 4 places - I forgot about
> > these originally... I kind of miss their purpose. Originally they were used
> > so that we write total number of free blocks and inodes in the superblock
> > but when we do not maintain them in the journal mode I don't see a reason
> > to periodically sync them in no-journal mode. Ted, what is the purpose of
> > these calls?
>
> I do not understand what's the fundamental difference between journal
> and non-journal mode. Why when we do have the journal we do not mark the
> super-block as dirty in many places (e.g., in 'ext4_file_open()' - if we
> do have the journal, when do we make sure we save the mount point path
> change?).
We write it at least during ext4_put_super().

> May be it has something to do with behaving like the ext2 driver when
> mounting ext2-formatted media with the the ext4 driver?
I'm not really sure about this...

> Jan, since Ted did not answer, may be you can figure out the reasons
> from this commit message, which actually introduced the
> 'ext4_mark_super_dirty()' function?
Anyway, attached are two patches which you can include in your series
and which should make your cleanups simpler.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (1.35 kB)
0001-ext4-Remove-useless-marking-of-superblock-dirty.patch (1.74 kB)
0002-ext4-Convert-last-user-of-ext4_mark_super_dirty-to-e.patch (1.80 kB)
Download all attachments

2012-03-28 08:42:18

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

Thanks for the patches.

On Tue, 2012-03-27 at 22:14 +0200, Jan Kara wrote:
> @@ -181,9 +181,22 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> path.dentry = mnt->mnt_root;
> cp = d_path(&path, buf, sizeof(buf));
> if (!IS_ERR(cp)) {
> + handle_t *handle;
> + int err;
> +
> + handle = ext4_journal_start_sb(sb, 1);
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> + err = ext4_journal_get_write_access(handle,
> + EXT4_SB(sb)->s_sbh);

Why do we need to bother with journal in this case - AFAIU, we update a
single field, not critical, and we do not really need the journal for
this - we can just call 'mark_buffer_dirty(sbi->s_sbh)' and let the SB
be written out directly and asynchronously.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-28 10:16:05

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Wed 28-03-12 11:44:42, Artem Bityutskiy wrote:
> On Tue, 2012-03-27 at 22:14 +0200, Jan Kara wrote:
> > @@ -181,9 +181,22 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> > path.dentry = mnt->mnt_root;
> > cp = d_path(&path, buf, sizeof(buf));
> > if (!IS_ERR(cp)) {
> > + handle_t *handle;
> > + int err;
> > +
> > + handle = ext4_journal_start_sb(sb, 1);
> > + if (IS_ERR(handle))
> > + return PTR_ERR(handle);
> > + err = ext4_journal_get_write_access(handle,
> > + EXT4_SB(sb)->s_sbh);
>
> Why do we need to bother with journal in this case - AFAIU, we update a
> single field, not critical, and we do not really need the journal for
> this - we can just call 'mark_buffer_dirty(sbi->s_sbh)' and let the SB
> be written out directly and asynchronously.
Well, except that if someone is trying to modify the superblock via the
journal in parallel, the journalling layer will loudly complain about
buffer becoming dirty under it's hands... So there are only two ways you
can reasonably do this in ext3/ext4 - just modify the data (which is a bit
ugly but works), or do full journalling.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-03-30 15:20:49

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Tue, 2012-03-27 at 22:14 +0200, Jan Kara wrote:
> On Tue 27-03-12 16:29:58, Artem Bityutskiy wrote:
> > On Thu, 2012-03-22 at 11:33 +0100, Jan Kara wrote:
> > > Then we have ext4_mark_super_dirty() call from 4 places - I forgot about
> > > these originally... I kind of miss their purpose. Originally they were used
> > > so that we write total number of free blocks and inodes in the superblock
> > > but when we do not maintain them in the journal mode I don't see a reason
> > > to periodically sync them in no-journal mode. Ted, what is the purpose of
> > > these calls?
> >
> > I do not understand what's the fundamental difference between journal
> > and non-journal mode. Why when we do have the journal we do not mark the
> > super-block as dirty in many places (e.g., in 'ext4_file_open()' - if we
> > do have the journal, when do we make sure we save the mount point path
> > change?).
> We write it at least during ext4_put_super().
>
> > May be it has something to do with behaving like the ext2 driver when
> > mounting ext2-formatted media with the the ext4 driver?
> I'm not really sure about this...
>
> > Jan, since Ted did not answer, may be you can figure out the reasons
> > from this commit message, which actually introduced the
> > 'ext4_mark_super_dirty()' function?
> Anyway, attached are two patches which you can include in your series
> and which should make your cleanups simpler.

I amended the second patch:
- ext4_journal_stop(sb);
+ ext4_journal_stop(sbi->s_sbh);

Extensive testing with xfstests found a problem with these patches:

[23500.238579] ------------[ cut here ]------------
[23500.238720] kernel BUG at fs/buffer.c:2871!
[23500.238842] invalid opcode: 0000 [#1] SMP
[23500.239279] CPU 11
[23500.239338] Modules linked in: [last unloaded: scsi_wait_scan]
[23500.239442]
[23500.239442] Pid: 15799, comm: fsstress Not tainted 3.3.0+ #43 Bochs Bochs
[23500.239442] RIP: 0010:[<ffffffff811a9add>] [<ffffffff811a9add>] submit_bh+0x10d/0x120
[23500.239442] RSP: 0018:ffff880273a41b58 EFLAGS: 00010202
[23500.239442] RAX: 000000000004d025 RBX: ffff8802469e5a90 RCX: 0000000000000000
[23500.239442] RDX: ffff880273a41fd8 RSI: ffff8802469e5a90 RDI: 0000000000000211
[23500.239442] RBP: ffff880273a41b78 R08: ffff880409645d80 R09: 0000000000000001
[23500.239442] R10: 0000000000000001 R11: ffff880178439000 R12: 0000000000000211
[23500.239442] R13: ffff880273a41c34 R14: ffff8804059b7000 R15: ffff880273a41fd8
[23500.239442] FS: 00007fc8731e7700(0000) GS:ffff88041fd60000(0000) knlGS:0000000000000000
[23500.239442] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[23500.239442] CR2: 00007fc873082008 CR3: 000000012456a000 CR4: 00000000000006e0
[23500.239442] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[23500.239442] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[23500.239442] Process fsstress (pid: 15799, threadinfo ffff880273a40000, task ffff880150dd8000)
[23500.239442] Stack:
[23500.239442] ffff8804059b7000 ffff8802469e5a90 0000000000000211 ffff880273a41c34
[23500.239442] ffff880273a41b98 ffffffff811ab59d 0000000000000002 ffff8804059b7128
[23500.239442] ffff880273a41bf8 ffffffff8123be1d 0000000091827364 ffff88010e9a7e30
[23500.239442] Call Trace:
[23500.239442] [<ffffffff811ab59d>] write_dirty_buffer+0x4d/0x80
[23500.239442] [<ffffffff8123be1d>] __flush_batch+0x4d/0xa0
[23500.239442] [<ffffffff8123c605>] jbd2_log_do_checkpoint+0xf5/0x4f0
[23500.239442] [<ffffffff8123ca89>] __jbd2_log_wait_for_space+0x89/0x190
[23500.239442] [<ffffffff81237a98>] start_this_handle+0x3a8/0x4e0
[23500.239442] [<ffffffff810799e0>] ? remove_wait_queue+0x50/0x50
[23500.239442] [<ffffffff81237c93>] jbd2__journal_start+0xc3/0x100
[23500.239442] [<ffffffff81237ce3>] jbd2_journal_start+0x13/0x20
[23500.239442] [<ffffffff8121743f>] ext4_journal_start_sb+0x7f/0x1d0
[23500.239442] [<ffffffff81224a24>] ? ext4_fallocate+0x1a4/0x530
[23500.239442] [<ffffffff811f64c5>] ? ext4_meta_trans_blocks+0xa5/0xb0
[23500.239442] [<ffffffff81224a24>] ext4_fallocate+0x1a4/0x530
[23500.239442] [<ffffffff8117a092>] do_fallocate+0xf2/0x160
[23500.239442] [<ffffffff8117a14b>] sys_fallocate+0x4b/0x70
[23500.239442] [<ffffffff815e6d69>] system_call_fastpath+0x16/0x1b
[23500.239442] Code: ee 44 89 e7 e8 35 1f 0f 00 49 8b 5d 18 4c 89 ef e8 19 4e 00 00 48 83 c4 08 c1 e3 18 c1 fb 1f 83 e3 a1 89 d8 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48
[23500.239442] RIP [<ffffffff811a9add>] submit_bh+0x10d/0x120
[23500.239442] RSP <ffff880273a41b58>
[23500.261657] ---[ end trace 3db7a7a7ae953551 ]---
[23500.262131] ------------[ cut here ]------------
[23500.262577] WARNING: at kernel/exit.c:897 do_exit+0x55/0x870()
[23500.263070] Hardware name: Bochs
[23500.263467] Modules linked in: [last unloaded: scsi_wait_scan]
[23500.264129] Pid: 15799, comm: fsstress Tainted: G D 3.3.0+ #43
[23500.264623] Call Trace:
[23500.265066] [<ffffffff810572bf>] warn_slowpath_common+0x7f/0xc0
[23500.265542] [<ffffffff8105731a>] warn_slowpath_null+0x1a/0x20
[23500.266034] [<ffffffff8105b1c5>] do_exit+0x55/0x870
[23500.266482] [<ffffffff81058f6c>] ? kmsg_dump+0x5c/0xf0
[23500.266932] [<ffffffff815dfeac>] oops_end+0xac/0xf0
[23500.267418] [<ffffffff81017928>] die+0x58/0x90
[23500.267852] [<ffffffff815df7a4>] do_trap+0xc4/0x170
[23500.268343] [<ffffffff81014fa5>] do_invalid_op+0x95/0xb0
[23500.268803] [<ffffffff811a9add>] ? submit_bh+0x10d/0x120
[23500.269344] [<ffffffff81299db4>] ? drive_stat_acct+0x114/0x190
[23500.269825] [<ffffffff8129ec36>] ? blk_queue_bio+0x106/0x400
[23500.270324] [<ffffffff815e7f9b>] invalid_op+0x1b/0x20
[23500.270778] [<ffffffff811a9add>] ? submit_bh+0x10d/0x120
[23500.271263] [<ffffffff811a9ac7>] ? submit_bh+0xf7/0x120
[23500.271723] [<ffffffff811ab59d>] write_dirty_buffer+0x4d/0x80
[23500.272217] [<ffffffff8123be1d>] __flush_batch+0x4d/0xa0
[23500.272675] [<ffffffff8123c605>] jbd2_log_do_checkpoint+0xf5/0x4f0
[23500.273226] [<ffffffff8123ca89>] __jbd2_log_wait_for_space+0x89/0x190
[23500.273714] [<ffffffff81237a98>] start_this_handle+0x3a8/0x4e0
[23500.274213] [<ffffffff810799e0>] ? remove_wait_queue+0x50/0x50
[23500.274686] [<ffffffff81237c93>] jbd2__journal_start+0xc3/0x100
[23500.275193] [<ffffffff81237ce3>] jbd2_journal_start+0x13/0x20
[23500.275664] [<ffffffff8121743f>] ext4_journal_start_sb+0x7f/0x1d0
[23500.276166] [<ffffffff81224a24>] ? ext4_fallocate+0x1a4/0x530
[23500.276638] [<ffffffff811f64c5>] ? ext4_meta_trans_blocks+0xa5/0xb0
[23500.277186] [<ffffffff81224a24>] ext4_fallocate+0x1a4/0x530
[23500.277654] [<ffffffff8117a092>] do_fallocate+0xf2/0x160
[23500.278164] [<ffffffff8117a14b>] sys_fallocate+0x4b/0x70
[23500.278624] [<ffffffff815e6d69>] system_call_fastpath+0x16/0x1b
[23500.279124] ---[ end trace 3db7a7a7ae953552 ]---

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-30 15:35:30

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Fri 30-03-12 18:23:55, Artem Bityutskiy wrote:
> On Tue, 2012-03-27 at 22:14 +0200, Jan Kara wrote:
> > On Tue 27-03-12 16:29:58, Artem Bityutskiy wrote:
> > > On Thu, 2012-03-22 at 11:33 +0100, Jan Kara wrote:
> > > > Then we have ext4_mark_super_dirty() call from 4 places - I forgot about
> > > > these originally... I kind of miss their purpose. Originally they were used
> > > > so that we write total number of free blocks and inodes in the superblock
> > > > but when we do not maintain them in the journal mode I don't see a reason
> > > > to periodically sync them in no-journal mode. Ted, what is the purpose of
> > > > these calls?
> > >
> > > I do not understand what's the fundamental difference between journal
> > > and non-journal mode. Why when we do have the journal we do not mark the
> > > super-block as dirty in many places (e.g., in 'ext4_file_open()' - if we
> > > do have the journal, when do we make sure we save the mount point path
> > > change?).
> > We write it at least during ext4_put_super().
> >
> > > May be it has something to do with behaving like the ext2 driver when
> > > mounting ext2-formatted media with the the ext4 driver?
> > I'm not really sure about this...
> >
> > > Jan, since Ted did not answer, may be you can figure out the reasons
> > > from this commit message, which actually introduced the
> > > 'ext4_mark_super_dirty()' function?
> > Anyway, attached are two patches which you can include in your series
> > and which should make your cleanups simpler.
>
> I amended the second patch:
> - ext4_journal_stop(sb);
> + ext4_journal_stop(sbi->s_sbh);
It should have been:
ext4_journal_stop(handle);

> Extensive testing with xfstests found a problem with these patches:
>
> [23500.238579] ------------[ cut here ]------------
> [23500.238720] kernel BUG at fs/buffer.c:2871!
> [23500.238842] invalid opcode: 0000 [#1] SMP
> [23500.239279] CPU 11
> [23500.239338] Modules linked in: [last unloaded: scsi_wait_scan]
> [23500.239442]
> [23500.239442] Pid: 15799, comm: fsstress Not tainted 3.3.0+ #43 Bochs Bochs
> [23500.239442] RIP: 0010:[<ffffffff811a9add>] [<ffffffff811a9add>] submit_bh+0x10d/0x120
> [23500.239442] RSP: 0018:ffff880273a41b58 EFLAGS: 00010202
> [23500.239442] RAX: 000000000004d025 RBX: ffff8802469e5a90 RCX: 0000000000000000
> [23500.239442] RDX: ffff880273a41fd8 RSI: ffff8802469e5a90 RDI: 0000000000000211
> [23500.239442] RBP: ffff880273a41b78 R08: ffff880409645d80 R09: 0000000000000001
> [23500.239442] R10: 0000000000000001 R11: ffff880178439000 R12: 0000000000000211
> [23500.239442] R13: ffff880273a41c34 R14: ffff8804059b7000 R15: ffff880273a41fd8
> [23500.239442] FS: 00007fc8731e7700(0000) GS:ffff88041fd60000(0000) knlGS:0000000000000000
> [23500.239442] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [23500.239442] CR2: 00007fc873082008 CR3: 000000012456a000 CR4: 00000000000006e0
> [23500.239442] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [23500.239442] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [23500.239442] Process fsstress (pid: 15799, threadinfo ffff880273a40000, task ffff880150dd8000)
> [23500.239442] Stack:
> [23500.239442] ffff8804059b7000 ffff8802469e5a90 0000000000000211 ffff880273a41c34
> [23500.239442] ffff880273a41b98 ffffffff811ab59d 0000000000000002 ffff8804059b7128
> [23500.239442] ffff880273a41bf8 ffffffff8123be1d 0000000091827364 ffff88010e9a7e30
> [23500.239442] Call Trace:
> [23500.239442] [<ffffffff811ab59d>] write_dirty_buffer+0x4d/0x80
> [23500.239442] [<ffffffff8123be1d>] __flush_batch+0x4d/0xa0
> [23500.239442] [<ffffffff8123c605>] jbd2_log_do_checkpoint+0xf5/0x4f0
> [23500.239442] [<ffffffff8123ca89>] __jbd2_log_wait_for_space+0x89/0x190
> [23500.239442] [<ffffffff81237a98>] start_this_handle+0x3a8/0x4e0
> [23500.239442] [<ffffffff810799e0>] ? remove_wait_queue+0x50/0x50
> [23500.239442] [<ffffffff81237c93>] jbd2__journal_start+0xc3/0x100
> [23500.239442] [<ffffffff81237ce3>] jbd2_journal_start+0x13/0x20
> [23500.239442] [<ffffffff8121743f>] ext4_journal_start_sb+0x7f/0x1d0
> [23500.239442] [<ffffffff81224a24>] ? ext4_fallocate+0x1a4/0x530
> [23500.239442] [<ffffffff811f64c5>] ? ext4_meta_trans_blocks+0xa5/0xb0
> [23500.239442] [<ffffffff81224a24>] ext4_fallocate+0x1a4/0x530
> [23500.239442] [<ffffffff8117a092>] do_fallocate+0xf2/0x160
> [23500.239442] [<ffffffff8117a14b>] sys_fallocate+0x4b/0x70
> [23500.239442] [<ffffffff815e6d69>] system_call_fastpath+0x16/0x1b
> [23500.239442] Code: ee 44 89 e7 e8 35 1f 0f 00 49 8b 5d 18 4c 89 ef e8 19 4e 00 00 48 83 c4 08 c1 e3 18 c1 fb 1f 83 e3 a1 89 d8 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48
> [23500.239442] RIP [<ffffffff811a9add>] submit_bh+0x10d/0x120
> [23500.239442] RSP <ffff880273a41b58>
> [23500.261657] ---[ end trace 3db7a7a7ae953551 ]---
Hmm, looks like we tried to checkpoint BH_Unwritten buffer. That looks
like a bug in fallocate() support. Not really related but definitely worth
reporting.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-03-30 15:40:34

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Fri, 2012-03-30 at 17:35 +0200, Jan Kara wrote:
> > [23500.239442] [<ffffffff811ab59d>] write_dirty_buffer+0x4d/0x80
> > [23500.239442] [<ffffffff8123be1d>] __flush_batch+0x4d/0xa0
> > [23500.239442] [<ffffffff8123c605>] jbd2_log_do_checkpoint+0xf5/0x4f0
> > [23500.239442] [<ffffffff8123ca89>] __jbd2_log_wait_for_space+0x89/0x190
> > [23500.239442] [<ffffffff81237a98>] start_this_handle+0x3a8/0x4e0
> > [23500.239442] [<ffffffff810799e0>] ? remove_wait_queue+0x50/0x50
> > [23500.239442] [<ffffffff81237c93>] jbd2__journal_start+0xc3/0x100
> > [23500.239442] [<ffffffff81237ce3>] jbd2_journal_start+0x13/0x20
> > [23500.239442] [<ffffffff8121743f>] ext4_journal_start_sb+0x7f/0x1d0
> > [23500.239442] [<ffffffff81224a24>] ? ext4_fallocate+0x1a4/0x530
> > [23500.239442] [<ffffffff811f64c5>] ? ext4_meta_trans_blocks+0xa5/0xb0
> > [23500.239442] [<ffffffff81224a24>] ext4_fallocate+0x1a4/0x530
> > [23500.239442] [<ffffffff8117a092>] do_fallocate+0xf2/0x160
> > [23500.239442] [<ffffffff8117a14b>] sys_fallocate+0x4b/0x70
> > [23500.239442] [<ffffffff815e6d69>] system_call_fastpath+0x16/0x1b
> > [23500.239442] Code: ee 44 89 e7 e8 35 1f 0f 00 49 8b 5d 18 4c 89 ef e8 19 4e 00 00 48 83 c4 08 c1 e3 18 c1 fb 1f 83 e3 a1 89 d8 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48
> > [23500.239442] RIP [<ffffffff811a9add>] submit_bh+0x10d/0x120
> > [23500.239442] RSP <ffff880273a41b58>
> > [23500.261657] ---[ end trace 3db7a7a7ae953551 ]---
> Hmm, looks like we tried to checkpoint BH_Unwritten buffer. That looks
> like a bug in fallocate() support. Not really related but definitely worth
> reporting.

Well, I ran vanilla the tests in vanilla 3.3 overnight, they were fine.
But may be I was lucky. I'll try to run the tests with vanilla kernel
some more. I mean, it would make more sense to report something against
vanilla 3.3, not a patched 3.3.

Any hints how to properly report an ext4 bug?

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-03-31 12:25:41

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Fri, 2012-03-30 at 17:35 +0200, Jan Kara wrote:
> Hmm, looks like we tried to checkpoint BH_Unwritten buffer. That looks
> like a bug in fallocate() support. Not really related but definitely worth
> reporting.

Yeah, vanilla 3.3 dies as well, with a bit different oops though.

[10697.469725] ------------[ cut here ]------------
[10697.470440] kernel BUG at fs/buffer.c:2871!
[10697.470440] invalid opcode: 0000 [#1] SMP
[10697.470440] CPU 9
[10697.470440] Modules linked in: [last unloaded: scsi_wait_scan]
[10697.470440]
[10697.470440] Pid: 27877, comm: fsstress Not tainted 3.3.0+ #43 Bochs Bochs
[10697.470440] RIP: 0010:[<ffffffff811a9add>] [<ffffffff811a9add>] submit_bh+0x10d/0x120
[10697.470440] RSP: 0018:ffff8803dfbef758 EFLAGS: 00010202
[10697.470440] RAX: 000000000004d025 RBX: ffff8803ffaa66e8 RCX: 0000000000000000
[10697.470440] RDX: ffff8803dfbeffd8 RSI: ffff8803ffaa66e8 RDI: 0000000000000211
[10697.470440] RBP: ffff8803dfbef778 R08: ffff880409013600 R09: 0000000000000001
[10697.470440] R10: 0000000000000000 R11: ffff88040656f240 R12: 0000000000000211
[10697.470440] R13: ffff8803dfbef834 R14: ffff8804065d6800 R15: ffff8803dfbeffd8
[10697.470440] FS: 00007fd0012a0700(0000) GS:ffff88041fd20000(0000) knlGS:0000000000000000
[10697.470440] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10697.470440] CR2: 00007fcffc22dbc8 CR3: 0000000406d3d000 CR4: 00000000000006e0
[10697.470440] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10697.470440] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[10697.470440] Process fsstress (pid: 27877, threadinfo ffff8803dfbee000, task ffff880404a60000)
[10697.470440] Stack:
[10697.470440] ffff8804065d6b18 ffff8803ffaa66e8 0000000000000211 ffff8803dfbef834
[10697.470440] ffff8803dfbef798 ffffffff811ab59d 000000000000003c ffff8804065d6af8
[10697.470440] ffff8803dfbef7f8 ffffffff8123be2d 0000000091827364 ffff8803dfbef7b0
[10697.470440] Call Trace:
[10697.470440] [<ffffffff811ab59d>] write_dirty_buffer+0x4d/0x80
[10697.470440] [<ffffffff8123be2d>] __flush_batch+0x4d/0xa0
[10697.470440] [<ffffffff8123c8aa>] jbd2_log_do_checkpoint+0x38a/0x4f0
[10697.470440] [<ffffffff812c800d>] ? __write_lock_failed+0xd/0x20
[10697.470440] [<ffffffff8123ca99>] __jbd2_log_wait_for_space+0x89/0x190
[10697.470440] [<ffffffff812c8029>] ? __read_lock_failed+0x9/0x20
[10697.470440] [<ffffffff81237aa8>] start_this_handle+0x3a8/0x4e0
[10697.470440] [<ffffffff810799e0>] ? remove_wait_queue+0x50/0x50
[10697.470440] [<ffffffff81237ca3>] jbd2__journal_start+0xc3/0x100
[10697.470440] [<ffffffff81237cf3>] jbd2_journal_start+0x13/0x20
[10697.470440] [<ffffffff8121740f>] ext4_journal_start_sb+0x7f/0x1d0
[10697.470440] [<ffffffff811f9322>] ? ext4_write_begin+0x112/0x310
[10697.470440] [<ffffffff811f6465>] ? ext4_meta_trans_blocks+0xa5/0xb0
[10697.470440] [<ffffffff811f9322>] ext4_write_begin+0x112/0x310
[10697.470440] [<ffffffff811f96d4>] ext4_da_write_begin+0x1b4/0x210
[10697.470440] [<ffffffff812383e7>] ? jbd2_journal_stop+0x1b7/0x2b0
[10697.470440] [<ffffffff8111af42>] generic_file_buffered_write+0x112/0x290
[10697.470440] [<ffffffff8111c7b9>] __generic_file_aio_write+0x229/0x440
[10697.470440] [<ffffffff8111ca42>] generic_file_aio_write+0x72/0xe0
[10697.470440] [<ffffffff811f319f>] ext4_file_write+0xbf/0x260
[10697.470440] [<ffffffff81165d5f>] ? kmem_cache_free+0x2f/0x130
[10697.470440] [<ffffffff81186af3>] ? putname+0x33/0x50
[10697.470440] [<ffffffff8117ada2>] do_sync_write+0xd2/0x110
[10697.470440] [<ffffffff812633bc>] ? security_file_permission+0x2c/0xb0
[10697.470440] [<ffffffff8117b321>] ? rw_verify_area+0x61/0xf0
[10697.470440] [<ffffffff8117b683>] vfs_write+0xb3/0x180
[10697.470440] [<ffffffff8117b9aa>] sys_write+0x4a/0x90
[10697.470440] [<ffffffff815e6d69>] system_call_fastpath+0x16/0x1b
[10697.470440] Code: ee 44 89 e7 e8 45 1f 0f 00 49 8b 5d 18 4c 89 ef e8 19 4e 00 00 48 83 c4 08 c1 e3 18 c1 fb 1f 83 e3 a1 89 d8 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48
[10697.470440] RIP [<ffffffff811a9add>] submit_bh+0x10d/0x120
[10697.470440] RSP <ffff8803dfbef758>
[10697.497128] ---[ end trace 21134344c4537c68 ]---
[10697.497589] ------------[ cut here ]------------
[10697.498081] WARNING: at kernel/exit.c:897 do_exit+0x55/0x870()
[10697.498575] Hardware name: Bochs
[10697.499044] Modules linked in: [last unloaded: scsi_wait_scan]
[10697.499725] Pid: 27877, comm: fsstress Tainted: G D 3.3.0+ #43
[10697.500280] Call Trace:
[10697.500682] [<ffffffff810572bf>] warn_slowpath_common+0x7f/0xc0
[10697.501223] [<ffffffff8105731a>] warn_slowpath_null+0x1a/0x20
[10697.501721] [<ffffffff8105b1c5>] do_exit+0x55/0x870
[10697.502231] [<ffffffff81058f6c>] ? kmsg_dump+0x5c/0xf0
[10697.502714] [<ffffffff815dfeac>] oops_end+0xac/0xf0
[10697.503243] [<ffffffff81017928>] die+0x58/0x90
[10697.503701] [<ffffffff815df7a4>] do_trap+0xc4/0x170
[10697.504208] [<ffffffff81014fa5>] do_invalid_op+0x95/0xb0
[10697.504696] [<ffffffff811a9add>] ? submit_bh+0x10d/0x120
[10697.505598] [<ffffffff81299dc4>] ? drive_stat_acct+0x114/0x190
[10697.506136] [<ffffffff8129ec46>] ? blk_queue_bio+0x106/0x400
[10697.506633] [<ffffffff815e7f9b>] invalid_op+0x1b/0x20
[10697.507163] [<ffffffff811a9add>] ? submit_bh+0x10d/0x120
[10697.507648] [<ffffffff811a9ac7>] ? submit_bh+0xf7/0x120
[10697.508164] [<ffffffff811ab59d>] write_dirty_buffer+0x4d/0x80
[10697.508663] [<ffffffff8123be2d>] __flush_batch+0x4d/0xa0
[10697.509187] [<ffffffff8123c8aa>] jbd2_log_do_checkpoint+0x38a/0x4f0
[10697.509700] [<ffffffff812c800d>] ? __write_lock_failed+0xd/0x20
[10697.510238] [<ffffffff8123ca99>] __jbd2_log_wait_for_space+0x89/0x190
[10697.510754] [<ffffffff812c8029>] ? __read_lock_failed+0x9/0x20
[10697.511180] [<ffffffff81237aa8>] start_this_handle+0x3a8/0x4e0
[10697.511522] [<ffffffff810799e0>] ? remove_wait_queue+0x50/0x50
[10697.511863] [<ffffffff81237ca3>] jbd2__journal_start+0xc3/0x100
[10697.512230] [<ffffffff81237cf3>] jbd2_journal_start+0x13/0x20
[10697.512569] [<ffffffff8121740f>] ext4_journal_start_sb+0x7f/0x1d0
[10697.512915] [<ffffffff811f9322>] ? ext4_write_begin+0x112/0x310
[10697.513283] [<ffffffff811f6465>] ? ext4_meta_trans_blocks+0xa5/0xb0
[10697.513631] [<ffffffff811f9322>] ext4_write_begin+0x112/0x310
[10697.513970] [<ffffffff811f96d4>] ext4_da_write_begin+0x1b4/0x210
[10697.514340] [<ffffffff812383e7>] ? jbd2_journal_stop+0x1b7/0x2b0
[10697.514683] [<ffffffff8111af42>] generic_file_buffered_write+0x112/0x290
[10697.515077] [<ffffffff8111c7b9>] __generic_file_aio_write+0x229/0x440
[10697.515428] [<ffffffff8111ca42>] generic_file_aio_write+0x72/0xe0
[10697.515771] [<ffffffff811f319f>] ext4_file_write+0xbf/0x260
[10697.516127] [<ffffffff81165d5f>] ? kmem_cache_free+0x2f/0x130
[10697.516467] [<ffffffff81186af3>] ? putname+0x33/0x50
[10697.516790] [<ffffffff8117ada2>] do_sync_write+0xd2/0x110
[10697.517146] [<ffffffff812633bc>] ? security_file_permission+0x2c/0xb0
[10697.517498] [<ffffffff8117b321>] ? rw_verify_area+0x61/0xf0
[10697.517834] [<ffffffff8117b683>] vfs_write+0xb3/0x180
[10697.518184] [<ffffffff8117b9aa>] sys_write+0x4a/0x90
[10697.518508] [<ffffffff815e6d69>] system_call_fastpath+0x16/0x1b
[10697.518934] ---[ end trace 21134344c4537c69 ]---

--
Best Regards,
Artem Bityutskiy


2012-03-31 11:49:26

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Fri 30-03-12 18:43:15, Artem Bityutskiy wrote:
> On Fri, 2012-03-30 at 17:35 +0200, Jan Kara wrote:
> > > [23500.239442] [<ffffffff811ab59d>] write_dirty_buffer+0x4d/0x80
> > > [23500.239442] [<ffffffff8123be1d>] __flush_batch+0x4d/0xa0
> > > [23500.239442] [<ffffffff8123c605>] jbd2_log_do_checkpoint+0xf5/0x4f0
> > > [23500.239442] [<ffffffff8123ca89>] __jbd2_log_wait_for_space+0x89/0x190
> > > [23500.239442] [<ffffffff81237a98>] start_this_handle+0x3a8/0x4e0
> > > [23500.239442] [<ffffffff810799e0>] ? remove_wait_queue+0x50/0x50
> > > [23500.239442] [<ffffffff81237c93>] jbd2__journal_start+0xc3/0x100
> > > [23500.239442] [<ffffffff81237ce3>] jbd2_journal_start+0x13/0x20
> > > [23500.239442] [<ffffffff8121743f>] ext4_journal_start_sb+0x7f/0x1d0
> > > [23500.239442] [<ffffffff81224a24>] ? ext4_fallocate+0x1a4/0x530
> > > [23500.239442] [<ffffffff811f64c5>] ? ext4_meta_trans_blocks+0xa5/0xb0
> > > [23500.239442] [<ffffffff81224a24>] ext4_fallocate+0x1a4/0x530
> > > [23500.239442] [<ffffffff8117a092>] do_fallocate+0xf2/0x160
> > > [23500.239442] [<ffffffff8117a14b>] sys_fallocate+0x4b/0x70
> > > [23500.239442] [<ffffffff815e6d69>] system_call_fastpath+0x16/0x1b
> > > [23500.239442] Code: ee 44 89 e7 e8 35 1f 0f 00 49 8b 5d 18 4c 89 ef e8 19 4e 00 00 48 83 c4 08 c1 e3 18 c1 fb 1f 83 e3 a1 89 d8 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48
> > > [23500.239442] RIP [<ffffffff811a9add>] submit_bh+0x10d/0x120
> > > [23500.239442] RSP <ffff880273a41b58>
> > > [23500.261657] ---[ end trace 3db7a7a7ae953551 ]---
> > Hmm, looks like we tried to checkpoint BH_Unwritten buffer. That looks
> > like a bug in fallocate() support. Not really related but definitely worth
> > reporting.
>
> Well, I ran vanilla the tests in vanilla 3.3 overnight, they were fine.
> But may be I was lucky. I'll try to run the tests with vanilla kernel
> some more. I mean, it would make more sense to report something against
> vanilla 3.3, not a patched 3.3.
That's true. Frankly I think you were lucky with hitting the bug with
patched kernel rather than not hitting it with vanilla :). What test did
you run exactly?

> Any hints how to properly report an ext4 bug?
Hmm, like any other. Report what load did you run, what kernel, and the
oops... And send the report to [email protected].
Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-04-02 13:43:46

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH v1 0/9] do not use s_dirt in ext4

On Sat, 2012-03-31 at 13:49 +0200, Jan Kara wrote:
> > Well, I ran vanilla the tests in vanilla 3.3 overnight, they were fine.
> > But may be I was lucky. I'll try to run the tests with vanilla kernel
> > some more. I mean, it would make more sense to report something against
> > vanilla 3.3, not a patched 3.3.
> That's true. Frankly I think you were lucky with hitting the bug with
> patched kernel rather than not hitting it with vanilla :). What test did
> you run exactly?

I've sent a separate report (Subject: 3.3 oops). The oops there is a bit
different comparing to this one, though. For this oops I used the same
kvm setup.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part