2013-06-10 19:32:18

by Paul Gortmaker

[permalink] [raw]
Subject: [RFC PATCH 0/4] ext4/jbd2: several possible mainline fixes

I've been trying to diagnose a problem where ext4 can't be reliably
used under preempt-rt kernels. While I've not solved that yet, I
have found two things that look to me to be possible issues that
could also impact mainline (and two cosmetic fixes), so I'm sending
them here to see if they make sense to others who are more familiar
with filesystem code than I am. At the moment, I can't see why the
issues I've identified here couldn't happen on mainline/non-RT.
The series attached here was build/boot tested on v3.10-rc5.

[ -- stop reading here if you don't care about how ext4 fails on RT -- ]

With respect to the RT problem, it looks like I'm seeing a stall
in jbd2_journal_commit_transaction() in the jbd2 thread, and then
everything else piles up behind that. The stall looks like this:

[ 1593.701664] jbd2/sda1-8 D ffff880223c758c0 4832 1214 2 0x00000000
[ 1593.701668] ffff88021ff97b10 0000000000000046 ffff88021ff97ac0 ffffffff8106e8f1
[ 1593.701669] ffff880223c75500 ffff88021ff97fd8 ffff88021ff97fd8 ffff88021ff97fd8
[ 1593.701670] ffffffff81c16440 ffff880223c75500 ffff88021ff97ae0 ffff880223c75500
[ 1593.701671] Call Trace:
[ 1593.701673] [<ffffffff8106e8f1>] ? get_parent_ip+0x11/0x50
[ 1593.701674] [<ffffffff810e9560>] ? __lock_page+0x70/0x70
[ 1593.701676] [<ffffffff8172dc34>] schedule+0x24/0x70
[ 1593.701677] [<ffffffff8172dd0a>] io_schedule+0x8a/0xd0
[ 1593.701678] [<ffffffff810e9569>] sleep_on_page+0x9/0x10
[ 1593.701679] [<ffffffff8172cba8>] __wait_on_bit+0x58/0x90
[ 1593.701680] [<ffffffff810e9ca9>] ? find_get_pages_tag+0x109/0x190
[ 1593.701681] [<ffffffff810e9688>] wait_on_page_bit+0x78/0x80
[ 1593.701682] [<ffffffff810603d0>] ? autoremove_wake_function+0x40/0x40
[ 1593.701684] [<ffffffff810e9a39>] filemap_fdatawait_range+0xf9/0x190
[ 1593.701685] [<ffffffff812c4451>] ? submit_bio+0x71/0xe0
[ 1593.701686] [<ffffffff8106e8f1>] ? get_parent_ip+0x11/0x50
[ 1593.701687] [<ffffffff8103f125>] ? unpin_current_cpu+0x15/0x80
[ 1593.701689] [<ffffffff810e9aef>] filemap_fdatawait+0x1f/0x30
[ 1593.701691] [<ffffffff81220c34>] jbd2_journal_commit_transaction+0xb14/0x1c20
[ 1593.701692] [<ffffffff81226b91>] kjournald2+0xc1/0x270

Using jbd_debug() it seems that I end up with jbd2_log_do_checkpoint
and jbd2_journal_commit_transaction running into each other. In one of
my attached patches, I show they overlap to the point of interrupting
each others jbd_debug messages. Maybe that doesn't matter? Here is a
typical debug trace with time stamps:

[ 695.229129] (fs/jbd2/journal.c, 495): __jbd2_log_start_commit: JBD2: requesting commit 42331/42330
[ 695.229141] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 695.229145] (fs/jbd2/journal.c, 211): kjournald2: woke because of timeout
[ 695.229147] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42330, commit_request=42331
[ 695.229150] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[ 695.229158] (fs/jbd2/commit.c, 372): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42331
[ 695.229164] (fs/jbd2/journal.c, 495): __jbd2_log_start_commit: JBD2: requesting commit 42331/42330
[ 695.243653] (fs/jbd2/journal.c, 495): __jbd2_log_start_commit: JBD2: requesting commit 42331/42330
[ 695.247738] (fs/jbd2/checkpoint.c, 345): jbd2_log_do_checkpoint: Start checkpoint
[ 695.247749] (fs/jbd2/checkpoint.c, 354): jbd2_log_do_checkpoint: cleanup_journal_tail returned 1
[ 695.248917] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42331, j_commit_sequence=42330
[ 695.265162] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 10 revoke records
<filesystem now dead for writes>

Notice that we never got the expected message:

"jbd2_journal_commit_transaction: JBD2: commit 42331 complete"

Stuck waiting/spinning somewhere in jbd2_journal_commit_transaction.
As near as I can tell, it never got to phase 3 of commit_transaction.

Since jbd2_journal_commit_transaction is such a large function,
I'm tempted to break it up some, just to ease my debugging (compare
0x1c20 to the smaller numbers around it). Perhaps there would be
interest in such kinds of patches for mainline?

Thanks,
Paul.
---

Paul Gortmaker (4):
jbd2/journal_commit_transaction: relocate state lock to incorporate
all users
jbd2/log_wait_for_space: drop checkpoint mutex when waiting
jbd2: fix duplicate debug label for phase 2
jbd/jbd2: relocate bit_spinlock header to jbd_common

fs/jbd2/checkpoint.c | 7 +++++++
fs/jbd2/commit.c | 7 ++++---
include/linux/jbd.h | 1 -
include/linux/jbd2.h | 1 -
include/linux/jbd_common.h | 2 ++
5 files changed, 13 insertions(+), 5 deletions(-)

--
1.8.1.2



2013-06-10 19:32:00

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

The state lock is taken after we are doing an assert on the state
value, not before. It is also taken after the jbd_debug() call.
Noting that jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.

Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/commit.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 0f53946..7e8a5fd 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -423,6 +423,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
J_ASSERT(journal->j_running_transaction != NULL);
J_ASSERT(journal->j_committing_transaction == NULL);

+ write_lock(&journal->j_state_lock);
+
commit_transaction = journal->j_running_transaction;
J_ASSERT(commit_transaction->t_state == T_RUNNING);

@@ -430,7 +432,6 @@ void jbd2_journal_commit_transaction(journal_t *journal)
jbd_debug(1, "JBD2: starting commit of transaction %d\n",
commit_transaction->t_tid);

- write_lock(&journal->j_state_lock);
commit_transaction->t_state = T_LOCKED;

trace_jbd2_commit_locking(journal, commit_transaction);
--
1.8.1.2


2013-06-10 19:32:02

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 3/4] jbd2: fix duplicate debug label for phase 2

Currently we see this output:

$git grep phase fs/jbd2
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n");
[...]

There is clearly a duplicate label for phase 2, and they are
both active (i.e. not in #if ... #else block). Rename them to
be "2a" and "2b" so the debug output is unambiguous.

Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/commit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 7e8a5fd..a10fc55 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -534,7 +534,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
wake_up(&journal->j_wait_transaction_locked);
write_unlock(&journal->j_state_lock);

- jbd_debug(3, "JBD2: commit phase 2\n");
+ jbd_debug(3, "JBD2: commit phase 2a\n");

/*
* Now start flushing things to disk, in the order they appear
@@ -549,7 +549,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
WRITE_SYNC);
blk_finish_plug(&plug);

- jbd_debug(3, "JBD2: commit phase 2\n");
+ jbd_debug(3, "JBD2: commit phase 2b\n");

/*
* Way to go: we have now written out all of the data for a
--
1.8.1.2


2013-06-10 19:32:03

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 4/4] jbd/jbd2: relocate bit_spinlock header to jbd_common

The bit_spinlock functions are only used for the jbd_lock_bh_state
functions (and friends) in jbd_common.h and are not directly used
by either of jbd.h or jbd2.h content.

The jbd_common file is new as of commit 446066724c36 ("jdb/jbd2: factor
out common functions from the jbd[2] header files") but common
(and isolated) headers were not considered for factoring at that time.

Signed-off-by: Paul Gortmaker <[email protected]>
---
include/linux/jbd.h | 1 -
include/linux/jbd2.h | 1 -
include/linux/jbd_common.h | 2 ++
3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index 7e0b622..6e5524f 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -27,7 +27,6 @@
#include <linux/buffer_head.h>
#include <linux/journal-head.h>
#include <linux/stddef.h>
-#include <linux/bit_spinlock.h>
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/lockdep.h>
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 6e051f4..9986ab8 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -26,7 +26,6 @@
#include <linux/buffer_head.h>
#include <linux/journal-head.h>
#include <linux/stddef.h>
-#include <linux/bit_spinlock.h>
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/slab.h>
diff --git a/include/linux/jbd_common.h b/include/linux/jbd_common.h
index 6133679..949b5dd 100644
--- a/include/linux/jbd_common.h
+++ b/include/linux/jbd_common.h
@@ -1,6 +1,8 @@
#ifndef _LINUX_JBD_STATE_H
#define _LINUX_JBD_STATE_H

+#include <linux/bit_spinlock.h>
+
enum jbd_state_bits {
BH_JBD /* Has an attached ext3 journal_head */
= BH_PrivateStart,
--
1.8.1.2


2013-06-10 19:32:01

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 2/4] jbd2/log_wait_for_space: drop checkpoint mutex when waiting

While trying to debug an an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm D ffff8802203afbc0 4600 4878 4748 0x00000000
ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
[<ffffffff8172dc34>] schedule+0x24/0x70
[<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
[<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
[<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
[<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
[<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
[<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
[<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
[<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
[<ffffffff8114370f>] do_rmdir+0xdf/0x120
[<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
[<ffffffff81002889>] ? do_notify_resume+0x89/0x100
[<ffffffff817361ae>] ? int_signal+0x12/0x17
[<ffffffff81145d85>] sys_unlinkat+0x25/0x40
[<ffffffff81735f22>] system_call_fastpath+0x16/0x1b

What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space. And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.

It seems that we need to drop checkpoint_mutex temporarily while
waiting, if we want to guarantee that progress can be made by
jbd2_journal_commit_transaction(). There doesn't seem to be anything
preempt-rt specific about this, other then perhaps increasing the
odds of it happening.

Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/checkpoint.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index c78841e..d042283 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -156,7 +156,14 @@ void __jbd2_log_wait_for_space(journal_t *journal)
/* We were able to recover space; yay! */
;
} else if (tid) {
+ /*
+ * jbd2_journal_commit_transaction() may want
+ * to take the checkpoint_mutex if JBD2_FLUSHED
+ * is set. So we need to temporarily drop it.
+ */
+ mutex_unlock(&journal->j_checkpoint_mutex);
jbd2_log_wait_commit(journal, tid);
+ mutex_lock(&journal->j_checkpoint_mutex);
} else {
printk(KERN_ERR "%s: needed %d blocks and "
"only had %d space available\n",
--
1.8.1.2


2013-06-10 23:38:51

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] ext4/jbd2: several possible mainline fixes

On Mon, Jun 10, 2013 at 03:31:59PM -0400, Paul Gortmaker wrote:
> Using jbd_debug() it seems that I end up with jbd2_log_do_checkpoint
> and jbd2_journal_commit_transaction running into each other. In one of
> my attached patches, I show they overlap to the point of interrupting
> each others jbd_debug messages. Maybe that doesn't matter?

That should be OK. We do allow a new transactions while we are
committing an older transaction, and if this requires more space, a
checkpoint could start. I'm not sure why you're apparently seeing a
deadlock under RT-linux, though.

> Stuck waiting/spinning somewhere in jbd2_journal_commit_transaction.
> As near as I can tell, it never got to phase 3 of commit_transaction.
>
> Since jbd2_journal_commit_transaction is such a large function,
> I'm tempted to break it up some, just to ease my debugging (compare
> 0x1c20 to the smaller numbers around it). Perhaps there would be
> interest in such kinds of patches for mainline?

Instead of breaking it up, can you just use addr2line, i.e.:

% addr2line -a ffffffff8046a067 -i -e vmlinux
0xffffffff8046a067
./include/linux/buffer_head.h:287
./fs/ext4/inode.c:5585
./fs/ext4/inode.c:5963

I find this to be incredibly useful, since with the -i option it will
handle inline functions correctly. In the above example there are two
levels of inlining, one explicitly marked inline in
include/linux/buffer.h, and one implicit inlining taking place because
we had a static function in fs/ext4/inode.c that was only called by
one caller.

Because of gcc's implicit inlining, just breaking up the function by
itself wouldn't be enough, unless you explicitly marked the new static
functions with noinline; but that introduces inefficiencies. If the
only reason you want to do this is to make it easier to figure out a
stack trace, addr2line really is your friend....

Cheers,

- Ted

2013-06-11 02:12:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

On Mon, Jun 10, 2013 at 03:32:00PM -0400, Paul Gortmaker wrote:
> The state lock is taken after we are doing an assert on the state
> value, not before. It is also taken after the jbd_debug() call.
> Noting that jbd_debug() is implemented with two separate printk()
> calls, it can lead to corrupted and misleading debug output like
> the following (see lines marked with "*"):

This is only a cosmetic fix, but instead of trying to fix it by moving
a lock --- which would only fix this issue, but there are probably
others cases where this might be an issue, I'd suggest fixing it with
something like this:

/* in fs/jbd2/journal.c */
void __jbd2_debug(int level, const char *func, unsigned int line, const char *fmt, ...)
{
struct va_format vaf;
va_list args;

if (level < jbd2_journal_enable_debug)
return;
va_start(args, fmt);
vaf.fmt = fmt;
vaf.va = &args
printk(KERN_DEBUG, "jbd2: (%s, %u): %pV\n", func, line, &vaf);
va_end(args);
}

/* in jbd2.h */

#define jbd_debug(n, fmt, a...) __jbd2_debug((n), __func__, __LINE__, (fmt), ##a)

Could you give this approach a try?

Thanks,

- Ted

2013-06-11 02:33:31

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/4] jbd2/log_wait_for_space: drop checkpoint mutex when waiting

On Mon, Jun 10, 2013 at 03:32:01PM -0400, Paul Gortmaker wrote:
>
> What is interesting here, is that we call log_wait_commit, from
> within wait_for_space, but we are still holding the checkpoint_mutex
> as it surrounds mostly the whole of wait_for_space. And then, as we
> are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
> bit is set, then we will also try to take the same checkpoint_mutex.

> } else if (tid) {
> + /*
> + * jbd2_journal_commit_transaction() may want
> + * to take the checkpoint_mutex if JBD2_FLUSHED
> + * is set. So we need to temporarily drop it.
> + */
> + mutex_unlock(&journal->j_checkpoint_mutex);
> jbd2_log_wait_commit(journal, tid);
> + mutex_lock(&journal->j_checkpoint_mutex);
> } else {
> printk(KERN_ERR "%s: needed %d blocks and "
> "only had %d space available\n",

After we execute the code in the else cause, we drop through to just
before the bottom of the while loop, where we see:

mutex_unlock(&journal->j_checkpoint_mutex);
}

So it would be better to change things like this instead, so we don't
end up grabbing and releasing the j_checkpoint_mutex unnecessarily:

} else if (tid) {
+ /*
+ * jbd2_journal_commit_transaction() may want
+ * to take the checkpoint_mutex if JBD2_FLUSHED
+ * is set. So we need to temporarily drop it.
+ */
+ mutex_unlock(&journal->j_checkpoint_mutex);
jbd2_log_wait_commit(journal, tid);
+ write_lock(&journal->j_state_lock);
+ continue;
} else {
printk(KERN_ERR "%s: needed %d blocks and "
"only had %d space available\n",

Could you try respinning the patch like this and testing the result?

Thanks,

- Ted

2013-06-11 02:45:50

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

On Mon, Jun 10, 2013 at 10:12 PM, Theodore Ts'o <[email protected]> wrote:
> On Mon, Jun 10, 2013 at 03:32:00PM -0400, Paul Gortmaker wrote:
>> The state lock is taken after we are doing an assert on the state
>> value, not before. It is also taken after the jbd_debug() call.
>> Noting that jbd_debug() is implemented with two separate printk()
>> calls, it can lead to corrupted and misleading debug output like
>> the following (see lines marked with "*"):
>
> This is only a cosmetic fix, but instead of trying to fix it by moving
> a lock --- which would only fix this issue, but there are probably
> others cases where this might be an issue, I'd suggest fixing it with
> something like this:
>
> /* in fs/jbd2/journal.c */
> void __jbd2_debug(int level, const char *func, unsigned int line, const char *fmt, ...)
> {
> struct va_format vaf;
> va_list args;
>
> if (level < jbd2_journal_enable_debug)
> return;
> va_start(args, fmt);
> vaf.fmt = fmt;
> vaf.va = &args
> printk(KERN_DEBUG, "jbd2: (%s, %u): %pV\n", func, line, &vaf);
> va_end(args);
> }
>
> /* in jbd2.h */
>
> #define jbd_debug(n, fmt, a...) __jbd2_debug((n), __func__, __LINE__, (fmt), ##a)
>
> Could you give this approach a try?

Sure, I will do so tomorrow - but since it can't be reproduced
on-demand, all I'll be able to do is to watch for independent
calls with very close time stamps, and confirm they were not
interleaved.

What about the state assert being done outside of the state
lock? Should I keep that as a separate patch so that the
assert isn't checking what could possibly be a transient value?

Thanks,
Paul.
--

>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-06-11 02:52:50

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

On Mon, Jun 10, 2013 at 10:45:50PM -0400, Paul Gortmaker wrote:
>
> Sure, I will do so tomorrow - but since it can't be reproduced
> on-demand, all I'll be able to do is to watch for independent
> calls with very close time stamps, and confirm they were not
> interleaved.

Well, if the resulting patch causes jbd_debug() messages to be issued
correctly, I have every confidence that they won't be interleaved; the
%pV structure was used to solve this very problem, and it's used all
over the kernel. In fact we've used it __ext4_error()/ext4_error() to
solve this exact same issue of interleaved messages.

> What about the state assert being done outside of the state
> lock? Should I keep that as a separate patch so that the
> assert isn't checking what could possibly be a transient value?

Ah, I missed that since I had been focusing on the jbd_debug().
That's a good catch, we'll still need this patch to make sure we're
checking the state assert under the j_state lock. But with the fixed
jbd_debug() we can keep the jbd_debug() statement outside of the
j_state_lock critical region.

Cheers,

- Ted

2013-06-11 03:10:17

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] ext4/jbd2: several possible mainline fixes

On Mon, Jun 10, 2013 at 7:38 PM, Theodore Ts'o <[email protected]> wrote:
> On Mon, Jun 10, 2013 at 03:31:59PM -0400, Paul Gortmaker wrote:
>> Using jbd_debug() it seems that I end up with jbd2_log_do_checkpoint
>> and jbd2_journal_commit_transaction running into each other. In one of
>> my attached patches, I show they overlap to the point of interrupting
>> each others jbd_debug messages. Maybe that doesn't matter?
>
> That should be OK. We do allow a new transactions while we are
> committing an older transaction, and if this requires more space, a
> checkpoint could start. I'm not sure why you're apparently seeing a
> deadlock under RT-linux, though.
>
>> Stuck waiting/spinning somewhere in jbd2_journal_commit_transaction.
>> As near as I can tell, it never got to phase 3 of commit_transaction.
>>
>> Since jbd2_journal_commit_transaction is such a large function,
>> I'm tempted to break it up some, just to ease my debugging (compare
>> 0x1c20 to the smaller numbers around it). Perhaps there would be
>> interest in such kinds of patches for mainline?
>
> Instead of breaking it up, can you just use addr2line, i.e.:
>
> % addr2line -a ffffffff8046a067 -i -e vmlinux
> 0xffffffff8046a067
> ./include/linux/buffer_head.h:287
> ./fs/ext4/inode.c:5585
> ./fs/ext4/inode.c:5963
>
> I find this to be incredibly useful, since with the -i option it will
> handle inline functions correctly. In the above example there are two

Thanks, I wasn't aware of the "-i" -- and had simply been using
gdb directly with "l *jbd2_journal_commit_transaction+0x<offset>"
which shows what inline we were in, but it still wasn't clear to me
yet what was going on, that we were stuck there.

> levels of inlining, one explicitly marked inline in
> include/linux/buffer.h, and one implicit inlining taking place because
> we had a static function in fs/ext4/inode.c that was only called by
> one caller.
>
> Because of gcc's implicit inlining, just breaking up the function by
> itself wouldn't be enough, unless you explicitly marked the new static
> functions with noinline; but that introduces inefficiencies. If the
> only reason you want to do this is to make it easier to figure out a
> stack trace, addr2line really is your friend....

That was one reason -- the other is that I was thinking if sensible
functional boundaries in the source could be made between the
chunks marked as phase 1 --> phase N, then people like me who
are new to reading that bit of code might come away feeling more
confident that they understood it correctly. Anyway, it was just a
thought...

I will keep people posted as to what (if?) I finally figure out
about RT+jbd2. I have reproduced it on a completely different
machine (dual socket numa xeon with dual disks as raid0, vs.
the original single disk, single socket, COTS dell optiplex).

It still takes a massively parallel yocto build, combined with a
large "rm -rf" elsewhere to trigger it though (and even multiple
tries of the above). So it is hard to say with confidence that
it is "found and fixed" based on build results alone -- when 5
full yocto builds can pass w/o any issue at all. :(

But thanks for the input though!
Paul.
--

>
> Cheers,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-06-11 03:21:22

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 2/4] jbd2/log_wait_for_space: drop checkpoint mutex when waiting

On Mon, Jun 10, 2013 at 10:33 PM, Theodore Ts'o <[email protected]> wrote:
> On Mon, Jun 10, 2013 at 03:32:01PM -0400, Paul Gortmaker wrote:
>>
>> What is interesting here, is that we call log_wait_commit, from
>> within wait_for_space, but we are still holding the checkpoint_mutex
>> as it surrounds mostly the whole of wait_for_space. And then, as we
>> are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
>> bit is set, then we will also try to take the same checkpoint_mutex.
>
>> } else if (tid) {
>> + /*
>> + * jbd2_journal_commit_transaction() may want
>> + * to take the checkpoint_mutex if JBD2_FLUSHED
>> + * is set. So we need to temporarily drop it.
>> + */
>> + mutex_unlock(&journal->j_checkpoint_mutex);
>> jbd2_log_wait_commit(journal, tid);
>> + mutex_lock(&journal->j_checkpoint_mutex);
>> } else {
>> printk(KERN_ERR "%s: needed %d blocks and "
>> "only had %d space available\n",
>
> After we execute the code in the else cause, we drop through to just
> before the bottom of the while loop, where we see:
>
> mutex_unlock(&journal->j_checkpoint_mutex);
> }
>
> So it would be better to change things like this instead, so we don't
> end up grabbing and releasing the j_checkpoint_mutex unnecessarily:
>
> } else if (tid) {
> + /*
> + * jbd2_journal_commit_transaction() may want
> + * to take the checkpoint_mutex if JBD2_FLUSHED
> + * is set. So we need to temporarily drop it.
> + */
> + mutex_unlock(&journal->j_checkpoint_mutex);
> jbd2_log_wait_commit(journal, tid);
> + write_lock(&journal->j_state_lock);
> + continue;
> } else {
> printk(KERN_ERR "%s: needed %d blocks and "
> "only had %d space available\n",
>
> Could you try respinning the patch like this and testing the result?

Absolutely; will do that tomorrow and re-test on 3.10-rc5.

And I'll keep it in my preempt-rt testing queue too, as I keep
looking into that RT problem.

Thanks for reviewing and confirming it could be a real problem.

Paul.
--

>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-06-11 13:03:33

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/4] jbd2/log_wait_for_space: drop checkpoint mutex when waiting

On Mon, Jun 10, 2013 at 11:20:50PM -0400, Paul Gortmaker wrote:
>
> Absolutely; will do that tomorrow and re-test on 3.10-rc5.

Thanks!! And by the way, I did look at patches #3 and #4 in your
series, and they looked fine. If you're going to be resending a new
patch series shortly, I won't bother grabbing them now and wait for
your new series, but if you won't have time to complete your testing,
the patches are independent and easy to validate by inspection, so I
could also just pull them into the ext4 tree earlier.

Cheers, and good luck figuring out the RT problem.

- Ted

P.S. About the bit spinlock patches in the RT Tree... Something that
might be interesting to do if you have the time is to measure the
performance differential on non-realtime kernels to replace the bit
spinlocks with normal spinlocks. The two main issues with it I can
forsee is the potential increased memory overhead (since a system can
have a huge number of bh's), but if this were offset with performance
gains (and we can confirm no performance losses moving away from bit
spinlocks), I'm not wedded to keeping them. Other folks in the
fsdevel community may push back on adding spinlocks to the bh that
many other file systems would have no use for, and that may very well
be a concern, but if we understand what the tradeoffs are, both pro
and con, it's something we can have a reasonable discussion about.


2013-06-11 13:20:03

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 2/4] jbd2/log_wait_for_space: drop checkpoint mutex when waiting

On 13-06-11 09:03 AM, Theodore Ts'o wrote:
> On Mon, Jun 10, 2013 at 11:20:50PM -0400, Paul Gortmaker wrote:
>>
>> Absolutely; will do that tomorrow and re-test on 3.10-rc5.
>
> Thanks!! And by the way, I did look at patches #3 and #4 in your
> series, and they looked fine. If you're going to be resending a new
> patch series shortly, I won't bother grabbing them now and wait for
> your new series, but if you won't have time to complete your testing,
> the patches are independent and easy to validate by inspection, so I
> could also just pull them into the ext4 tree earlier.

I'll definitely respin them and retest today, so no need for you to
bother manually dealing with the two trivial ones independently.

>
> Cheers, and good luck figuring out the RT problem.

Thanks!

>
> - Ted
>
> P.S. About the bit spinlock patches in the RT Tree... Something that
> might be interesting to do if you have the time is to measure the
> performance differential on non-realtime kernels to replace the bit
> spinlocks with normal spinlocks. The two main issues with it I can
> forsee is the potential increased memory overhead (since a system can
> have a huge number of bh's), but if this were offset with performance
> gains (and we can confirm no performance losses moving away from bit
> spinlocks), I'm not wedded to keeping them. Other folks in the

Good to know. I'll keep it in mind to try some performance tests on
a jbd[2] converted non-RT kernel when I have time. I expect with the
advent of hlist_bl_head we'll see more and more users of bit_spin,
but at least those ones can be "fixed" all in one place for RT.

Paul.
--

> fsdevel community may push back on adding spinlocks to the bh that
> many other file systems would have no use for, and that may very well
> be a concern, but if we understand what the tradeoffs are, both pro
> and con, it's something we can have a reasonable discussion about.
>

2013-06-11 17:37:52

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

[Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users] On 10/06/2013 (Mon 22:12) Theodore Ts'o wrote:

> On Mon, Jun 10, 2013 at 03:32:00PM -0400, Paul Gortmaker wrote:
> > The state lock is taken after we are doing an assert on the state
> > value, not before. It is also taken after the jbd_debug() call.
> > Noting that jbd_debug() is implemented with two separate printk()
> > calls, it can lead to corrupted and misleading debug output like
> > the following (see lines marked with "*"):
>
> This is only a cosmetic fix, but instead of trying to fix it by moving
> a lock --- which would only fix this issue, but there are probably
> others cases where this might be an issue, I'd suggest fixing it with
> something like this:
>
> /* in fs/jbd2/journal.c */
> void __jbd2_debug(int level, const char *func, unsigned int line, const char *fmt, ...)
> {
> struct va_format vaf;
> va_list args;
>
> if (level < jbd2_journal_enable_debug)
> return;

I've changed "level <" for "level >" here....

> va_start(args, fmt);
> vaf.fmt = fmt;
> vaf.va = &args

...and added a ";" here.

> printk(KERN_DEBUG, "jbd2: (%s, %u): %pV\n", func, line, &vaf);

I'm so glad to see that I am not the only one who accidentally keeps
putting a comma after the KERN_<N> values. :)

I'd thought about using pr_debug() but decided not to, since it is a
no-op without us actually defining DEBUG somewhere, and if we don't,
then people would wonder where their debug output went, I think.

> va_end(args);
> }
>
> /* in jbd2.h */
>
> #define jbd_debug(n, fmt, a...) __jbd2_debug((n), __func__, __LINE__, (fmt), ##a)
>
> Could you give this approach a try?

Since both ext3 and ext4 will suffer the same problem, I've made
a combined solution. It has passed a basic sanity test here, so
maybe if you have a spare moment, you could have a look to determine
if you are OK with the implementation, while I continue fixing up
the others and doing more testing?

Thanks,
Paul.
--

>From fbedf3102700b034395404dfdc7ac2dcae145af8 Mon Sep 17 00:00:00 2001
From: Paul Gortmaker <[email protected]>
Date: Tue, 11 Jun 2013 11:46:10 -0400
Subject: [PATCH] jbd/jbd2: make common jbd_debug that won't split printk
statements

Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load. Ted suggested we make use of va_args to reduce it to a single
printk() call to fix it.

This was originally seen on ext4, but since ext3 should suffer the same
problem, I've taken the next step to make the jbd_debug a shared
implementation, so to fix ext3 and reduce duplication at the same time.

Signed-off-by: Paul Gortmaker <[email protected]>

diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index 6e5524f..ea88e8e 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -56,17 +56,9 @@
*/
#define JBD_EXPENSIVE_CHECKING
extern u8 journal_enable_debug;
-
-#define jbd_debug(n, f, a...) \
- do { \
- if ((n) <= journal_enable_debug) { \
- printk (KERN_DEBUG "(%s, %d): %s: ", \
- __FILE__, __LINE__, __func__); \
- printk (f, ## a); \
- } \
- } while (0)
+#define JOURNAL_ENABLE_DEBUG journal_enable_debug
#else
-#define jbd_debug(f, a...) /**/
+#define JOURNAL_ENABLE_DEBUG INT_MAX
#endif

static inline void *jbd_alloc(size_t size, gfp_t flags)
@@ -245,6 +237,13 @@ typedef struct journal_superblock_s
#include <linux/sched.h>
#include <linux/jbd_common.h>

+#ifdef CONFIG_JBD_DEBUG
+#define jbd_debug(n, fmt, a...) \
+ __jbd_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
+#else
+#define jbd_debug(n, fmt, a...) /**/
+#endif
+
#define J_ASSERT(assert) BUG_ON(!(assert))

#define J_ASSERT_BH(bh, expr) J_ASSERT(expr)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 9986ab8..c39a9ca 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -56,17 +56,9 @@
*/
#define JBD2_EXPENSIVE_CHECKING
extern ushort jbd2_journal_enable_debug;
-
-#define jbd_debug(n, f, a...) \
- do { \
- if ((n) <= jbd2_journal_enable_debug) { \
- printk (KERN_DEBUG "(%s, %d): %s: ", \
- __FILE__, __LINE__, __func__); \
- printk (f, ## a); \
- } \
- } while (0)
+#define JOURNAL_ENABLE_DEBUG jbd2_journal_enable_debug
#else
-#define jbd_debug(f, a...) /**/
+#define JOURNAL_ENABLE_DEBUG INT_MAX
#endif

extern void *jbd2_alloc(size_t size, gfp_t flags);
@@ -303,6 +295,13 @@ typedef struct journal_superblock_s
#include <linux/sched.h>
#include <linux/jbd_common.h>

+#ifdef CONFIG_JBD2_DEBUG
+#define jbd_debug(n, fmt, a...) \
+ __jbd_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
+#else
+#define jbd_debug(n, fmt, a...) /**/
+#endif
+
#define J_ASSERT(assert) BUG_ON(!(assert))

#define J_ASSERT_BH(bh, expr) J_ASSERT(expr)
diff --git a/include/linux/jbd_common.h b/include/linux/jbd_common.h
index 949b5dd..c8e2fba 100644
--- a/include/linux/jbd_common.h
+++ b/include/linux/jbd_common.h
@@ -1,6 +1,7 @@
#ifndef _LINUX_JBD_STATE_H
#define _LINUX_JBD_STATE_H

+#include <linux/printk.h>
#include <linux/bit_spinlock.h>

enum jbd_state_bits {
@@ -29,6 +30,23 @@ TAS_BUFFER_FNS(RevokeValid, revokevalid)
BUFFER_FNS(Freed, freed)
BUFFER_FNS(Verified, verified)

+#if defined(CONFIG_JBD_DEBUG) || defined(CONFIG_JBD2_DEBUG)
+static inline void __jbd_debug(int level, const char *file, const char *func,
+ unsigned int line, const char *fmt, ...)
+{
+ struct va_format vaf;
+ va_list args;
+
+ if (level > JOURNAL_ENABLE_DEBUG)
+ return;
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+ printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
+ va_end(args);
+}
+#endif
+
static inline struct buffer_head *jh2bh(struct journal_head *jh)
{
return jh->b_bh;
--
1.8.1.2

2013-06-11 17:53:48

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

On Tue, Jun 11, 2013 at 01:38:24PM -0400, Paul Gortmaker wrote:
>
> +#if defined(CONFIG_JBD_DEBUG) || defined(CONFIG_JBD2_DEBUG)
> +static inline void __jbd_debug(int level, const char *file, const char *func,
> + unsigned int line, const char *fmt, ...)
> +{
> + struct va_format vaf;
> + va_list args;
> +
> + if (level > JOURNAL_ENABLE_DEBUG)
> + return;
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> + printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
> + va_end(args);
> +}
> +#endif

What's the difference in the overall size of jbd.ko and jbd2.ko using
an inline __jbd_debug compared with one where the __jbd_debug is
defined as a function in fs/jbd2/journal.c (or fs/jbd/journal.c; that
will be the problem trying to solve this problem for jbd and jbd2 at
the same time)?

If gcc was competent at optimizing inline functions with varargs, this
might be fine. But I'm a bit concerned whether this might cause a lot
of bloat if CONFIG_JBD2_DEBUG is enabled. It would be nice if the
impact was small enough that people were willing to enable so it's
available when we want to do debuggin.

Regards,

- Ted

2013-06-11 18:48:19

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

[Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users] On 11/06/2013 (Tue 13:53) Theodore Ts'o wrote:

> On Tue, Jun 11, 2013 at 01:38:24PM -0400, Paul Gortmaker wrote:
> >
> > +#if defined(CONFIG_JBD_DEBUG) || defined(CONFIG_JBD2_DEBUG)
> > +static inline void __jbd_debug(int level, const char *file, const char *func,
> > + unsigned int line, const char *fmt, ...)
> > +{
> > + struct va_format vaf;
> > + va_list args;
> > +
> > + if (level > JOURNAL_ENABLE_DEBUG)
> > + return;
> > + va_start(args, fmt);
> > + vaf.fmt = fmt;
> > + vaf.va = &args;
> > + printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
> > + va_end(args);
> > +}
> > +#endif
>
> What's the difference in the overall size of jbd.ko and jbd2.ko using
> an inline __jbd_debug compared with one where the __jbd_debug is
> defined as a function in fs/jbd2/journal.c (or fs/jbd/journal.c; that
> will be the problem trying to solve this problem for jbd and jbd2 at
> the same time)?
>
> If gcc was competent at optimizing inline functions with varargs, this
> might be fine. But I'm a bit concerned whether this might cause a lot
> of bloat if CONFIG_JBD2_DEBUG is enabled. It would be nice if the
> impact was small enough that people were willing to enable so it's
> available when we want to do debuggin.

There is a small delta for inline vs non-inline, but it isn't anywhere
near as big as the off-vs-on delta (with gcc-4.7.3):

JBD[2]_DEBUG off:
text data bss dec hex filename
48072 113 128 48313 bcb9 fs/jbd2/built-in.o
36862 50 32 36944 9050 fs/jbd/built-in.o

JBD[2]_DEBUG on, __jbd[2]_debug as two independent real void functions:
text data bss dec hex filename
55261 115 132 55508 d8d4 fs/jbd2/built-in.o
43752 51 52 43855 ab4f fs/jbd/built-in.o

JBD[2]_DEBUG on, __jbd_debug as a shared inline function in jbd_common.h:
text data bss dec hex filename
55862 115 132 56109 db2d fs/jbd2/built-in.o
44353 51 52 44456 ada8 fs/jbd/built-in.o

So it looks like you can choose whichever solution you like
better. I don't have a real preference either way. The
non-inline version looks like what I've included below.

Paul.
--

>From d22a2bbe9108934ba3d1374f61f6e3a01aaa36e6 Mon Sep 17 00:00:00 2001
From: Paul Gortmaker <[email protected]>
Date: Tue, 11 Jun 2013 11:46:10 -0400
Subject: [PATCH] jbd/jbd2: make jbd_debug that won't split printk statements

Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load. Ted suggested we make use of va_args to reduce it to a single
printk() call to fix it.

This was originally seen on ext4, but since ext3 should suffer the
same problem, I've taken the next step to implement the same kind of
fix there as well.

Signed-off-by: Paul Gortmaker <[email protected]>

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 6510d63..5e046a6 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -90,6 +90,23 @@ static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
static void __journal_abort_soft (journal_t *journal, int errno);
static const char *journal_dev_name(journal_t *journal, char *buffer);

+#ifdef CONFIG_JBD_DEBUG
+void __jbd_debug(int level, const char *file, const char *func,
+ unsigned int line, const char *fmt, ...)
+{
+ struct va_format vaf;
+ va_list args;
+
+ if (level > journal_enable_debug)
+ return;
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+ printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
+ va_end(args);
+}
+#endif
+
/*
* Helper function used to manage commit timeouts
*/
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 9545757..6aa196e 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -103,6 +103,23 @@ EXPORT_SYMBOL(jbd2_inode_cache);
static void __journal_abort_soft (journal_t *journal, int errno);
static int jbd2_journal_create_slab(size_t slab_size);

+#ifdef CONFIG_JBD2_DEBUG
+void __jbd2_debug(int level, const char *file, const char *func,
+ unsigned int line, const char *fmt, ...)
+{
+ struct va_format vaf;
+ va_list args;
+
+ if (level > jbd2_journal_enable_debug)
+ return;
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+ printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
+ va_end(args);
+}
+#endif
+
/* Checksumming functions */
int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
{
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index 6e5524f..bc12abc 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -56,17 +56,13 @@
*/
#define JBD_EXPENSIVE_CHECKING
extern u8 journal_enable_debug;
+void __jbd_debug(int level, const char *file, const char *func,
+ unsigned int line, const char *fmt, ...);

-#define jbd_debug(n, f, a...) \
- do { \
- if ((n) <= journal_enable_debug) { \
- printk (KERN_DEBUG "(%s, %d): %s: ", \
- __FILE__, __LINE__, __func__); \
- printk (f, ## a); \
- } \
- } while (0)
+#define jbd_debug(n, fmt, a...) \
+ __jbd_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
#else
-#define jbd_debug(f, a...) /**/
+#define jbd_debug(n, fmt, a...) /**/
#endif

static inline void *jbd_alloc(size_t size, gfp_t flags)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 9986ab8..8d95314 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -56,17 +56,13 @@
*/
#define JBD2_EXPENSIVE_CHECKING
extern ushort jbd2_journal_enable_debug;
+void __jbd2_debug(int level, const char *file, const char *func,
+ unsigned int line, const char *fmt, ...);

-#define jbd_debug(n, f, a...) \
- do { \
- if ((n) <= jbd2_journal_enable_debug) { \
- printk (KERN_DEBUG "(%s, %d): %s: ", \
- __FILE__, __LINE__, __func__); \
- printk (f, ## a); \
- } \
- } while (0)
+#define jbd_debug(n, fmt, a...) \
+ __jbd2_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
#else
-#define jbd_debug(f, a...) /**/
+#define jbd_debug(n, fmt, a...) /**/
#endif

extern void *jbd2_alloc(size_t size, gfp_t flags);
--
1.8.1.2

2013-06-11 21:53:42

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users

On 13-06-11 02:48 PM, Paul Gortmaker wrote:
> [Re: [PATCH 1/4] jbd2/journal_commit_transaction: relocate state lock to incorporate all users] On 11/06/2013 (Tue 13:53) Theodore Ts'o wrote:
>
>> On Tue, Jun 11, 2013 at 01:38:24PM -0400, Paul Gortmaker wrote:
>>>
>>> +#if defined(CONFIG_JBD_DEBUG) || defined(CONFIG_JBD2_DEBUG)
>>> +static inline void __jbd_debug(int level, const char *file, const char *func,
>>> + unsigned int line, const char *fmt, ...)
>>> +{
>>> + struct va_format vaf;
>>> + va_list args;
>>> +
>>> + if (level > JOURNAL_ENABLE_DEBUG)
>>> + return;
>>> + va_start(args, fmt);
>>> + vaf.fmt = fmt;
>>> + vaf.va = &args;
>>> + printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
>>> + va_end(args);
>>> +}
>>> +#endif
>>
>> What's the difference in the overall size of jbd.ko and jbd2.ko using
>> an inline __jbd_debug compared with one where the __jbd_debug is
>> defined as a function in fs/jbd2/journal.c (or fs/jbd/journal.c; that
>> will be the problem trying to solve this problem for jbd and jbd2 at
>> the same time)?
>>
>> If gcc was competent at optimizing inline functions with varargs, this
>> might be fine. But I'm a bit concerned whether this might cause a lot
>> of bloat if CONFIG_JBD2_DEBUG is enabled. It would be nice if the
>> impact was small enough that people were willing to enable so it's
>> available when we want to do debuggin.
>
> There is a small delta for inline vs non-inline, but it isn't anywhere
> near as big as the off-vs-on delta (with gcc-4.7.3):
>
> JBD[2]_DEBUG off:
> text data bss dec hex filename
> 48072 113 128 48313 bcb9 fs/jbd2/built-in.o
> 36862 50 32 36944 9050 fs/jbd/built-in.o
>
> JBD[2]_DEBUG on, __jbd[2]_debug as two independent real void functions:
> text data bss dec hex filename
> 55261 115 132 55508 d8d4 fs/jbd2/built-in.o
> 43752 51 52 43855 ab4f fs/jbd/built-in.o
>
> JBD[2]_DEBUG on, __jbd_debug as a shared inline function in jbd_common.h:
> text data bss dec hex filename
> 55862 115 132 56109 db2d fs/jbd2/built-in.o
> 44353 51 52 44456 ada8 fs/jbd/built-in.o
>
> So it looks like you can choose whichever solution you like
> better. I don't have a real preference either way. The

The more I think about this, the more I'm inclined to go with
the two independent real void functions. It gives smaller code
(albeit only slightly) and it probably is a tiny amount easier
to read vs. trying to be clever to share ~20 lines of code.

Plus I can then separate the ext3 content from the ext4 content
and create appropriate CC (i.e. add Jan Kara) for those parts.

Paul.
--

> non-inline version looks like what I've included below.
>
> Paul.
> --
>
> From d22a2bbe9108934ba3d1374f61f6e3a01aaa36e6 Mon Sep 17 00:00:00 2001
> From: Paul Gortmaker <[email protected]>
> Date: Tue, 11 Jun 2013 11:46:10 -0400
> Subject: [PATCH] jbd/jbd2: make jbd_debug that won't split printk statements
>
> Since the jbd_debug() is implemented with two separate printk()
> calls, it can lead to corrupted and misleading debug output like
> the following (see lines marked with "*"):
>
> [ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
> [ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
> [ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
> [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
> [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
> [* 290.339382] JBD2: starting commit of transaction 42104
> [ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
> [ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079
>
> i.e. the debug output from log_wait_commit and journal_commit_transaction
> have become interleaved. The output should have been:
>
> (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
> (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104
>
> It is expected that this is not easy to replicate -- I was only able
> to cause it on preempt-rt kernels, and even then only under heavy
> I/O load. Ted suggested we make use of va_args to reduce it to a single
> printk() call to fix it.
>
> This was originally seen on ext4, but since ext3 should suffer the
> same problem, I've taken the next step to implement the same kind of
> fix there as well.
>
> Signed-off-by: Paul Gortmaker <[email protected]>
>
> diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
> index 6510d63..5e046a6 100644
> --- a/fs/jbd/journal.c
> +++ b/fs/jbd/journal.c
> @@ -90,6 +90,23 @@ static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
> static void __journal_abort_soft (journal_t *journal, int errno);
> static const char *journal_dev_name(journal_t *journal, char *buffer);
>
> +#ifdef CONFIG_JBD_DEBUG
> +void __jbd_debug(int level, const char *file, const char *func,
> + unsigned int line, const char *fmt, ...)
> +{
> + struct va_format vaf;
> + va_list args;
> +
> + if (level > journal_enable_debug)
> + return;
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> + printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
> + va_end(args);
> +}
> +#endif
> +
> /*
> * Helper function used to manage commit timeouts
> */
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 9545757..6aa196e 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -103,6 +103,23 @@ EXPORT_SYMBOL(jbd2_inode_cache);
> static void __journal_abort_soft (journal_t *journal, int errno);
> static int jbd2_journal_create_slab(size_t slab_size);
>
> +#ifdef CONFIG_JBD2_DEBUG
> +void __jbd2_debug(int level, const char *file, const char *func,
> + unsigned int line, const char *fmt, ...)
> +{
> + struct va_format vaf;
> + va_list args;
> +
> + if (level > jbd2_journal_enable_debug)
> + return;
> + va_start(args, fmt);
> + vaf.fmt = fmt;
> + vaf.va = &args;
> + printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
> + va_end(args);
> +}
> +#endif
> +
> /* Checksumming functions */
> int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
> {
> diff --git a/include/linux/jbd.h b/include/linux/jbd.h
> index 6e5524f..bc12abc 100644
> --- a/include/linux/jbd.h
> +++ b/include/linux/jbd.h
> @@ -56,17 +56,13 @@
> */
> #define JBD_EXPENSIVE_CHECKING
> extern u8 journal_enable_debug;
> +void __jbd_debug(int level, const char *file, const char *func,
> + unsigned int line, const char *fmt, ...);
>
> -#define jbd_debug(n, f, a...) \
> - do { \
> - if ((n) <= journal_enable_debug) { \
> - printk (KERN_DEBUG "(%s, %d): %s: ", \
> - __FILE__, __LINE__, __func__); \
> - printk (f, ## a); \
> - } \
> - } while (0)
> +#define jbd_debug(n, fmt, a...) \
> + __jbd_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
> #else
> -#define jbd_debug(f, a...) /**/
> +#define jbd_debug(n, fmt, a...) /**/
> #endif
>
> static inline void *jbd_alloc(size_t size, gfp_t flags)
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 9986ab8..8d95314 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -56,17 +56,13 @@
> */
> #define JBD2_EXPENSIVE_CHECKING
> extern ushort jbd2_journal_enable_debug;
> +void __jbd2_debug(int level, const char *file, const char *func,
> + unsigned int line, const char *fmt, ...);
>
> -#define jbd_debug(n, f, a...) \
> - do { \
> - if ((n) <= jbd2_journal_enable_debug) { \
> - printk (KERN_DEBUG "(%s, %d): %s: ", \
> - __FILE__, __LINE__, __func__); \
> - printk (f, ## a); \
> - } \
> - } while (0)
> +#define jbd_debug(n, fmt, a...) \
> + __jbd2_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
> #else
> -#define jbd_debug(f, a...) /**/
> +#define jbd_debug(n, fmt, a...) /**/
> #endif
>
> extern void *jbd2_alloc(size_t size, gfp_t flags);
>

2013-06-11 22:45:11

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH v2 0/6] misc jbd2 fixes and cleanups

This is a series of commits that came about while researching a problem
on a preempt-rt kernel. The RT problem seems tied to the function
jbd2_journal_commit_transaction, hence the common theme/location in
this series.

That said, it turns out that there is nothing in these commits that is
specific to RT, so they were reviewed with the aim to mainline them
(and also continue to research the RT problem...)

Once these have been placed in a non-rebased branch with fixed commit
IDs, I'll submit ext3/jbd versions of the same where appropriate.

Testing has involved ensuring jbd_debug still works (with the newly
documented non-debugfs module param) and also with a largely parallel
yocto build running concurrently with a large "rm -rf" (i.e. the test
that has been problematic on 3.4.x/3.6.x preempt-rt kernels).

Thanks,
Paul.

---

Changes since v1:
--assert after state lock is now a standalone/separate commit
--add jbd2-only va_args patch for jbd_debug in manner suggested by Ted
--fix log_wait_for_space patch to avoid extra mutex lock/unlock
--add new patch to fix stale data/dependency in jbd2/Kconfig


The following changes since commit 317ddd256b9c24b0d78fa8018f80f1e495481a10:

Linux 3.10-rc5 (2013-06-08 17:41:04 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git jbd2-fixes

for you to fetch changes up to 26a48f70423cc58f5fbaf1d8fcd703e8bd455aef:

jbd2: remove debug dependency on debug_fs; update help text (2013-06-11 17:58:42 -0400)

----------------------------------------------------------------
Paul Gortmaker (6):
jbd2/journal_commit_transaction: relocate assert after state lock
jbd2/log_wait_for_space: drop checkpoint mutex when waiting
jbd2: fix duplicate debug label for phase 2
jbd/jbd2: relocate bit_spinlock header to jbd_common
jbd2: make jbd_debug that won't split printk statements
jbd2: remove debug dependency on debug_fs; update help text

fs/jbd2/Kconfig | 6 +++---
fs/jbd2/checkpoint.c | 8 ++++++++
fs/jbd2/commit.c | 6 +++---
fs/jbd2/journal.c | 17 +++++++++++++++++
include/linux/jbd.h | 1 -
include/linux/jbd2.h | 15 +++++----------
include/linux/jbd_common.h | 2 ++
7 files changed, 38 insertions(+), 17 deletions(-)

2013-06-11 22:45:14

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 4/6] jbd/jbd2: relocate bit_spinlock header to jbd_common

The bit_spinlock functions are only used for the jbd_lock_bh_state
functions (and friends) in jbd_common.h and are not directly used
by either of jbd.h or jbd2.h content.

The jbd_common file is new as of commit 446066724c36 ("jdb/jbd2: factor
out common functions from the jbd[2] header files") but common
(and isolated) headers were not considered for factoring at that time.

Signed-off-by: Paul Gortmaker <[email protected]>
---
include/linux/jbd.h | 1 -
include/linux/jbd2.h | 1 -
include/linux/jbd_common.h | 2 ++
3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index 7e0b622..6e5524f 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -27,7 +27,6 @@
#include <linux/buffer_head.h>
#include <linux/journal-head.h>
#include <linux/stddef.h>
-#include <linux/bit_spinlock.h>
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/lockdep.h>
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 6e051f4..9986ab8 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -26,7 +26,6 @@
#include <linux/buffer_head.h>
#include <linux/journal-head.h>
#include <linux/stddef.h>
-#include <linux/bit_spinlock.h>
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/slab.h>
diff --git a/include/linux/jbd_common.h b/include/linux/jbd_common.h
index 6133679..949b5dd 100644
--- a/include/linux/jbd_common.h
+++ b/include/linux/jbd_common.h
@@ -1,6 +1,8 @@
#ifndef _LINUX_JBD_STATE_H
#define _LINUX_JBD_STATE_H

+#include <linux/bit_spinlock.h>
+
enum jbd_state_bits {
BH_JBD /* Has an attached ext3 journal_head */
= BH_PrivateStart,
--
1.8.1.2


2013-06-11 22:44:25

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 1/6] jbd2/journal_commit_transaction: relocate assert after state lock

The state lock is taken after we are doing an assert on the state
value, not before. So we might in fact be doing an assert on a
transient value. Ensure the state check is within the scope of
the state lock being taken.

Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/commit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 0f53946..3263e4a 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -424,13 +424,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
J_ASSERT(journal->j_committing_transaction == NULL);

commit_transaction = journal->j_running_transaction;
- J_ASSERT(commit_transaction->t_state == T_RUNNING);

trace_jbd2_start_commit(journal, commit_transaction);
jbd_debug(1, "JBD2: starting commit of transaction %d\n",
commit_transaction->t_tid);

write_lock(&journal->j_state_lock);
+ J_ASSERT(commit_transaction->t_state == T_RUNNING);
commit_transaction->t_state = T_LOCKED;

trace_jbd2_commit_locking(journal, commit_transaction);
--
1.8.1.2


2013-06-11 22:45:15

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 5/6] jbd2: make jbd_debug that won't split printk statements

Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.

Reported-by: Paul Gortmaker <[email protected]>
Suggested-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/journal.c | 17 +++++++++++++++++
include/linux/jbd2.h | 14 +++++---------
2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 9545757..6aa196e 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -103,6 +103,23 @@ EXPORT_SYMBOL(jbd2_inode_cache);
static void __journal_abort_soft (journal_t *journal, int errno);
static int jbd2_journal_create_slab(size_t slab_size);

+#ifdef CONFIG_JBD2_DEBUG
+void __jbd2_debug(int level, const char *file, const char *func,
+ unsigned int line, const char *fmt, ...)
+{
+ struct va_format vaf;
+ va_list args;
+
+ if (level > jbd2_journal_enable_debug)
+ return;
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+ printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
+ va_end(args);
+}
+#endif
+
/* Checksumming functions */
int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
{
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 9986ab8..8d95314 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -56,17 +56,13 @@
*/
#define JBD2_EXPENSIVE_CHECKING
extern ushort jbd2_journal_enable_debug;
+void __jbd2_debug(int level, const char *file, const char *func,
+ unsigned int line, const char *fmt, ...);

-#define jbd_debug(n, f, a...) \
- do { \
- if ((n) <= jbd2_journal_enable_debug) { \
- printk (KERN_DEBUG "(%s, %d): %s: ", \
- __FILE__, __LINE__, __func__); \
- printk (f, ## a); \
- } \
- } while (0)
+#define jbd_debug(n, fmt, a...) \
+ __jbd2_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
#else
-#define jbd_debug(f, a...) /**/
+#define jbd_debug(n, fmt, a...) /**/
#endif

extern void *jbd2_alloc(size_t size, gfp_t flags);
--
1.8.1.2


2013-06-11 22:45:16

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 6/6] jbd2: remove debug dependency on debug_fs; update help text

commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
for jbd_debug") removed any need for a dependency on DEBUG_FS. It
also moved the /sys variables out from underneath the typical debugfs
mount point. Delete the dependency and update the /sys path to where
the debug settings are currently.

Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/Kconfig | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/Kconfig b/fs/jbd2/Kconfig
index 69a48c2..5a9f553 100644
--- a/fs/jbd2/Kconfig
+++ b/fs/jbd2/Kconfig
@@ -20,7 +20,7 @@ config JBD2

config JBD2_DEBUG
bool "JBD2 (ext4) debugging support"
- depends on JBD2 && DEBUG_FS
+ depends on JBD2
help
If you are using the ext4 journaled file system (or
potentially any other filesystem/device using JBD2), this option
@@ -29,7 +29,7 @@ config JBD2_DEBUG
By default, the debugging output will be turned off.

If you select Y here, then you will be able to turn on debugging
- with "echo N > /sys/kernel/debug/jbd2/jbd2-debug", where N is a
+ with "echo N > /sys/module/jbd2/parameters/jbd2_debug", where N is a
number between 1 and 5. The higher the number, the more debugging
output is generated. To turn debugging off again, do
- "echo 0 > /sys/kernel/debug/jbd2/jbd2-debug".
+ "echo 0 > /sys/module/jbd2/parameters/jbd2_debug".
--
1.8.1.2


2013-06-11 22:44:27

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 3/6] jbd2: fix duplicate debug label for phase 2

Currently we see this output:

$git grep phase fs/jbd2
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n");
[...]

There is clearly a duplicate label for phase 2, and they are
both active (i.e. not in #if ... #else block). Rename them to
be "2a" and "2b" so the debug output is unambiguous.

Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/commit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3263e4a..ab3c9d9 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -533,7 +533,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
wake_up(&journal->j_wait_transaction_locked);
write_unlock(&journal->j_state_lock);

- jbd_debug(3, "JBD2: commit phase 2\n");
+ jbd_debug(3, "JBD2: commit phase 2a\n");

/*
* Now start flushing things to disk, in the order they appear
@@ -548,7 +548,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
WRITE_SYNC);
blk_finish_plug(&plug);

- jbd_debug(3, "JBD2: commit phase 2\n");
+ jbd_debug(3, "JBD2: commit phase 2b\n");

/*
* Way to go: we have now written out all of the data for a
--
1.8.1.2


2013-06-11 22:44:26

by Paul Gortmaker

[permalink] [raw]
Subject: [PATCH 2/6] jbd2/log_wait_for_space: drop checkpoint mutex when waiting

While trying to debug an an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm D ffff8802203afbc0 4600 4878 4748 0x00000000
ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
[<ffffffff8172dc34>] schedule+0x24/0x70
[<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
[<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
[<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
[<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
[<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
[<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
[<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
[<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
[<ffffffff8114370f>] do_rmdir+0xdf/0x120
[<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
[<ffffffff81002889>] ? do_notify_resume+0x89/0x100
[<ffffffff817361ae>] ? int_signal+0x12/0x17
[<ffffffff81145d85>] sys_unlinkat+0x25/0x40
[<ffffffff81735f22>] system_call_fastpath+0x16/0x1b

What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space. And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.

It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be made
by jbd2_journal_commit_transaction(). There does not seem to be
anything preempt-rt specific about this, other then perhaps increasing
the odds of it happening.

Signed-off-by: Paul Gortmaker <[email protected]>
---
fs/jbd2/checkpoint.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index c78841e..1d4f129 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -156,7 +156,15 @@ void __jbd2_log_wait_for_space(journal_t *journal)
/* We were able to recover space; yay! */
;
} else if (tid) {
+ /*
+ * jbd2_journal_commit_transaction() may want
+ * to take the checkpoint_mutex if JBD2_FLUSHED
+ * is set. So we need to temporarily drop it.
+ */
+ mutex_unlock(&journal->j_checkpoint_mutex);
jbd2_log_wait_commit(journal, tid);
+ write_lock(&journal->j_state_lock);
+ continue;
} else {
printk(KERN_ERR "%s: needed %d blocks and "
"only had %d space available\n",
--
1.8.1.2


2013-06-13 02:42:52

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/6] jbd2/journal_commit_transaction: relocate assert after state lock

On Tue, Jun 11, 2013 at 06:44:25PM -0400, Paul Gortmaker wrote:
> The state lock is taken after we are doing an assert on the state
> value, not before. So we might in fact be doing an assert on a
> transient value. Ensure the state check is within the scope of
> the state lock being taken.
>
> Signed-off-by: Paul Gortmaker <[email protected]>

Thanks, applied.

- Ted

2013-06-13 02:55:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/6] jbd2/log_wait_for_space: drop checkpoint mutex when waiting

On Tue, Jun 11, 2013 at 06:44:26PM -0400, Paul Gortmaker wrote:
> While trying to debug an an issue under extreme I/O loading
> on preempt-rt kernels, the following backtrace was observed
> via SysRQ output:
>
> rm D ffff8802203afbc0 4600 4878 4748 0x00000000
> ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
> ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
> ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
> Call Trace:
> [<ffffffff8172dc34>] schedule+0x24/0x70
> [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
> [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
> [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
> [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
> [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
> [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
> [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
> [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
> [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
> [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
> [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
> [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
> [<ffffffff8114370f>] do_rmdir+0xdf/0x120
> [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
> [<ffffffff81002889>] ? do_notify_resume+0x89/0x100
> [<ffffffff817361ae>] ? int_signal+0x12/0x17
> [<ffffffff81145d85>] sys_unlinkat+0x25/0x40
> [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b
>
> What is interesting here, is that we call log_wait_commit, from
> within wait_for_space, but we are still holding the checkpoint_mutex
> as it surrounds mostly the whole of wait_for_space. And then, as we
> are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
> bit is set, then we will also try to take the same checkpoint_mutex.
>
> It seems that we need to drop the checkpoint_mutex while sitting in
> jbd2_log_wait_commit, if we want to guarantee that progress can be made
> by jbd2_journal_commit_transaction(). There does not seem to be
> anything preempt-rt specific about this, other then perhaps increasing
> the odds of it happening.
>
> Signed-off-by: Paul Gortmaker <[email protected]>

Applied, thanks.

- Ted

2013-06-13 02:57:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/6] jbd2: fix duplicate debug label for phase 2

On Tue, Jun 11, 2013 at 06:44:27PM -0400, Paul Gortmaker wrote:
> Currently we see this output:
>
> $git grep phase fs/jbd2
> fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n");
> fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
> fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
> fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n");
> fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n");
> [...]
>
> There is clearly a duplicate label for phase 2, and they are
> both active (i.e. not in #if ... #else block). Rename them to
> be "2a" and "2b" so the debug output is unambiguous.
>
> Signed-off-by: Paul Gortmaker <[email protected]>

Applied, thanks

- Ted

2013-06-13 03:02:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 4/6] jbd/jbd2: relocate bit_spinlock header to jbd_common

On Tue, Jun 11, 2013 at 06:44:28PM -0400, Paul Gortmaker wrote:
> The bit_spinlock functions are only used for the jbd_lock_bh_state
> functions (and friends) in jbd_common.h and are not directly used
> by either of jbd.h or jbd2.h content.
>
> The jbd_common file is new as of commit 446066724c36 ("jdb/jbd2: factor
> out common functions from the jbd[2] header files") but common
> (and isolated) headers were not considered for factoring at that time.
>
> Signed-off-by: Paul Gortmaker <[email protected]>

Thanks, applied.

- Ted

2013-06-13 03:08:31

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 6/6] jbd2: remove debug dependency on debug_fs; update help text

On Tue, Jun 11, 2013 at 06:44:30PM -0400, Paul Gortmaker wrote:
> commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
> for jbd_debug") removed any need for a dependency on DEBUG_FS. It
> also moved the /sys variables out from underneath the typical debugfs
> mount point. Delete the dependency and update the /sys path to where
> the debug settings are currently.
>
> Signed-off-by: Paul Gortmaker <[email protected]>

Applied, thanks.

- Ted

2013-06-13 13:50:59

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 6/6] jbd2: remove debug dependency on debug_fs; update help text

On 13-06-12 11:08 PM, Theodore Ts'o wrote:
> On Tue, Jun 11, 2013 at 06:44:30PM -0400, Paul Gortmaker wrote:
>> commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
>> for jbd_debug") removed any need for a dependency on DEBUG_FS. It
>> also moved the /sys variables out from underneath the typical debugfs
>> mount point. Delete the dependency and update the /sys path to where
>> the debug settings are currently.
>>
>> Signed-off-by: Paul Gortmaker <[email protected]>
>
> Applied, thanks.

Just a heads up -- it seems some script threw away the 1st line of the
commit log here.

https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=7c39c8340334b79019812d7366057b1d4376516e

Paul.
--

>
> - Ted
>

2013-06-13 14:14:31

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 6/6] jbd2: remove debug dependency on debug_fs; update help text

On Thu, Jun 13, 2013 at 09:51:31AM -0400, Paul Gortmaker wrote:
> On 13-06-12 11:08 PM, Theodore Ts'o wrote:
> > On Tue, Jun 11, 2013 at 06:44:30PM -0400, Paul Gortmaker wrote:
> >> commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
> >> for jbd_debug") removed any need for a dependency on DEBUG_FS. It
> >> also moved the /sys variables out from underneath the typical debugfs
> >> mount point. Delete the dependency and update the /sys path to where
> >> the debug settings are currently.
> >>
> >> Signed-off-by: Paul Gortmaker <[email protected]>
> >
> > Applied, thanks.
>
> Just a heads up -- it seems some script threw away the 1st line of the
> commit log here.

Thanks for noticing this. Guilt is apparently stripping lines that
begin with "commit". Using "Commit" avoids this.

- Ted

2013-06-13 14:46:32

by Paul Gortmaker

[permalink] [raw]
Subject: Re: [PATCH 6/6] jbd2: remove debug dependency on debug_fs; update help text

On 13-06-13 10:14 AM, Theodore Ts'o wrote:
> On Thu, Jun 13, 2013 at 09:51:31AM -0400, Paul Gortmaker wrote:
>> On 13-06-12 11:08 PM, Theodore Ts'o wrote:
>>> On Tue, Jun 11, 2013 at 06:44:30PM -0400, Paul Gortmaker wrote:
>>>> commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
>>>> for jbd_debug") removed any need for a dependency on DEBUG_FS. It
>>>> also moved the /sys variables out from underneath the typical debugfs
>>>> mount point. Delete the dependency and update the /sys path to where
>>>> the debug settings are currently.
>>>>
>>>> Signed-off-by: Paul Gortmaker <[email protected]>
>>>
>>> Applied, thanks.
>>
>> Just a heads up -- it seems some script threw away the 1st line of the
>> commit log here.
>
> Thanks for noticing this. Guilt is apparently stripping lines that
> begin with "commit". Using "Commit" avoids this.

It appears that Guilt is guilty of clobbering the author dates with
the commit date too, which might not be desirable...

https://lkml.org/lkml/2013/6/9/200

Paul.
--

>
> - Ted
>