2008-02-21 19:08:39

by Josef Bacik

Subject: [RFC][PATCH] fix journal overflow problem

Hello,

This is related to that jbd patch I sent a few weeks ago. I originally found
that the problem where t_nr_buffers would be greater than t_outstanding_credits
wouldn't happen upstream, but apparently I'm an idiot and I was just missing my
messages, and the problem does exist. Now for the entirely too long description
of what's going wrong.

Say we have a transaction dirty a bitmap buffer and go to flush it to the disk.
Then ext3 goes to get write access to that buffer via
journal_get_undo_access(), finds out it doesn't need it, and then subsequently
does a journal_release_buffer() and then proceeds to never touch that buffer
again. Then the original committing transaction will go through and add its
buffers to the checkpointing list, and refile the buffer. Because we did a
journal_get_undo_access(), the jh->b_next_transaction is set to our currently
running transaction, and that buffer, because it was set BH_JBDDirty by the
committing transaction, is filed onto the running transaction's BJ_Metadata list,
which increments our t_nr_buffers counter. Because we never actually dirtied
this buffer ourselves, we never accounted for the credit, and we end up with
t_outstanding_credits being less than t_nr_buffers.
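
For reference, the ext3 path that hits this looks roughly like the sketch
below. This is paraphrased from memory rather than copied from
fs/ext3/balloc.c, and the helper arguments are approximate, so treat it only
as an illustration of the call sequence:

        /*
         * The handle takes undo access to a block bitmap, the allocation
         * attempt in that group fails, and the buffer is released again
         * without ever being dirtied under this transaction, so
         * journal_dirty_metadata() is never called and no credit is
         * charged for it.
         */
        err = ext3_journal_get_undo_access(handle, bitmap_bh);
        if (err)
                goto out;

        blk = ext3_try_to_allocate_with_rsv(sb, handle, group, bitmap_bh,
                                            grp_goal, my_rsv, &count, &fatal);
        if (blk < 0) {
                /* nothing in this group was modified; drop the access */
                ext3_journal_release_buffer(handle, bitmap_bh);
                /* ... and move on to the next block group ... */
        }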

This is a problem because while we are writing out the metadata blocks to the
journal, we do a t_outstanding_credits-- for each buffer. If
t_outstanding_credits is less than the number of buffers we have then
t_outstanding_credits will eventually become negative, which means that
jbd_space_needed will eventually start saying it needs way less credits than it
actually does, and will allow transactions to grow huge and eventually we'll
overflow the journal (albeit this is a bitch to try and reproduce).
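
To make that concrete: jbd_space_needed() folds the committing transaction's
t_outstanding_credits into its estimate of the log space a new handle must
leave free. Roughly (reproduced from memory of include/linux/jbd.h, so take
it as a sketch rather than an exact quote):

        static inline int jbd_space_needed(journal_t *journal)
        {
                int nblocks = journal->j_max_transaction_buffers;

                /*
                 * If t_outstanding_credits has gone negative during commit,
                 * this sum underestimates the space the committing
                 * transaction still needs.
                 */
                if (journal->j_committing_transaction)
                        nblocks += journal->j_committing_transaction->
                                                t_outstanding_credits;
                return nblocks;
        }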

So my approach is to have a counter that is incremented each time the
transaction calls do_get_write_access (or journal_get_create_access) so we can
keep track of how many people are currently trying to modify that buffer. So
in the case where we do a journal_get_undo_access()+journal_release_buffer()
and nobody else ever touches the buffer, we can then set jh->b_next_transaction
to NULL in journal_release_buffer() and avoid having the buffer filed onto our
transaction. If somebody else is modifying the journal head then we know to
leave it alone because chances are it will be dirtied and the credit will be
accounted for.

There is also a slight change to how we reset b_modified. I originally reset
b_nr_access (my access counter) in the same way b_modified was reset, but I
didn't really like this because we were only taking the j_list_lock instead of
the jbd_buffer lock, so we could race and still end up in the same situation
(which is in fact what happened). So I've changed how we reset b_modified.
Instead of looping through all of the buffers for the transaction, which is a
little inefficient anyway, we reset it in do_get_write_access in the cases
where we know that this is the first time that this transaction has accessed
the buffer (ie when b_next_transaction != transaction && b_transaction !=
transaction). I reset b_nr_access in the same way. I ran tests with this
patch and verified that we no longer got into the situation where
t_outstanding_credits was less than t_nr_buffers.

This is just my patch that I was using, I plan on cleaning it up if this is an
acceptable way to fix the problem. I'd also like to put an ASSERT before we
process the t_buffers list for the case where t_outstanding_credits is less
than t_nr_buffers. If my particular solution isn't acceptable I'm open to
suggestions, however I still think that resetting b_modified should be changed
the way I have it, as it's a potential race condition and inefficient. Thanks
much,

Josef

diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index a38c718..6cc0a1e 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -407,22 +407,6 @@ void journal_commit_transaction(journal_t *journal)
jbd_debug (3, "JBD: commit phase 2\n");

/*
- * First, drop modified flag: all accesses to the buffers
- * will be tracked for a new trasaction only -bzzz
- */
- spin_lock(&journal->j_list_lock);
- if (commit_transaction->t_buffers) {
- new_jh = jh = commit_transaction->t_buffers->b_tnext;
- do {
- J_ASSERT_JH(new_jh, new_jh->b_modified == 1 ||
- new_jh->b_modified == 0);
- new_jh->b_modified = 0;
- new_jh = new_jh->b_tnext;
- } while (new_jh != jh);
- }
- spin_unlock(&journal->j_list_lock);
-
- /*
* Now start flushing things to disk, in the order they appear
* on the transaction lists. Data blocks go first.
*/
@@ -490,6 +474,11 @@ void journal_commit_transaction(journal_t *journal)

descriptor = NULL;
bufs = 0;
+
+ if (commit_transaction->t_nr_buffers >
+ commit_transaction->t_outstanding_credits)
+ printk(KERN_EMERG "nr buffers greater than outstanding credits\n");
+
while (commit_transaction->t_buffers) {

/* Find the next buffer to be journaled... */
diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 3943a89..5059b79 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -1800,6 +1800,7 @@ static void __journal_remove_journal_head(struct
buffer_head *bh)
printk(KERN_WARNING "%s: freeing "
"b_committed_data\n",
__FUNCTION__);
+ dump_stack();
jbd_free(jh->b_committed_data, bh->b_size);
}
bh->b_private = NULL;
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 038ed74..db00d39 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -609,6 +609,13 @@ repeat:
goto done;

/*
+ * This is the first time this transaction has touched this buffer
+ * reset b_modified and b_nr_access
+ */
+ jh->b_modified = 0;
+ jh->b_nr_access = 0;
+
+ /*
* If there is already a copy-out version of this buffer, then we don't
* need to make another one
*/
@@ -713,6 +720,11 @@ repeat:
}

done:
+ /*
+ * Ok we got write access, update the access counter
+ */
+ jh->b_nr_access++;
+
if (need_copy) {
struct page *page;
int offset;
@@ -819,13 +831,33 @@ int journal_get_create_access(handle_t *handle, struct
buffer_head *bh)
J_ASSERT_JH(jh, buffer_locked(jh2bh(jh)));

if (jh->b_transaction == NULL) {
+ /*
+ * first time this transaction has touched this buffer,
+ * reset b_modified
+ */
+ jh->b_modified = 0;
+
jh->b_transaction = transaction;
JBUFFER_TRACE(jh, "file as BJ_Reserved");
__journal_file_buffer(jh, transaction, BJ_Reserved);
} else if (jh->b_transaction == journal->j_committing_transaction) {
JBUFFER_TRACE(jh, "set next transaction");
+
+ if (jh->b_next_transaction != transaction) {
+ /* reset b_modified */
+ jh->b_modified = 0;
+ }
+
jh->b_next_transaction = transaction;
}
+
+ /*
+ * Nobody who calls journal_get_create_access calls
+ * journal_release_buffer, but if at some point somebody decides to
+ * we'll need this
+ */
+ jh->b_nr_access++;
+
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);

@@ -1190,11 +1222,41 @@ out:
* journal_release_buffer: undo a get_write_access without any buffer
* updates, if the update decided in the end that it didn't need access.
*
+ * We decrement the b_nr_access as this person no longer requires write
+ * access to the buffer, and if nobody else requires write access then we want
+ * to set b_next_transaction to NULL in case this buffer is on the committing
+ * transaction, as it will get refiled onto this transaction without it being
+ * counted. We don't worry about the case where this is the first time this
+ * jh has been used (ie jh->b_transaction == handle->h_transaction) since the
+ * beginning of the committing of this transaction will drop the jh.
*/
void
journal_release_buffer(handle_t *handle, struct buffer_head *bh)
{
+ struct journal_head *jh = bh2jh(bh);
BUFFER_TRACE(bh, "entry");
+
+ jbd_lock_bh_state(bh);
+ /*
+ * if nobody else has requested write access to this jh then set
+ * b_next_transaction to NULL to keep it from getting refiled onto
+ * this transaction
+ */
+ if (!(--jh->b_nr_access)) {
+
+ /*
+ * A journal_get_undo_access()+journal_release_buffer() will
+ * leave undo-committed data that we no longer need
+ */
+ if (jh->b_next_transaction && jh->b_committed_data) {
+ jbd_free(jh->b_committed_data, bh->b_size);
+ jh->b_committed_data = NULL;
+ }
+
+ jh->b_next_transaction = NULL;
+ }
+
+ jbd_unlock_bh_state(bh);
}

/**
@@ -1245,6 +1307,11 @@ int journal_forget (handle_t *handle, struct buffer_head
*bh)
*/
jh->b_modified = 0;

+ /*
+ * drop one of our write access credits
+ */
+ jh->b_nr_access--;
+
if (jh->b_transaction == handle->h_transaction) {
J_ASSERT_JH(jh, !jh->b_frozen_data);

diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index 8a62d1e..160f1d1 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -39,6 +39,13 @@ struct journal_head {
unsigned b_modified;

/*
+ * This is a counter of the number of things who have requested write
+ * access to this buffer by the current transaction
+ * [jbd_lock_bh_state()]
+ */
+ unsigned b_nr_access;
+
+ /*
* Copy of the buffer data frozen for writing to the log.
* [jbd_lock_bh_state()]
*/


2008-02-22 10:08:49

by Jan Kara

Subject: Re: [RFC][PATCH] fix journal overflow problem

Hello,

On Thu 21-02-08 13:58:55, Josef Bacik wrote:
> This is related to that jbd patch I sent a few weeks ago. I originally
> found that the problem where t_nr_buffers would be greater than
> t_outstanding_credits wouldn't happen upstream, but apparently I'm an
> idiot and I was just missing my messages, and the problem does exist.
> Now for the entirely too long description of what's going wrong.
>
> Say we have a transaction dirty a bitmap buffer and go to flush it to the
> disk. Then ext3 goes to get write access to that buffer via
> journal_get_undo_access(), finds out it doesn't need it, and then
> subsequently does a journal_release_buffer() and then proceeds to never
> touch that buffer again. Then the original committing transaction will
> go through and add its buffers to the checkpointing list, and refile the
> buffer. Because we did a journal_get_undo_access(), the
> jh->b_next_transaction is set to our currently running transaction, and
> that buffer, because it was set BH_JBDDirty by the committing transaction,
> is filed onto the running transaction's BJ_Metadata list, which increments
> our t_nr_buffers counter. Because we never actually dirtied this buffer
> ourselves, we never accounted for the credit, and we end up with
> t_outstanding_credits being less than t_nr_buffers.
Thanks for the debugging. You're right that such a situation can happen
and we then miscount the transaction's credits. Actually, we miscount the
credits whenever we do journal_get_write_access() on a jbd_dirty buffer
that isn't yet in our transaction and don't call journal_dirty_metadata()
later.

> This is a problem because while we are writing out the metadata blocks to
> the journal, we do a t_outstanding_credits-- for each buffer. If
> t_outstanding_credits is less than the number of buffers we have then
> t_outstanding_credits will eventually become negative, which means that
> jbd_space_needed will eventually start saying it needs way less credits
> than it actually does, and will allow transactions to grow huge and
> eventually we'll overflow the journal (albeit this is a bitch to try and
> reproduce).
Yes. Actually, how far negative does t_outstanding_credits grow? I'd expect
that this is not too common a situation...

> So my approach is to have a counter that is incremented each time the
> transaction calls do_get_write_access (or journal_get_create_access) so
> we can keep track of how many people are currently trying to modify that
> buffer. So in the case where we do a
> journal_get_undo_access()+journal_release_buffer() and nobody else ever
> touches the buffer, we can then set jh->b_next_transaction to NULL in
> journal_release_buffer() and avoid having the buffer filed onto our
> transaction. If somebody else is modifying the journal head then we know
> to leave it alone because chances are it will be dirtied and the credit
> will be accounted for.
But the race is still possibly there in case we refile the buffer from
the t_forget list just between do_get_write_access() and
journal_release_buffer(), isn't it?
And it would be quite hard to get rid of such races. So how about
the following: in do_get_write_access() (or journal_get_create_access()),
when we see the buffer is jbddirty and we set b_next_transaction to our
transaction, we also set b_modified to 1. That should fix the accounting of
transaction credits. I agree that sometimes we needlessly refile some
buffers from the previous transaction, but as I said above, it shouldn't
happen that often (and we did it up to now anyway).
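
Concretely, something like the sketch below (untested, and the exact placement
inside do_get_write_access() is approximate):

        /*
         * The buffer still belongs to the committing transaction, so we
         * only point b_next_transaction at the running transaction.  If it
         * is jbddirty it will later be refiled onto our BJ_Metadata list,
         * so mark it modified right away as well.
         */
        if (jh->b_transaction == journal->j_committing_transaction &&
            jh->b_next_transaction == NULL) {
                jh->b_next_transaction = transaction;
                if (buffer_jbddirty(jh2bh(jh)))
                        jh->b_modified = 1;
        }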

> There is also a slight change to how we reset b_modified. I originally
> reset b_nr_access (my access counter) in the same way b_modified was
> reset, but I didn't really like this because we were only taking the
> j_list_lock instead of the jbd_buffer lock, so we could race and still
> end up in the same situation (which is in fact what happened). So I've
Yes, that is a good catch.

> changed how we reset b_modified. Instead of looping through all of the
> buffers for the transaction, which is a little inefficient anyway, we
> reset it in do_get_write_access in the cases where we know that this is
> the first time that this transaction has accessed the buffer (ie when
> b_next_transaction != transaction && b_transaction != transaction). I
> reset b_nr_access in the same way. I ran tests with this patch and
> verified that we no longer got into the situation where
> t_outstanding_credits was less than t_nr_buffers.
>
> This is just my patch that I was using, I plan on cleaning it up if this
> is an acceptable way to fix the problem. I'd also like to put an ASSERT
> before we process the t_buffers list for the case where
> t_outstanding_credits is less than t_nr_buffers. If my particular
> solution isn't acceptable I'm open to suggestions, however I still think
> that resetting b_modified should be changed the way I have it, as it's a
> potential race condition and inefficient. Thanks much,
I agree with the b_modified change, but please send it as a separate
patch. For the credit accounting I'd rather go by the route I've suggested
above.

Honza

> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
> index a38c718..6cc0a1e 100644
> --- a/fs/jbd/commit.c
> +++ b/fs/jbd/commit.c
> @@ -407,22 +407,6 @@ void journal_commit_transaction(journal_t *journal)
> jbd_debug (3, "JBD: commit phase 2\n");
>
> /*
> - * First, drop modified flag: all accesses to the buffers
> - * will be tracked for a new trasaction only -bzzz
> - */
> - spin_lock(&journal->j_list_lock);
> - if (commit_transaction->t_buffers) {
> - new_jh = jh = commit_transaction->t_buffers->b_tnext;
> - do {
> - J_ASSERT_JH(new_jh, new_jh->b_modified == 1 ||
> - new_jh->b_modified == 0);
> - new_jh->b_modified = 0;
> - new_jh = new_jh->b_tnext;
> - } while (new_jh != jh);
> - }
> - spin_unlock(&journal->j_list_lock);
> -
> - /*
> * Now start flushing things to disk, in the order they appear
> * on the transaction lists. Data blocks go first.
> */
> @@ -490,6 +474,11 @@ void journal_commit_transaction(journal_t *journal)
>
> descriptor = NULL;
> bufs = 0;
> +
> + if (commit_transaction->t_nr_buffers >
> + commit_transaction->t_outstanding_credits)
> + printk(KERN_EMERG "nr buffers greater than outstanding credits\n");
> +
> while (commit_transaction->t_buffers) {
>
> /* Find the next buffer to be journaled... */
> diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
> index 3943a89..5059b79 100644
> --- a/fs/jbd/journal.c
> +++ b/fs/jbd/journal.c
> @@ -1800,6 +1800,7 @@ static void __journal_remove_journal_head(struct
> buffer_head *bh)
> printk(KERN_WARNING "%s: freeing "
> "b_committed_data\n",
> __FUNCTION__);
> + dump_stack();
> jbd_free(jh->b_committed_data, bh->b_size);
> }
> bh->b_private = NULL;
> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> index 038ed74..db00d39 100644
> --- a/fs/jbd/transaction.c
> +++ b/fs/jbd/transaction.c
> @@ -609,6 +609,13 @@ repeat:
> goto done;
>
> /*
> + * This is the first time this transaction has touched this buffer
> + * reset b_modified and b_nr_access
> + */
> + jh->b_modified = 0;
> + jh->b_nr_access = 0;
> +
> + /*
> * If there is already a copy-out version of this buffer, then we don't
> * need to make another one
> */
> @@ -713,6 +720,11 @@ repeat:
> }
>
> done:
> + /*
> + * Ok we got write access, update the access counter
> + */
> + jh->b_nr_access++;
> +
> if (need_copy) {
> struct page *page;
> int offset;
> @@ -819,13 +831,33 @@ int journal_get_create_access(handle_t *handle, struct
> buffer_head *bh)
> J_ASSERT_JH(jh, buffer_locked(jh2bh(jh)));
>
> if (jh->b_transaction == NULL) {
> + /*
> + * first time this transaction has touched this buffer,
> + * reset b_modified
> + */
> + jh->b_modified = 0;
> +
> jh->b_transaction = transaction;
> JBUFFER_TRACE(jh, "file as BJ_Reserved");
> __journal_file_buffer(jh, transaction, BJ_Reserved);
> } else if (jh->b_transaction == journal->j_committing_transaction) {
> JBUFFER_TRACE(jh, "set next transaction");
> +
> + if (jh->b_next_transaction != transaction) {
> + /* reset b_modified */
> + jh->b_modified = 0;
> + }
> +
> jh->b_next_transaction = transaction;
> }
> +
> + /*
> + * Nobody who calls journal_get_create_access calls
> + * journal_release_buffer, but if at some point somebody decides to
> + * we'll need this
> + */
> + jh->b_nr_access++;
> +
> spin_unlock(&journal->j_list_lock);
> jbd_unlock_bh_state(bh);
>
> @@ -1190,11 +1222,41 @@ out:
> * journal_release_buffer: undo a get_write_access without any buffer
> * updates, if the update decided in the end that it didn't need access.
> *
> + * We decrement the b_nr_access as this person no longer requires write
> + * access to the buffer, and if nobody else requires write access then we want
> + * to set b_next_transaction to NULL in case this buffer is on the committing
> + * transaction, as it will get refiled onto this transaction without it being
> + * counted. We don't worry about the case where this is the first time this
> + * jh has been used (ie jh->b_transaction == handle->h_transaction) since the
> + * beginning of the committing of this transaction will drop the jh.
> */
> void
> journal_release_buffer(handle_t *handle, struct buffer_head *bh)
> {
> + struct journal_head *jh = bh2jh(bh);
> BUFFER_TRACE(bh, "entry");
> +
> + jbd_lock_bh_state(bh);
> + /*
> + * if nobody else has requested write access to this jh then set
> + * b_next_transaction to NULL to keep it from getting refiled onto
> + * this transaction
> + */
> + if (!(--jh->b_nr_access)) {
> +
> + /*
> + * A journal_get_undo_access()+journal_release_buffer() will
> + * leave undo-committed data that we no longer need
> + */
> + if (jh->b_next_transaction && jh->b_committed_data) {
> + jbd_free(jh->b_committed_data, bh->b_size);
> + jh->b_committed_data = NULL;
> + }
> +
> + jh->b_next_transaction = NULL;
> + }
> +
> + jbd_unlock_bh_state(bh);
> }
>
> /**
> @@ -1245,6 +1307,11 @@ int journal_forget (handle_t *handle, struct buffer_head
> *bh)
> */
> jh->b_modified = 0;
>
> + /*
> + * drop one of our write access credits
> + */
> + jh->b_nr_access--;
> +
> if (jh->b_transaction == handle->h_transaction) {
> J_ASSERT_JH(jh, !jh->b_frozen_data);
>
> diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
> index 8a62d1e..160f1d1 100644
> --- a/include/linux/journal-head.h
> +++ b/include/linux/journal-head.h
> @@ -39,6 +39,13 @@ struct journal_head {
> unsigned b_modified;
>
> /*
> + * This is a counter of the number of things who have requested write
> + * access to this buffer by the current transaction
> + * [jbd_lock_bh_state()]
> + */
> + unsigned b_nr_access;
> +
> + /*
> * Copy of the buffer data frozen for writing to the log.
> * [jbd_lock_bh_state()]
> */
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-02-22 17:01:46

by Josef Bacik

Subject: Re: [RFC][PATCH] fix journal overflow problem

On Friday 22 February 2008 5:08:47 am Jan Kara wrote:
> Hello,
>
> On Thu 21-02-08 13:58:55, Josef Bacik wrote:
> > This is related to that jbd patch I sent a few weeks ago. I originally
> > found that the problem where t_nr_buffers would be greater than
> > t_outstanding_credits wouldn't happen upstream, but apparently I'm an
> > idiot and I was just missing my messages, and the problem does exist.
> > Now for the entirely too long description of what's going wrong.
> >
> > Say we have a transaction dirty a bitmap buffer and go to flush it to the
> > disk. Then ext3 goes to get write access to that buffer via
> > journal_get_undo_access(), finds out it doesn't need it, and then
> > subsequently does a journal_release_buffer() and then proceeds to never
> > touch that buffer again. Then the original committing transaction will
> > go through and add its buffers to the checkpointing list, and refile the
> > buffer. Because we did a journal_get_undo_access(), the
> > jh->b_next_transaction is set to our currently running transaction, and
> > that buffer, because it was set BH_JBDDirty by the committing transaction,
> > is filed onto the running transaction's BJ_Metadata list, which increments
> > our t_nr_buffers counter. Because we never actually dirtied this buffer
> > ourselves, we never accounted for the credit, and we end up with
> > t_outstanding_credits being less than t_nr_buffers.
>
> Thanks for the debugging. You're right that such a situation can happen
> and we then miscount the transaction's credits. Actually, we miscount the
> credits whenever we do journal_get_write_access() on a jbd_dirty buffer
> that isn't yet in our transaction and don't call journal_dirty_metadata()
> later.
>

Right.

> > This is a problem because while we are writing out the metadata blocks to
> > the journal, we do a t_outstanding_credits-- for each buffer. If
> > t_outstanding_credits is less than the number of buffers we have then
> > t_outstanding_credits will eventually become negative, which means that
> > jbd_space_needed will eventually start saying it needs way less credits
> > than it actually does, and will allow transactions to grow huge and
> > eventually we'll overflow the journal (albeit this is a bitch to try and
> > reproduce).
>
> Yes. Actually, how far negative does t_outstanding_credits grow? I'd expect
> that this is not too common a situation...
>

I've seen it get to where we have like 300 extra buffer heads, which by itself
isn't bad, but you get a couple of transactions that are allowed to grow to 300
more buffers than they normally would and things go boom. But you are
right, it's not too common a situation; for the most part it just works, and
only screws the handful of people who can hit it every time.

> > So my approach is to have a counter that is incremented each time the
> > transaction calls do_get_write_access (or journal_get_create_access) so
> > we can keep track of how many people are currently trying to modify that
> > buffer. So in the case where we do a
> > journal_get_undo_access()+journal_release_buffer() and nobody else ever
> > touches the buffer, we can then set jh->b_next_transaction to NULL in
> > journal_release_buffer() and avoid having the buffer filed onto our
> > transaction. If somebody else is modifying the journal head then we know
> > to leave it alone because chances are it will be dirtied and the credit
> > will be accounted for.
>
> But the race is still possibly there in case we refile the buffer from
> the t_forget list just between do_get_write_access() and
> journal_release_buffer(), isn't it?
> And it would be quite hard to get rid of such races. So how about
> the following: in do_get_write_access() (or journal_get_create_access()),
> when we see the buffer is jbddirty and we set b_next_transaction to our
> transaction, we also set b_modified to 1. That should fix the accounting of
> transaction credits. I agree that sometimes we needlessly refile some
> buffers from the previous transaction, but as I said above, it shouldn't
> happen that often (and we did it up to now anyway).
>

The only problem with this approach is that we end up using credits we can't
really afford. So, for example, say we have gotten write access to several
different bitmap blocks while trying to find room to allocate (and therefore
decremented h_buffer_credits to account for those buffers, which will be
refiled onto the transaction later); then we end up overflowing the handle
because ext3 only accounted for having to use 1 credit for modifying 1 bitmap,
and we assert when h_buffer_credits goes negative.

Instead, what if in __journal_refile_buffer(), rather than checking
buffer_jbddirty to see if it was dirty, we just check b_modified, and if
b_modified is 1 then go ahead and file it onto b_next_transaction's BJ_Metadata
list, and if not put it on b_next_transaction's BJ_Reserved list? So if we do
end up dirtying it the credit is accounted for and we move it appropriately,
and if we don't end up modifying it, the credit doesn't get accounted for and
it stays on the reserved list and gets unlinked at the beginning of
b_next_transaction's commit phase. How does that sound?
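
Roughly something like this (an untested sketch, with the dirty-bit and
locking details of the real function elided):

        /*
         * In __journal_refile_buffer(): pick the destination list from
         * b_modified rather than from buffer_jbddirty(), so a buffer that
         * was never actually modified under b_next_transaction lands on
         * BJ_Reserved and no credit is expected for it at commit time.
         */
        if (jh->b_next_transaction) {
                int jlist = jh->b_modified ? BJ_Metadata : BJ_Reserved;

                __journal_temp_unlink_buffer(jh);
                jh->b_transaction = jh->b_next_transaction;
                jh->b_next_transaction = NULL;
                __journal_file_buffer(jh, jh->b_transaction, jlist);
        }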

> > There is also a slight change to how we reset b_modified. I originally
> > reset b_nr_access (my access counter) in the same way b_modified was
> > reset, but I didn't really like this because we were only taking the
> > j_list_lock instead of the jbd_buffer lock, so we could race and still
> > end up in the same situation (which is in fact what happened). So I've
>
> Yes, that is a good catch.
>
> > changed how we reset b_modified. Instead of looping through all of the
> > buffers for the transaction, which is a little inefficient anyway, we
> > reset it in do_get_write_access in the cases where we know that this is
> > the first time that this transaction has accessed the buffer (ie when
> > b_next_transaction != transaction && b_transaction != transaction). I
> > reset b_nr_access in the same way. I ran tests with this patch and
> > verified that we no longer got into the situation where
> > t_outstanding_credits was less than t_nr_buffers.
> >
> > This is just my patch that I was using, I plan on cleaning it up if this
> > is an acceptable way to fix the problem. I'd also like to put an ASSERT
> > before we process the t_buffers list for the case where
> > t_outstanding_credits is less than t_nr_buffers. If my particular
> > solution isn't acceptable I'm open to suggestions, however I still think
> > that resetting b_modified should be changed the way I have it, as it's a
> > potential race condition and inefficient. Thanks much,
>
> I agree with the b_modified change, but please send it as a separate
> patch. For the credit accounting I'd rather go by the route I've suggested
> above.
>

Sounds good, I will do that. Thanks much,

Josef