2005-01-19 15:33:31

by Alex Tomas

[permalink] [raw]
Subject: [PATCH] JBD: journal_release_buffer()


Good day,

journal_release_buffer() can cause journal overflow in some
(very rare) conditions. Please, take a look at possible fix.

Th journal_head structure holds number of users of the buffer
for current transaction. The routines do_get_write_access()
and journal_get_create_access() tracks this number:
1) resets it to zero if the block's becoming part of the current
transaction
2) increments it

journal_release_buffer() decrements it and if it's 0, then the
blocks isn't member of the transaction.

The patch has been tested on UP with dbench and tool that
uses xattr very much.


Signed-off-by: Alex Tomas <[email protected]>

Index: linux-2.6.7/include/linux/journal-head.h
===================================================================
--- linux-2.6.7.orig/include/linux/journal-head.h 2003-06-24 18:05:26.000000000 +0400
+++ linux-2.6.7/include/linux/journal-head.h 2005-01-19 14:09:59.000000000 +0300
@@ -80,6 +80,11 @@
* [j_list_lock]
*/
struct journal_head *b_cpnext, *b_cpprev;
+
+ /*
+ * counter to track users of the buffer in current transaction
+ */
+ int b_tcount;
};

#endif /* JOURNAL_HEAD_H_INCLUDED */
Index: linux-2.6.7/fs/jbd/transaction.c
===================================================================
--- linux-2.6.7.orig/fs/jbd/transaction.c 2004-08-26 17:12:40.000000000 +0400
+++ linux-2.6.7/fs/jbd/transaction.c 2005-01-19 17:23:30.058160408 +0300
@@ -611,6 +611,10 @@
handle->h_buffer_credits--;
if (credits)
(*credits)++;
+
+ /* the block's becoming member of the trasaction -bzzz */
+ jh->b_tcount = 0;
+
goto done;
}

@@ -694,6 +698,9 @@
if (credits)
(*credits)++;

+ /* the block's becoming member of the trasaction -bzzz */
+ jh->b_tcount = 0;
+
/*
* Finally, if the buffer is not journaled right now, we need to make
* sure it doesn't get written to disk before the caller actually
@@ -723,6 +730,11 @@
memcpy(jh->b_frozen_data, source+offset, jh2bh(jh)->b_size);
kunmap_atomic(source, KM_USER0);
}
+
+ /* track all references to the block to be able to recognize the
+ * situation when the buffer is not part of transaction -bzzz */
+ jh->b_tcount++;
+
jbd_unlock_bh_state(bh);

/*
@@ -822,11 +834,20 @@
jh->b_transaction = transaction;
JBUFFER_TRACE(jh, "file as BJ_Reserved");
__journal_file_buffer(jh, transaction, BJ_Reserved);
+ jh->b_tcount = 0;
} else if (jh->b_transaction == journal->j_committing_transaction) {
JBUFFER_TRACE(jh, "set next transaction");
jh->b_next_transaction = transaction;
+ jh->b_tcount = 0;
}
spin_unlock(&journal->j_list_lock);
+
+ /*
+ * track all reference to the block to be able to recognize
+ * the situation when the buffer is not part of transaction -bzzz
+ */
+ jh->b_tcount++;
+
jbd_unlock_bh_state(bh);

/*
@@ -1178,8 +1199,40 @@
void
journal_release_buffer(handle_t *handle, struct buffer_head *bh, int credits)
{
+ journal_t *journal = handle->h_transaction->t_journal;
+ struct journal_head *jh = bh2jh(bh);
+
BUFFER_TRACE(bh, "entry");
- handle->h_buffer_credits += credits;
+
+ /* return credit back to the handle if it was really spent */
+ if (credits)
+ handle->h_buffer_credits++;
+
+ jbd_lock_bh_state(bh);
+ J_ASSERT(jh->b_tcount > 0);
+
+ jh->b_tcount--;
+ if (jh->b_tcount == 0) {
+ /* we can drop it from the transaction -bzzz */
+ J_ASSERT(jh->b_transaction == handle->h_transaction ||
+ jh->b_next_transaction == handle->h_transaction);
+ if (jh->b_transaction == handle->h_transaction) {
+ spin_lock(&journal->j_list_lock);
+ __journal_unfile_buffer(jh);
+ spin_unlock(&journal->j_list_lock);
+ } else if(jh->b_next_transaction) {
+ jh->b_next_transaction = NULL;
+ }
+
+ /*
+ * this was last reference to the block from the current
+ * transaction and we'd like to return credit to the
+ * whole transaction -bzzz
+ */
+ if (!credits)
+ handle->h_buffer_credits++;
+ }
+ jbd_unlock_bh_state(bh);
}

/**


2005-01-24 22:12:10

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()

Hi,

On Wed, 2005-01-19 at 15:32, Alex Tomas wrote:

> @@ -1178,8 +1199,40 @@
> void
> journal_release_buffer(handle_t *handle, struct buffer_head *bh, int credits)
> {
> + /* return credit back to the handle if it was really spent */
> + if (credits)
> + handle->h_buffer_credits++;

> + jh->b_tcount--;
> + if (jh->b_tcount == 0) {
> + /*
> + * this was last reference to the block from the current
> + * transaction and we'd like to return credit to the
> + * whole transaction -bzzz
> + */
> + if (!credits)
> + handle->h_buffer_credits++;

I think there's a problem here.

What if:
Process A gets write access, and is the first to do so (*credits=1)
Processes B gets write access (*credits=0)
B modifies the buffer and finishes
A looks again, sees B's modifications, and releases the buffer because
it's no use any more.

Now, B's release didn't return credits. The bh is part of the
transaction and was not released.

What does A do on journal_release_buffer()? It can either:

1) return its credit. Not legal: the bh is part of the transaction, A
had the only credit for it, it cannot be returned or else we risk
overflowing the journal.

2) Keep its credit. Legal from jbd's perspective, but breaks ext3
callers: the whole point of journal_release_buffer() is to try to get a
credit back. The caller has only a finite number of credits, and must
complete the handle within that limit. If we don't return the credit,
then we've just lost a credit to the handle, and risk the handle running
out of credit before completing the operation (allocation or xattr
update.)

I looked into this in some depth the first time it came up in the
context of AG's xattr fixes. The only solution I could see required us
to take a buffer credit during get_write_access in *ALL* cases where the
buffer is not already dirtied against the current transaction (ie. has
undergone journal_dirty_metadata() in addition to get_write_access()).
(If it's dirty, then we don't have a problem, as the bh is guaranteed to
be used anyway.) Only then can we safely do a release.

I coded up a patch for that, but came up against another difficulty.
Trouble is, we don't always know if a bh is dirty against the
transaction. If a bh is committing against one transaction, then a
get_write_access from a second transaction just sets b_next_transaction;
but the logic doesn't distinguish between that and a
journal_dirty_metadata() as far as the list operations are concerned, so
if we release the buffer in this state, we can't tell if it's dirty or
not.

Your patch helps there: by checking b_tcount we will be able to tell
whether a buffer with b_next_transaction == j_running_transaction really
needs the credit or not.

Cheers,
Stephen


2005-01-24 22:29:10

by Alex Tomas

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()

>>>>> Stephen C Tweedie (SCT) writes:

>> + /* return credit back to the handle if it was really spent */
>> + if (credits)
>> + handle->h_buffer_credits++;

>> + jh->b_tcount--;
>> + if (jh->b_tcount == 0) {
>> + /*
>> + * this was last reference to the block from the current
>> + * transaction and we'd like to return credit to the
>> + * whole transaction -bzzz
>> + */
>> + if (!credits)
>> + handle->h_buffer_credits++;

SCT> I think there's a problem here.

SCT> What if:
SCT> Process A gets write access, and is the first to do so (*credits=1)
SCT> Processes B gets write access (*credits=0)
SCT> B modifies the buffer and finishes
SCT> A looks again, sees B's modifications, and releases the buffer because
SCT> it's no use any more.

SCT> Now, B's release didn't return credits. The bh is part of the
SCT> transaction and was not released.

hmmm. that's a good catch. so, with this patch A increments h_buffer_credits
and this one will go to the t_outstanding_credits while the buffer is still
part of the transaction. indeed, an imbalance.

probably something like the following would be enough?

+ /* return credit back to the handle if it was really spent */
+ if (credits) {
+ handle->h_buffer_credits++;
+ spin_lock(&handle->h_transaction->t_handle_lock);
+ handle->h_transaction->t_outstanding_credits++;
+ spin_lock(&handle->h_transaction->t_handle_lock);
+ }

thanks, Alex


2005-01-24 23:36:58

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()

Hi,

On Mon, 2005-01-24 at 22:24, Alex Tomas wrote:
> hmmm. that's a good catch. so, with this patch A increments h_buffer_credits
> and this one will go to the t_outstanding_credits while the buffer is still
> part of the transaction. indeed, an imbalance.
>
> probably something like the following would be enough?
>
> + /* return credit back to the handle if it was really spent */
> + if (credits) {
> + handle->h_buffer_credits++;
> + spin_lock(&handle->h_transaction->t_handle_lock);
> + handle->h_transaction->t_outstanding_credits++;
> + spin_lock(&handle->h_transaction->t_handle_lock);
> + }

That returns the credit to A (satisfying ext3), but you just grew
t_outstanding_credits, thus growing the journal commitments without
checking if it's safe to do so or being able to handle failure. So it
just reintroduces the original bug.

--Stephen

2005-01-25 14:37:53

by Alex Tomas

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()


Hi, could you review the following solution?


t_outstanding_credits - number of _modified_ blocks in the transaction
t_reserved - number of blocks all running handle reserved

transaction size = t_outstanding_credits + t_reserved;





#define TSIZE(t) ((t)->t_outstanding_credits + (t)->t_reserved)

journal_start(blocks)
{
if (TSIZE(transaction) + blocks > MAX)
wait_for_space(journal);

transaction->t_reserved += blocks;
handle->h_buffer_credits = blocks;
}


journal_get_write_access(handle, bh)
{
if (jh->b_tcount >= 0)
jh->b_tcount++;
}

journal_dirty_metadata(handle, bh)
{
transaction->t_reserved--;
handle->h_buffer_credits--;
if (jh->b_tcount > 0) {
/* modifed, no need to track it any more */
transaction->t_outstanding_credits++;
jh->b_tcount = -1;
}
}

journal_release_buffer(handle, bh)
{
if (jh->b_tcount > 0) {
/* it's not modified yet */
jh->b_tcount--;
if (jh->b_tcount == 0) {
/* remove from the transaction */
}
}
}

journal_stop(handle)
{
transaction->t_outstanding_credits -= handle->h_buffer_credits;
}


thanks, Alex


2005-01-25 16:22:14

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()

Hi,

On Tue, 2005-01-25 at 14:36, Alex Tomas wrote:
> Hi, could you review the following solution?
>
>
> t_outstanding_credits - number of _modified_ blocks in the transaction
> t_reserved - number of blocks all running handle reserved
> transaction size = t_outstanding_credits + t_reserved;

Ah, a work of genius.

Yes, this appears to cover all of the cases. It changes the semantics
of the jbd layer subtly: a handle isn't "charged" for its use of bh's
until it actually dirties them. But the handle is required to reserve
enough space up front in any case, so it shouldn't matter much exactly
when it gets charged. It _would_ affect journal_extend(), if that were
called between a journal_get_write_access() and a
journal_dirty_metadata(), but I don't think we do that anywhere, and the
change in behaviour ought to be merely a slight difference in when it's
legal to extend --- it shouldn't break the rules.

> journal_dirty_metadata(handle, bh)
> {
> transaction->t_reserved--;
> handle->h_buffer_credits--;
> if (jh->b_tcount > 0) {
> /* modifed, no need to track it any more */
> transaction->t_outstanding_credits++;
> jh->b_tcount = -1;
> }
> }

Actually, the whole thing can be wrapped in if (jh->b_tcount > 0) {}, I
think. If we have already charged the transaction for this buffer, then
there's no need to charge the handle for it again. That's going to be
particularly important for truncate(), where we are continually updating
the same blocks (eg. bitmap, indirect blocks), so we want to make sure
that after the first journal_dirty_metadata() call, no further charge is
made.

If we do that, do we in fact need t_reserved at all?

Very nice!

Cheers,
Stephen

2005-01-25 19:33:03

by Alex Tomas

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()

>>>>> Stephen C Tweedie (SCT) writes:

>> journal_dirty_metadata(handle, bh)
>> {
>> transaction->t_reserved--;
>> handle->h_buffer_credits--;
>> if (jh->b_tcount > 0) {
>> /* modifed, no need to track it any more */
>> transaction-> t_outstanding_credits++;
>> jh-> b_tcount = -1;
>> }
>> }

SCT> Actually, the whole thing can be wrapped in if (jh->b_tcount > 0) {}, I
SCT> think. If we have already charged the transaction for this buffer, then
SCT> there's no need to charge the handle for it again. That's going to be
SCT> particularly important for truncate(), where we are continually updating
SCT> the same blocks (eg. bitmap, indirect blocks), so we want to make sure
SCT> that after the first journal_dirty_metadata() call, no further charge is
SCT> made.

the idea is:
1) the sooner we drop reservation, the higher probability to cover many
changes by single transaction
1) having h_buffer_credits being decremented for each modification
could help us to debug handle overflow situations

SCT> If we do that, do we in fact need t_reserved at all?

hmm. if t_outstanding_credits holds number of modified buffers,
then we need sum of all running h_buffer_credits to protect
from transaction overflow. t_reserved is sum of h_buffer_credits.


thanks, Alex

2005-01-27 03:08:29

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH] JBD: journal_release_buffer()

Hi,

On Tue, 2005-01-25 at 19:30, Alex Tomas wrote:

> >> journal_dirty_metadata(handle, bh)
> >> {
> >> transaction->t_reserved--;
> >> handle->h_buffer_credits--;
> >> if (jh->b_tcount > 0) {
> >> /* modifed, no need to track it any more */
> >> transaction-> t_outstanding_credits++;
> >> jh-> b_tcount = -1;
> >> }
> >> }
>
> SCT> Actually, the whole thing can be wrapped in if (jh->b_tcount > 0) {}, I
> SCT> think.

> the idea is:
> 1) the sooner we drop reservation, the higher probability to cover many
> changes by single transaction

But that's exactly why we _don't_ want to do this. You're dropping the
reservation, but remember, we return unused handle credits to the
transaction at the end of the handle's life.

So normally, when you are, for example, appending 4k at a time to a
large file, the first handle in a new transaction gets charged for the
indirect/bitmap updates. But subsequent ones do not, so as the later
handles end, they return their credits to the transaction. That allows
large numbers of handles to update the same buffers in the same
transaction.

But by reducing handle->h_buffer_credits early, even if the bh is
already modified in the current transaction, you are preventing the
return of those credits back to the transaction. Effectively, each
modification of the same bh in the same transaction is being accounted
for separately, so you're charging the transaction multiple times for a
bh which is only going to be journaled once. That's going to cause the
transaction to be closed early, as our accounting will tell us that the
transaction is full even when it has only a few buffers modified.

> 1) having h_buffer_credits being decremented for each modification
> could help us to debug handle overflow situations
>
> SCT> If we do that, do we in fact need t_reserved at all?
>
> hmm. if t_outstanding_credits holds number of modified buffers,
> then we need sum of all running h_buffer_credits to protect
> from transaction overflow. t_reserved is sum of h_buffer_credits.

Why can't we just maintain t_outstanding_credits for this? Define that
as the sum of all credits either promised or consumed by any handles.

On journal_start(), t_outstanding_credits += blocks;

On journal_dirty_metadata(),
{
if (jb->b_tcount > 0) {
/* a promised credit has turned into a consumed one */
handle->h_buffer_credits--;
jh->b_tcount = -1;
}
}

On journal_stop(), any remaining handle credits are promised but
unconsumed; simply return them:
transaction->t_outstanding_credits -= handle->h_buffer_credits;

As before journal_release_buffer() doesn't have to do anything about
credits because we're only releasing buffers if they have not been
charged for yet.

--Stephen