2003-05-18 13:07:50

by Alex Tomas

[permalink] [raw]
Subject: [RFC] probably bug in current ext3/jbd


hi!

ext3/jbd use b_committed_data buffer in order to prevent
allocation of blocks which were freed in non-committed
transaction. I think there is bug in this code. look,

some thread commit thread
----------------------------------------------------------
get_undo_access(#1)
dirty_buffer(#1)
stop_journal()

start commit


start_journal()
get_undo_access(#1):
1) wait for #1 to be
in t_forget_list

write #1 to log
put #1 onto t_forget_list


2) b_commit_data exists,
finish get_undo_access()


for_each_bh_in_forget_list() {
if (jh->b_committed_data) {
kfree(jh->b_committed_data);
jh->b_committed_data = NULL;
}
}


/* using of b_committed_data */

b_committed_data is NULL ?



with best regards, Alex


2003-05-19 20:21:52

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd

Hi,

On Sun, 2003-05-18 at 18:21, Alex Tomas wrote:

> ext3/jbd use b_committed_data buffer in order to prevent
> allocation of blocks which were freed in non-committed
> transaction. I think there is bug in this code. look,

You just found a rather subtle bug which was fixed around 2 or 3 years
ago. :-)

> some thread commit thread
> ----------------------------------------------------------

> start_journal()
> get_undo_access(#1):
> 1) wait for #1 to be
> in t_forget_list

get_undo_access is a declaration of intention to modify the buffer.
When that happens, it calls do_get_write_access() with the force_copy
flag set. That means that it _always_ creates a new frozen_data copy of
the buffer the first time we get undo access to a bitmap buffer within
any given transaction. That basically means that for bitmaps,
frozen_data always holds the version of the buffer as of the end of the
previously completed transaction.

> for_each_bh_in_forget_list() {
> if (jh->b_committed_data) {
> kfree(jh->b_committed_data);
> jh->b_committed_data = NULL;
> }

Ah, but the *immediately* following lines are:

if (jh->b_frozen_data) {
jh->b_committed_data = jh->b_frozen_data;
jh->b_frozen_data = NULL;
}

so the frozen data that was preserved at get_undo_access() time has now
committed to disk and gets rotated into the b_committed_data version.
This is exactly how we get the new version of the committed data when
the old transaction commits.

Cheers,
Stephen

2003-05-19 20:33:16

by Alex Tomas

[permalink] [raw]
Subject: Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd


hi!

please, look:

thread A commit thread

if (jh->b_committed_data) {
kfree(jh->b_committed_data);
jh->b_committed_data = NULL;
}
access for
b_committed_data == NULL ?

if (jh->b_frozen_data) {
jh->b_committed_data = jh->b_frozen_data;
jh->b_frozen_data = NULL;
}


or I miss something subtle here?


>>>>> Stephen C Tweedie (SCT) writes:

SCT> get_undo_access is a declaration of intention to modify the buffer.
SCT> When that happens, it calls do_get_write_access() with the force_copy
SCT> flag set. That means that it _always_ creates a new frozen_data copy of
SCT> the buffer the first time we get undo access to a bitmap buffer within
SCT> any given transaction. That basically means that for bitmaps,
SCT> frozen_data always holds the version of the buffer as of the end of the
SCT> previously completed transaction.

>> for_each_bh_in_forget_list() {
>> if (jh->b_committed_data) {
>> kfree(jh->b_committed_data);
jh-> b_committed_data = NULL;
>> }

SCT> Ah, but the *immediately* following lines are:

SCT> if (jh->b_frozen_data) {
jh-> b_committed_data = jh->b_frozen_data;
jh-> b_frozen_data = NULL;
SCT> }

SCT> so the frozen data that was preserved at get_undo_access() time has now
SCT> committed to disk and gets rotated into the b_committed_data version.
SCT> This is exactly how we get the new version of the committed data when
SCT> the old transaction commits.

SCT> Cheers,
SCT> Stephen


2003-05-19 20:38:38

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd

Hi,

On Tue, 2003-05-20 at 01:46, Alex Tomas wrote:

> please, look:
>
> thread A commit thread
>
> if (jh->b_committed_data) {
> kfree(jh->b_committed_data);
> jh->b_committed_data = NULL;
> }
> access for
> b_committed_data == NULL ?

Not with BKL. Without it, yes, that's definitely a risk, and you need
some locking for the access to b_committed_data. Without that, even if
you keep the jh->b_committed_data field valid, you risk freeing the old
copy that another thread is using.

Cheers,
Stephen

2003-05-19 20:45:30

by Alex Tomas

[permalink] [raw]
Subject: Re: [Ext2-devel] [RFC] probably bug in current ext3/jbd


aha. now it's clear. thank you. I catched J_ASSERT(b_committed_data != NULL)
in ext3_free_blocks() with de-BKL'ed JBD. hence, my solution is to have a
tid journal_head indicating which transaction uses b_committed_data.

I don't want to look intrusive, but .. what do you think about new locking
schema I'm trying to implement?


>>>>> Stephen C Tweedie (SCT) writes:

>> access for
>> b_committed_data == NULL ?

SCT> Not with BKL. Without it, yes, that's definitely a risk, and you need
SCT> some locking for the access to b_committed_data. Without that, even if
SCT> you keep the jh->b_committed_data field valid, you risk freeing the old
SCT> copy that another thread is using.

SCT> Cheers,
SCT> Stephen