2008-09-17 23:27:09

by Joel Becker

[permalink] [raw]
Subject: [PATCH] [RFC] jbd2: Add buffer triggers

Filesystems often to do compute intensive operation on some
metadata. If this operation is repeated many times, it can be very
expensive. It would be much nicer if the operation could be performed
once before a buffer goes to disk.

This adds triggers to jbd2 buffer heads. Just before writing a metadata
buffer to the journal, jbd2 will optionally call a commit trigger associated
with the buffer. If the journal is aborted, an abort trigger will be
called on any dirty buffers as they are dropped from pending
transactions.

ocfs2 will use this feature.

Initially I tried to come up with a more generic trigger that could be
used for non-buffer-related events like transaction completion. It
doesn't tie nicely, because the information a buffer trigger needs
(specific to a journal_head) isn't the same as what a transaction
trigger needs (specific to a tranaction_t or perhaps journal_t). So I
implemented a buffer set, with the understanding that
journal/transaction wide triggers should be implemented separately.

There is only one trigger set allowed per buffer. I can't think of any
reason to attach more than one set. Contrast this with a journal or
transaction in which multiple places may want to watch the entire
transaction separately.

The trigger sets are considered static allocation from the jbd2
perspective. ocfs2 will just have one trigger set per block type,
setting the same set on every bh of the same type.

Signed-off-by: Joel Becker <[email protected]>
---
fs/jbd2/commit.c | 1 +
fs/jbd2/journal.c | 11 +++++++++++
fs/jbd2/transaction.c | 42 ++++++++++++++++++++++++++++++++++++++++++
include/linux/jbd2.h | 29 +++++++++++++++++++++++++++++
include/linux/journal-head.h | 5 +++++
5 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index f2ad061..07fef48 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -509,6 +509,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)

if (is_journal_aborted(journal)) {
JBUFFER_TRACE(jh, "journal is aborting: refile");
+ jbd2_buffer_abort_trigger(jh);
jbd2_journal_refile_buffer(journal, jh);
/* If that was the last one, we need to clean up
* any descriptor buffers which may have been
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 18d3063..23dbd59 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -50,6 +50,7 @@ EXPORT_SYMBOL(jbd2_journal_unlock_updates);
EXPORT_SYMBOL(jbd2_journal_get_write_access);
EXPORT_SYMBOL(jbd2_journal_get_create_access);
EXPORT_SYMBOL(jbd2_journal_get_undo_access);
+EXPORT_SYMBOL(jbd2_journal_set_triggers);
EXPORT_SYMBOL(jbd2_journal_dirty_metadata);
EXPORT_SYMBOL(jbd2_journal_release_buffer);
EXPORT_SYMBOL(jbd2_journal_forget);
@@ -354,6 +355,16 @@ repeat:
done_copy_out = 1;
}

+ /* We have the actual buffer to go out, fire any commit trigger */
+ /* XXX Checking trigger pointers here so as to skip kmap when
+ * empty */
+ if (jh_in->b_triggers && jh_in->b_triggers->t_commit) {
+ mapped_data = kmap_atomic(new_page, KM_USER0);
+
+ jbd2_buffer_commit_trigger(jh_in, mapped_data + new_offset);
+ kunmap_atomic(mapped_data, KM_USER0);
+ }
+
/*
* Did we need to do an escaping? Now we've done all the
* copying, we can finally do so.
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e5d5405..e975afa 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -943,6 +943,48 @@ out:
}

/**
+ * void jbd2_journal_set_triggers() - Add triggers for commit writeout
+ * @bh: buffer to trigger on
+ * @type: struct jbd2_buffer_trigger_type containing the trigger(s).
+ *
+ * Set any triggers on this journal_head.
+ *
+ * Call with NULL to clear the triggers.
+ */
+void jbd2_journal_set_triggers(struct buffer_head *bh,
+ struct jbd2_buffer_trigger_type *type)
+{
+ struct journal_head *jh = bh2jh(bh);
+
+ if (jh->b_triggers && type) {
+ WARN_ON_ONCE(jh->b_triggers != type);
+ return;
+ }
+
+ jh->b_triggers = type;
+}
+
+void jbd2_buffer_commit_trigger(struct journal_head *jh, void *mapped_data)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!jh->b_triggers || !jh->b_triggers->t_commit)
+ return;
+
+ jh->b_triggers->t_commit(jh->b_triggers, bh, mapped_data, bh->b_size);
+}
+
+void jbd2_buffer_abort_trigger(struct journal_head *jh)
+{
+ if (!jh->b_triggers || !jh->b_triggers->t_abort)
+ return;
+
+ jh->b_triggers->t_abort(jh->b_triggers, jh2bh(jh));
+}
+
+
+
+/**
* int jbd2_journal_dirty_metadata() - mark a buffer as containing dirty metadata
* @handle: transaction to add buffer to.
* @bh: buffer to mark
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 3dd2090..032af18 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -994,6 +994,33 @@ int __jbd2_journal_clean_checkpoint_list(journal_t *journal);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);

+
+/*
+ * Triggers
+ */
+
+struct jbd2_buffer_trigger_type {
+ /*
+ * Fired just before a buffer is written to the journal.
+ * mapped_data is a mapped buffer that is the frozen data for
+ * commit.
+ */
+ void (*t_commit)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh, void *mapped_data,
+ size_t size);
+
+ /*
+ * Fired during journal abort for dirty buffers that will not be
+ * committed.
+ */
+ void (*t_abort)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh);
+};
+
+extern void jbd2_buffer_commit_trigger(struct journal_head *jh,
+ void *mapped_data);
+extern void jbd2_buffer_abort_trigger(struct journal_head *jh);
+
/* Buffer IO */
extern int
jbd2_journal_write_metadata_buffer(transaction_t *transaction,
@@ -1032,6 +1059,8 @@ extern int jbd2_journal_extend (handle_t *, int nblocks);
extern int jbd2_journal_get_write_access(handle_t *, struct buffer_head *);
extern int jbd2_journal_get_create_access (handle_t *, struct buffer_head *);
extern int jbd2_journal_get_undo_access(handle_t *, struct buffer_head *);
+void jbd2_journal_set_triggers(struct buffer_head *,
+ struct jbd2_buffer_trigger_type *type);
extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index 8a62d1e..087a1c2 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -12,6 +12,8 @@

typedef unsigned int tid_t; /* Unique transaction ID */
typedef struct transaction_s transaction_t; /* Compound transaction type */
+
+
struct buffer_head;

struct journal_head {
@@ -87,6 +89,9 @@ struct journal_head {
* [j_list_lock]
*/
struct journal_head *b_cpnext, *b_cpprev;
+
+ /* Trigger type */
+ struct jbd2_buffer_trigger_type *b_triggers;
};

#endif /* JOURNAL_HEAD_H_INCLUDED */
--
1.5.6.3


--

None of our men are "experts." We have most unfortunately found
it necessary to get rid of a man as soon as he thinks himself an
expert -- because no one ever considers himself expert if he really
knows his job. A man who knows a job sees so much more to be done
than he has done, that he is always pressing forward and never
gives up an instant of thought to how good and how efficient he is.
Thinking always ahead, thinking always of trying to do more, brings
a state of mind in which nothing is impossible. The moment one gets
into the "expert" state of mind a great number of things become
impossible.
- From Henry Ford Sr., "My Life and Work"

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127


2008-09-17 23:29:07

by Joel Becker

[permalink] [raw]
Subject: [PATCH] ocfs2: Use the new jbd_journal_set_triggers() to printk.


Just a placeholder trigger set that prints. ocfs2 would obviously do
more in the trigger with ot_offset.

Signed-off-by: Joel Becker <[email protected]>
---
fs/ocfs2/journal.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 373d943..d17d4a9 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -357,6 +357,51 @@ bail:
return status;
}

+struct ocfs2_triggers {
+ struct jbd2_buffer_trigger_type ot_triggers;
+ int ot_offset;
+};
+
+static inline struct ocfs2_triggers *to_ocfs2_trigger(struct jbd2_buffer_trigger_type *triggers)
+{
+ return container_of(triggers, struct ocfs2_triggers, ot_triggers);
+}
+
+static void ocfs2_commit_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh,
+ void *data, size_t size)
+{
+ struct ocfs2_triggers *ot = to_ocfs2_trigger(triggers);
+
+ mlog(ML_NOTICE, "bh = 0x%lx, data = 0x%lx, size = %u, offset = %d\n",
+ (unsigned long)bh, (unsigned long)data, size, ot->ot_offset);
+}
+
+static void ocfs2_abort_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh)
+{
+ struct ocfs2_triggers *ot = to_ocfs2_trigger(triggers);
+
+ mlog(ML_NOTICE, "bh = 0x%lx, offset = %d\n", (unsigned long)bh, ot->ot_offset);
+}
+
+static struct ocfs2_triggers inode_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = 10, /* Garbage */
+};
+
+static struct ocfs2_triggers other_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = 20, /* Different garbage */
+};
+
+
int ocfs2_journal_access(handle_t *handle,
struct inode *inode,
struct buffer_head *bh,
@@ -406,6 +451,12 @@ int ocfs2_journal_access(handle_t *handle,
status = -EINVAL;
mlog(ML_ERROR, "Uknown access type!\n");
}
+ if (!status)
+ jbd2_journal_set_triggers(bh,
+ bh->b_blocknr ==
+ OCFS2_I(inode)->ip_blkno ?
+ &inode_triggers.ot_triggers :
+ &other_triggers.ot_triggers);
mutex_unlock(&OCFS2_I(inode)->ip_io_mutex);

if (status < 0)
--
1.5.6.3

--

"There are some experiences in life which should not be demanded
twice from any man, and one of them is listening to the Brahms Requiem."
- George Bernard Shaw

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-09-19 21:46:42

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] [RFC] jbd2: Add buffer triggers

On Sep 17, 2008 16:26 -0700, Joel Becker wrote:
> + /* We have the actual buffer to go out, fire any commit trigger */
> + /* XXX Checking trigger pointers here so as to skip kmap when
> + * empty */
> + if (jh_in->b_triggers && jh_in->b_triggers->t_commit) {
> + mapped_data = kmap_atomic(new_page, KM_USER0);
> +
> + jbd2_buffer_commit_trigger(jh_in, mapped_data + new_offset);
> + kunmap_atomic(mapped_data, KM_USER0);
> + }

In many cases the kmap will not be needed (i.e. never for ext* because the
metadata buffers will always be allocated in low memory).

> index 8a62d1e..087a1c2 100644
> --- a/include/linux/journal-head.h
> +++ b/include/linux/journal-head.h
> @@ -87,6 +89,9 @@ struct journal_head {
> struct journal_head *b_cpnext, *b_cpprev;
> +
> + /* Trigger type */
> + struct jbd2_buffer_trigger_type *b_triggers;
> };

The journal-head.h header is shared between jbd.h and jbd2.h, so it seems
a bit strange to have a jbd2_* struct here.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-09-20 03:37:26

by Joel Becker

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Fri, Sep 19, 2008 at 03:46:25PM -0600, Andreas Dilger wrote:
> On Sep 17, 2008 16:26 -0700, Joel Becker wrote:
> > + /* We have the actual buffer to go out, fire any commit trigger */
> > + /* XXX Checking trigger pointers here so as to skip kmap when
> > + * empty */
> > + if (jh_in->b_triggers && jh_in->b_triggers->t_commit) {
> > + mapped_data = kmap_atomic(new_page, KM_USER0);
> > +
> > + jbd2_buffer_commit_trigger(jh_in, mapped_data + new_offset);
> > + kunmap_atomic(mapped_data, KM_USER0);
> > + }
>
> In many cases the kmap will not be needed (i.e. never for ext* because the
> metadata buffers will always be allocated in low memory).

The do_escape clause right below it unconditionally kmaps, so I
did as well. ocfs2 will never need it either, but kmap is just a noop
in those circumstances.

> > index 8a62d1e..087a1c2 100644
> > --- a/include/linux/journal-head.h
> > +++ b/include/linux/journal-head.h
> > @@ -87,6 +89,9 @@ struct journal_head {
> > struct journal_head *b_cpnext, *b_cpprev;
> > +
> > + /* Trigger type */
> > + struct jbd2_buffer_trigger_type *b_triggers;
> > };
>
> The journal-head.h header is shared between jbd.h and jbd2.h, so it seems
> a bit strange to have a jbd2_* struct here.

Well, I could call it journal_head_trigger?

Joel

--

The Graham Corollary:

The longer a socially-moderated news website exists, the
probability of an old Paul Graham link appearing at the top
approaches certainty.

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-09-29 01:25:29

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] [RFC] jbd2: Add buffer triggers

On Wed, Sep 17, 2008 at 04:26:30PM -0700, Joel Becker wrote:
> Filesystems often to do compute intensive operation on some
> metadata. If this operation is repeated many times, it can be very
> expensive. It would be much nicer if the operation could be performed
> once before a buffer goes to disk.

Sorry for the delay in reviewing this patch, but I've got a question.
Right now you are calling the commit trigger *after* doing the escape
data copying. This means that if the commit trigger is going to
modify the buffer (in the case of the checksum being stored in-line in
the block, which I gather is the case in OCFS2, right?), if you get
unlucky and the block begins with the four bytes JBD2_MAGIC_NUMBER,
then the checksum will be in the value which gets written out to the
log, but *not* to the value in the buffer head that will ultimately
get written to the final location on disk. In the normal case where
the journal wraps and then truncated, it means the checksum will never
get written to the disk. That seems bad. :-)

I believe the correct location to fire the per-buffer commit trigger
is right after the first kmap_atomic, right before the comment "Check
for escaping". This also fixes the bug where if the checksum is
stored in the first four bytes of the block, in the case where you
have bad luck and the checksum == JBD2_MAGIC_NUMBER, you won't end up
with a corrupted journal.

Do you agree? If so, please let me know and I can fix up the patch,
or just submit to me an updated patch.

Thanks,

- Ted

2008-10-04 00:03:39

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] [RFC] jbd2: Add buffer triggers

Hey Joel,

I'm not sure you saw this e-mail, since I think it got in a mailman
moderation queue.

I've been looking more closely at this, and I think actually using
this commit trigger gets very tricky. Exactly how do you plan to use
the commit trigger? The caller is only going to be able to easily
check the checksum on the frozen copy of the buffer; is that what you
intended? When and how do you plan set the checksum for a filesystem
which is cleanly unmounted? And are you planning on doing this for
just the superblock, or for other data structures?

- Ted

On Sun, Sep 28, 2008 at 09:25:27PM -0400, Theodore Tso wrote:
> On Wed, Sep 17, 2008 at 04:26:30PM -0700, Joel Becker wrote:
> > Filesystems often to do compute intensive operation on some
> > metadata. If this operation is repeated many times, it can be very
> > expensive. It would be much nicer if the operation could be performed
> > once before a buffer goes to disk.
>
> Sorry for the delay in reviewing this patch, but I've got a question.
> Right now you are calling the commit trigger *after* doing the escape
> data copying. This means that if the commit trigger is going to
> modify the buffer (in the case of the checksum being stored in-line in
> the block, which I gather is the case in OCFS2, right?), if you get
> unlucky and the block begins with the four bytes JBD2_MAGIC_NUMBER,
> then the checksum will be in the value which gets written out to the
> log, but *not* to the value in the buffer head that will ultimately
> get written to the final location on disk. In the normal case where
> the journal wraps and then truncated, it means the checksum will never
> get written to the disk. That seems bad. :-)
>
> I believe the correct location to fire the per-buffer commit trigger
> is right after the first kmap_atomic, right before the comment "Check
> for escaping". This also fixes the bug where if the checksum is
> stored in the first four bytes of the block, in the case where you
> have bad luck and the checksum == JBD2_MAGIC_NUMBER, you won't end up
> with a corrupted journal.
>
> Do you agree? If so, please let me know and I can fix up the patch,
> or just submit to me an updated patch.
>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-10-06 21:38:16

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH] [RFC] jbd2: Add buffer triggers

On Fri, Oct 03, 2008 at 08:03:36PM -0400, Theodore Tso wrote:
> I'm not sure you saw this e-mail, since I think it got in a mailman
> moderation queue.

I've flagged you as "do not moderate". I saw the email, but I
was in Prague for the cluster summit and didn't have time to look
in-depth.

> I've been looking more closely at this, and I think actually using
> this commit trigger gets very tricky. Exactly how do you plan to use
> the commit trigger? The caller is only going to be able to easily
> check the checksum on the frozen copy of the buffer; is that what you
> intended? When and how do you plan set the checksum for a filesystem
> which is cleanly unmounted? And are you planning on doing this for
> just the superblock, or for other data structures?

Every single metadata block in ocfs2 would have the commit
trigger set. I intend to calculate the block checksums in the trigger.
That way I only compute the checksum once per journal write, rather than
every time I call journal_dirty_metadata(). That's my plan, at least.
You may be right that my choice is too late and that I need to
trigger in an earlier place. That's why I was taking time to reply: I
wanted to re-read the jbd2 source and be sure of what you were saying
and how I was handling it. I haven't looked yet (just got back to the
office today), but here's what I understand so far.
As far I could tell, actual writes of the buffer to the journal
go through journal_write_metadata_buffer(). Only in
journal_write_metadata_buffer() are you sure to have a frozen copy of
the buffer that won't be affected by parallel processes doing
journal_dirty_metadata(). Thus, the new_page and new_offset values are
set to b_data or b_frozen_data as needed. When I get past that, I have
the right pointer. I need to do it before the escape, of course,
because I want to checksum the block data, not the escaped data.
Now, I'm sure this is the buffer that's going to the journal, I
think you're saying that this buffer may not be what gets checkpointed.
So the correct checksum hits the journal, but then an invalid one gets
to the real location on disk. Is that right? If so, I need to figure
out where to calculate the checksum somewhere higher, as you say.

Joel

--

"Nobody loves me,
Nobody seems to care.
Troubles and worries, people,
You know I've had my share."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-06 21:43:50

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH] [RFC] jbd2: Add buffer triggers

On Mon, Oct 06, 2008 at 02:37:54PM -0700, Joel Becker wrote:
> On Fri, Oct 03, 2008 at 08:03:36PM -0400, Theodore Tso wrote:
> Now, I'm sure this is the buffer that's going to the journal, I
> think you're saying that this buffer may not be what gets checkpointed.
> So the correct checksum hits the journal, but then an invalid one gets
> to the real location on disk. Is that right? If so, I need to figure
> out where to calculate the checksum somewhere higher, as you say.

Ok, looking at your first email again, you're saying to move up
to the "Check for escaping" section. There we might checksum b_data
before it's copied out. This is, indeed, safe against the problem you
mentioned. I also think it's safe to checksum b_frozen_data if already
set there, as that's a frozen for commit buffer. Can I trust that
checksumming b_data there will not be overwritten by another process
doing journal_dirty_metadata() before we write out our buffer? Can I
trust that the b_frozen_data, if already copied, will be the buffer
committed and checkpointed? If so, I think that change works.

Joel

--

"What no boss of a programmer can ever understand is that a programmer
is working when he's staring out of the window"
- With apologies to Burton Rascoe

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-06 23:32:51

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] [RFC] jbd2: Add buffer triggers

On Mon, Oct 06, 2008 at 02:42:52PM -0700, Joel Becker wrote:
> On Mon, Oct 06, 2008 at 02:37:54PM -0700, Joel Becker wrote:
> > On Fri, Oct 03, 2008 at 08:03:36PM -0400, Theodore Tso wrote:
> > Now, I'm sure this is the buffer that's going to the journal, I
> > think you're saying that this buffer may not be what gets checkpointed.
> > So the correct checksum hits the journal, but then an invalid one gets
> > to the real location on disk. Is that right? If so, I need to figure
> > out where to calculate the checksum somewhere higher, as you say.
>
> Ok, looking at your first email again, you're saying to move up
> to the "Check for escaping" section. There we might checksum b_data
> before it's copied out. This is, indeed, safe against the problem you
> mentioned. I also think it's safe to checksum b_frozen_data if already
> set there, as that's a frozen for commit buffer. Can I trust that
> checksumming b_data there will not be overwritten by another process
> doing journal_dirty_metadata() before we write out our buffer? Can I
> trust that the b_frozen_data, if already copied, will be the buffer
> committed and checkpointed? If so, I think that change works.

I'm not 100% sure..... The other area that we should check very
closely is jbd2_journal_commit_transaction(); in some cases, if
jh->b_committed_data is NULL, the frozen data is thrown away (around
line 850 in transaction.c). I *think* this happens if b_frozen_data
was only copied to escape the buffer, but I'm not certain; in any
case, there's a potential that in that case you might lose the
calculated checksum and the correct value wouldn't get written to the
final location on disk.

There are parts of the jbd2 code which badly needs someone with time
to go through, grok it entirely and then write up how various bits and
pieces work. Andrew Morton is right; Stephen Tweedie wrote some
*very* clever code, and part of the problem is probably no one can
understand some of the more subtle bits off the cuff, and few can
understand it only after spend a while wrapping their minds around how
it all works. So I suspect one of is going to have take a deep dive
into the code, grok how the trifecta of b_committed_data,
b_frozen_data, and b_data interact, and which functions make what
assumptions, and then write up a lot of explanatory documentation....

Unfortunately, I'm not sure I'm going to have time for at least the
next couple of weeks. Do you think you could take a crack at it?

- Ted

2008-10-07 01:03:29

by Joel Becker

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Mon, Oct 06, 2008 at 07:32:48PM -0400, Theodore Tso wrote:
> On Mon, Oct 06, 2008 at 02:42:52PM -0700, Joel Becker wrote:
> I'm not 100% sure..... The other area that we should check very
> closely is jbd2_journal_commit_transaction(); in some cases, if
> jh->b_committed_data is NULL, the frozen data is thrown away (around
> line 850 in transaction.c). I *think* this happens if b_frozen_data
> was only copied to escape the buffer, but I'm not certain; in any
> case, there's a potential that in that case you might lose the
> calculated checksum and the correct value wouldn't get written to the
> final location on disk.

I remember looking at this before. Basically, if there is a
later transaction also using this buffer, b_frozen_data has this
transaction's version of said buffer. We've now written that to the
journal. b_committed_data exists for get_undo_access(). We've just
committed the buffer, so we know that we no longer need to worry about
the undo access. If we have both, it's a new undo access, and it needs
b_frozen_data moved into b_committed_data.
How does this affect our checksum? This all actually falls out
from the fact that jbd skips checkpointing for buffers that are part of
a later transaction.
If this buffer is only part of the currently committing
transaction, write_metadata_buffer() will have our trigger (from here on
in the place you suggest, not where I put it) modify b_data directly.
It may get copied to b_frozen_data for escaping, but that b_data will go
out with the checkpoint. Thus, we have a valid checksum on both the
committed buffer and the checkpointed buffer.
If the buffer is part of a later transaction as well,
write_metadata_buffer() will have our trigger fire on b_frozen_data.
This will put the checksum in the journal, but not modify b_data.
In a healthy system, the later transaction will eventually commit, and
it will put a good checksum in b_data, finally checkpointing that value.
ocfs2 will force this process via journal_flush() before it lets other
nodes look at the disk.
Looking at the checkpoint part, though, I think we're not safe.
The buffer is attached to the original transaction's checkpoint list
after the commit. This buffer has the un-checksummed b_data. If the
later transaction commits before the checkpoint happens, all is good.
But if the buffer lazily writes to disk while the later transaction is
still running, the original transaction could be considered "done",
updating the journal superblock. If we crash at that moment, we have a
bad checksum on disk.
I suppose we could trigger on a checkpointed buffer going out?

Joel

--

Life's Little Instruction Book #222

"Think twice before burdening a friend with a secret."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-07 01:05:35

by Joel Becker

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Mon, Oct 06, 2008 at 07:32:48PM -0400, Theodore Tso wrote:
> pieces work. Andrew Morton is right; Stephen Tweedie wrote some
> *very* clever code

sct is very good at doing that, and even better at going off
afterwards :-)

> it all works. So I suspect one of is going to have take a deep dive
> into the code, grok how the trifecta of b_committed_data,
> b_frozen_data, and b_data interact, and which functions make what
> assumptions, and then write up a lot of explanatory documentation....
>
> Unfortunately, I'm not sure I'm going to have time for at least the
> next couple of weeks. Do you think you could take a crack at it?

I think I just did some of it in the previous email, so I
suppose I should do some more. Let me see if I can squeeze it in.

Joel

--

Life's Little Instruction Book #197

"Don't forget, a person's greatest emotional need is to
feel appreciated."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-08 23:18:04

by Joel Becker

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Mon, Oct 06, 2008 at 06:01:54PM -0700, Joel Becker wrote:
> On Mon, Oct 06, 2008 at 07:32:48PM -0400, Theodore Tso wrote:
> > On Mon, Oct 06, 2008 at 02:42:52PM -0700, Joel Becker wrote:
> > I'm not 100% sure..... The other area that we should check very
> > closely is jbd2_journal_commit_transaction(); in some cases, if
> > jh->b_committed_data is NULL, the frozen data is thrown away (around
> > line 850 in transaction.c). I *think* this happens if b_frozen_data
> > was only copied to escape the buffer, but I'm not certain; in any
> > case, there's a potential that in that case you might lose the
> > calculated checksum and the correct value wouldn't get written to the
> > final location on disk.
<snip>
> Looking at the checkpoint part, though, I think we're not safe.
> The buffer is attached to the original transaction's checkpoint list
> after the commit. This buffer has the un-checksummed b_data. If the
> later transaction commits before the checkpoint happens, all is good.
> But if the buffer lazily writes to disk while the later transaction is
> still running, the original transaction could be considered "done",
> updating the journal superblock. If we crash at that moment, we have a
> bad checksum on disk.

I chatted with Mark some about this today. He pointed out that,
logically, b_data can't be checkpointed until its data isn't in the
journal - it might be newer than the most recent transaction. So I
looked in the checkpoint code to see where this is handled, and the
checkpoint code simply forces a commit new enough to encompass b_data
when it wants to put that block to disk.
In other words, I think that the commit trigger is safe in all
circumstances once moved up in journal_commit_transaction(). I'll be
cooking that up shortly.

Joel

--

To spot the expert, pick the one who predicts the job will take the
longest and cost the most.

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-16 17:42:44

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Wed, Oct 08, 2008 at 04:17:52PM -0700, Joel Becker wrote:
> In other words, I think that the commit trigger is safe in all
> circumstances once moved up in journal_commit_transaction(). I'll be
> cooking that up shortly.

Have you had a chance to look at this? Here's something which hasn't
been compile-tested yet, but it didn't take me long...

- Ted

jbd2: Add buffer triggers

From: Joel Becker <[email protected]>

Filesystems often to do compute intensive operation on some
metadata. If this operation is repeated many times, it can be very
expensive. It would be much nicer if the operation could be performed
once before a buffer goes to disk.

This adds triggers to jbd2 buffer heads. Just before writing a metadata
buffer to the journal, jbd2 will optionally call a commit trigger associated
with the buffer. If the journal is aborted, an abort trigger will be
called on any dirty buffers as they are dropped from pending
transactions.

ocfs2 will use this feature.

Initially I tried to come up with a more generic trigger that could be
used for non-buffer-related events like transaction completion. It
doesn't tie nicely, because the information a buffer trigger needs
(specific to a journal_head) isn't the same as what a transaction
trigger needs (specific to a tranaction_t or perhaps journal_t). So I
implemented a buffer set, with the understanding that
journal/transaction wide triggers should be implemented separately.

There is only one trigger set allowed per buffer. I can't think of any
reason to attach more than one set. Contrast this with a journal or
transaction in which multiple places may want to watch the entire
transaction separately.

The trigger sets are considered static allocation from the jbd2
perspective. ocfs2 will just have one trigger set per block type,
setting the same set on every bh of the same type.

Signed-off-by: Joel Becker <[email protected]>
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 0abe02c..6a38e8e 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -509,6 +509,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (is_journal_aborted(journal)) {
clear_buffer_jbddirty(jh2bh(jh));
JBUFFER_TRACE(jh, "journal is aborting: refile");
+ jbd2_buffer_abort_trigger(jh);
jbd2_journal_refile_buffer(journal, jh);
/* If that was the last one, we need to clean up
* any descriptor buffers which may have been
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 783de11..78dcbb3 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -50,6 +50,7 @@ EXPORT_SYMBOL(jbd2_journal_unlock_updates);
EXPORT_SYMBOL(jbd2_journal_get_write_access);
EXPORT_SYMBOL(jbd2_journal_get_create_access);
EXPORT_SYMBOL(jbd2_journal_get_undo_access);
+EXPORT_SYMBOL(jbd2_journal_set_triggers);
EXPORT_SYMBOL(jbd2_journal_dirty_metadata);
EXPORT_SYMBOL(jbd2_journal_release_buffer);
EXPORT_SYMBOL(jbd2_journal_forget);
@@ -320,6 +321,11 @@ repeat:
}

mapped_data = kmap_atomic(new_page, KM_USER0);
+
+ /* We have the actual buffer to go out, fire any commit trigger */
+ if (jh_in->b_triggers && jh_in->b_triggers->t_commit)
+ jbd2_buffer_commit_trigger(jh_in, mapped_data + new_offset);
+
/*
* Check for escaping
*/
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e5d5405..e975afa 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -943,6 +943,48 @@ out:
}

/**
+ * void jbd2_journal_set_triggers() - Add triggers for commit writeout
+ * @bh: buffer to trigger on
+ * @type: struct jbd2_buffer_trigger_type containing the trigger(s).
+ *
+ * Set any triggers on this journal_head.
+ *
+ * Call with NULL to clear the triggers.
+ */
+void jbd2_journal_set_triggers(struct buffer_head *bh,
+ struct jbd2_buffer_trigger_type *type)
+{
+ struct journal_head *jh = bh2jh(bh);
+
+ if (jh->b_triggers && type) {
+ WARN_ON_ONCE(jh->b_triggers != type);
+ return;
+ }
+
+ jh->b_triggers = type;
+}
+
+void jbd2_buffer_commit_trigger(struct journal_head *jh, void *mapped_data)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!jh->b_triggers || !jh->b_triggers->t_commit)
+ return;
+
+ jh->b_triggers->t_commit(jh->b_triggers, bh, mapped_data, bh->b_size);
+}
+
+void jbd2_buffer_abort_trigger(struct journal_head *jh)
+{
+ if (!jh->b_triggers || !jh->b_triggers->t_abort)
+ return;
+
+ jh->b_triggers->t_abort(jh->b_triggers, jh2bh(jh));
+}
+
+
+
+/**
* int jbd2_journal_dirty_metadata() - mark a buffer as containing dirty metadata
* @handle: transaction to add buffer to.
* @bh: buffer to mark
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index d2e91ea..c71d809 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -998,6 +998,33 @@ int __jbd2_journal_clean_checkpoint_list(journal_t *journal);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);

+
+/*
+ * Triggers
+ */
+
+struct jbd2_buffer_trigger_type {
+ /*
+ * Fired just before a buffer is written to the journal.
+ * mapped_data is a mapped buffer that is the frozen data for
+ * commit.
+ */
+ void (*t_commit)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh, void *mapped_data,
+ size_t size);
+
+ /*
+ * Fired during journal abort for dirty buffers that will not be
+ * committed.
+ */
+ void (*t_abort)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh);
+};
+
+extern void jbd2_buffer_commit_trigger(struct journal_head *jh,
+ void *mapped_data);
+extern void jbd2_buffer_abort_trigger(struct journal_head *jh);
+
/* Buffer IO */
extern int
jbd2_journal_write_metadata_buffer(transaction_t *transaction,
@@ -1036,6 +1063,8 @@ extern int jbd2_journal_extend (handle_t *, int nblocks);
extern int jbd2_journal_get_write_access(handle_t *, struct buffer_head *);
extern int jbd2_journal_get_create_access (handle_t *, struct buffer_head *);
extern int jbd2_journal_get_undo_access(handle_t *, struct buffer_head *);
+void jbd2_journal_set_triggers(struct buffer_head *,
+ struct jbd2_buffer_trigger_type *type);
extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index 8a62d1e..2151e4f 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -12,6 +12,7 @@

typedef unsigned int tid_t; /* Unique transaction ID */
typedef struct transaction_s transaction_t; /* Compound transaction type */
+
struct buffer_head;

struct journal_head {
@@ -87,6 +88,9 @@ struct journal_head {
* [j_list_lock]
*/
struct journal_head *b_cpnext, *b_cpprev;
+
+ /* Trigger type -- only used for jbd2 */
+ struct jbd2_buffer_trigger_type *b_triggers;
};

#endif /* JOURNAL_HEAD_H_INCLUDED */

2008-10-16 19:41:12

by Joel Becker

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Thu, Oct 16, 2008 at 01:42:41PM -0400, Theodore Tso wrote:
> On Wed, Oct 08, 2008 at 04:17:52PM -0700, Joel Becker wrote:
> > In other words, I think that the commit trigger is safe in all
> > circumstances once moved up in journal_commit_transaction(). I'll be
> > cooking that up shortly.
>
> Have you had a chance to look at this? Here's something which hasn't
> been compile-tested yet, but it didn't take me long...

Actually, I've had the fixed up patch for a week, tested even.
You caught me this morning as I was rebasing against Linus. Sorry I
didn't send it out sooner. The only difference from your version, that
I can tell, is that I unconditionally call the commit trigger function -
it handles checking for empty b_trigger, etc.

Joel

>From 342869b6588791e62c45f239041f419772ca92ba Mon Sep 17 00:00:00 2001
From: Joel Becker <[email protected]>
Date: Thu, 11 Sep 2008 15:35:47 -0700
Subject: [PATCH] jbd2: Add buffer triggers

Filesystems often to do compute intensive operation on some
metadata. If this operation is repeated many times, it can be very
expensive. It would be much nicer if the operation could be performed
once before a buffer goes to disk.

This adds triggers to jbd2 buffer heads. Just before writing a metadata
buffer to the journal, jbd2 will optionally call a commit trigger associated
with the buffer. If the journal is aborted, an abort trigger will be
called on any dirty buffers as they are dropped from pending
transactions.

ocfs2 will use this feature.

Initially I tried to come up with a more generic trigger that could be
used for non-buffer-related events like transaction completion. It
doesn't tie nicely, because the information a buffer trigger needs
(specific to a journal_head) isn't the same as what a transaction
trigger needs (specific to a tranaction_t or perhaps journal_t). So I
implemented a buffer set, with the understanding that
journal/transaction wide triggers should be implemented separately.

There is only one trigger set allowed per buffer. I can't think of any
reason to attach more than one set. Contrast this with a journal or
transaction in which multiple places may want to watch the entire
transaction separately.

The trigger sets are considered static allocation from the jbd2
perspective. ocfs2 will just have one trigger set per block type,
setting the same set on every bh of the same type.

Signed-off-by: Joel Becker <[email protected]>
---
fs/jbd2/commit.c | 1 +
fs/jbd2/journal.c | 8 ++++++++
fs/jbd2/transaction.c | 42 ++++++++++++++++++++++++++++++++++++++++++
include/linux/jbd2.h | 29 +++++++++++++++++++++++++++++
include/linux/journal-head.h | 5 +++++
5 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 0abe02c..6a38e8e 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -509,6 +509,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (is_journal_aborted(journal)) {
clear_buffer_jbddirty(jh2bh(jh));
JBUFFER_TRACE(jh, "journal is aborting: refile");
+ jbd2_buffer_abort_trigger(jh);
jbd2_journal_refile_buffer(journal, jh);
/* If that was the last one, we need to clean up
* any descriptor buffers which may have been
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 783de11..90e03f1 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -50,6 +50,7 @@ EXPORT_SYMBOL(jbd2_journal_unlock_updates);
EXPORT_SYMBOL(jbd2_journal_get_write_access);
EXPORT_SYMBOL(jbd2_journal_get_create_access);
EXPORT_SYMBOL(jbd2_journal_get_undo_access);
+EXPORT_SYMBOL(jbd2_journal_set_triggers);
EXPORT_SYMBOL(jbd2_journal_dirty_metadata);
EXPORT_SYMBOL(jbd2_journal_release_buffer);
EXPORT_SYMBOL(jbd2_journal_forget);
@@ -321,6 +322,13 @@ repeat:

mapped_data = kmap_atomic(new_page, KM_USER0);
/*
+ * Fire any commit trigger. Do this before checking for escaping,
+ * as the trigger may modify the magic offset. If a copy-out
+ * happens afterwards, it will have the correct data in the buffer.
+ */
+ jbd2_buffer_commit_trigger(jh_in, mapped_data + new_offset);
+
+ /*
* Check for escaping
*/
if (*((__be32 *)(mapped_data + new_offset)) ==
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e5d5405..e975afa 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -943,6 +943,48 @@ out:
}

/**
+ * void jbd2_journal_set_triggers() - Add triggers for commit writeout
+ * @bh: buffer to trigger on
+ * @type: struct jbd2_buffer_trigger_type containing the trigger(s).
+ *
+ * Set any triggers on this journal_head.
+ *
+ * Call with NULL to clear the triggers.
+ */
+void jbd2_journal_set_triggers(struct buffer_head *bh,
+ struct jbd2_buffer_trigger_type *type)
+{
+ struct journal_head *jh = bh2jh(bh);
+
+ if (jh->b_triggers && type) {
+ WARN_ON_ONCE(jh->b_triggers != type);
+ return;
+ }
+
+ jh->b_triggers = type;
+}
+
+void jbd2_buffer_commit_trigger(struct journal_head *jh, void *mapped_data)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!jh->b_triggers || !jh->b_triggers->t_commit)
+ return;
+
+ jh->b_triggers->t_commit(jh->b_triggers, bh, mapped_data, bh->b_size);
+}
+
+void jbd2_buffer_abort_trigger(struct journal_head *jh)
+{
+ if (!jh->b_triggers || !jh->b_triggers->t_abort)
+ return;
+
+ jh->b_triggers->t_abort(jh->b_triggers, jh2bh(jh));
+}
+
+
+
+/**
* int jbd2_journal_dirty_metadata() - mark a buffer as containing dirty metadata
* @handle: transaction to add buffer to.
* @bh: buffer to mark
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index d2e91ea..c71d809 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -998,6 +998,33 @@ int __jbd2_journal_clean_checkpoint_list(journal_t *journal);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);

+
+/*
+ * Triggers
+ */
+
+struct jbd2_buffer_trigger_type {
+ /*
+ * Fired just before a buffer is written to the journal.
+ * mapped_data is a mapped buffer that is the frozen data for
+ * commit.
+ */
+ void (*t_commit)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh, void *mapped_data,
+ size_t size);
+
+ /*
+ * Fired during journal abort for dirty buffers that will not be
+ * committed.
+ */
+ void (*t_abort)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh);
+};
+
+extern void jbd2_buffer_commit_trigger(struct journal_head *jh,
+ void *mapped_data);
+extern void jbd2_buffer_abort_trigger(struct journal_head *jh);
+
/* Buffer IO */
extern int
jbd2_journal_write_metadata_buffer(transaction_t *transaction,
@@ -1036,6 +1063,8 @@ extern int jbd2_journal_extend (handle_t *, int nblocks);
extern int jbd2_journal_get_write_access(handle_t *, struct buffer_head *);
extern int jbd2_journal_get_create_access (handle_t *, struct buffer_head *);
extern int jbd2_journal_get_undo_access(handle_t *, struct buffer_head *);
+void jbd2_journal_set_triggers(struct buffer_head *,
+ struct jbd2_buffer_trigger_type *type);
extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index 8a62d1e..087a1c2 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -12,6 +12,8 @@

typedef unsigned int tid_t; /* Unique transaction ID */
typedef struct transaction_s transaction_t; /* Compound transaction type */
+
+
struct buffer_head;

struct journal_head {
@@ -87,6 +89,9 @@ struct journal_head {
* [j_list_lock]
*/
struct journal_head *b_cpnext, *b_cpprev;
+
+ /* Trigger type */
+ struct jbd2_buffer_trigger_type *b_triggers;
};

#endif /* JOURNAL_HEAD_H_INCLUDED */
--
1.5.6.5

--

"Anything that is too stupid to be spoken is sung."
- Voltaire

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-16 19:44:38

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH] [RFC] jbd2: Add buffer triggers

On Thu, Oct 16, 2008 at 12:40:11PM -0700, Joel Becker wrote:
> On Thu, Oct 16, 2008 at 01:42:41PM -0400, Theodore Tso wrote:
> > On Wed, Oct 08, 2008 at 04:17:52PM -0700, Joel Becker wrote:
> > > In other words, I think that the commit trigger is safe in all
> > > circumstances once moved up in journal_commit_transaction(). I'll be
> > > cooking that up shortly.
> >
> > Have you had a chance to look at this? Here's something which hasn't
> > been compile-tested yet, but it didn't take me long...
>
> Actually, I've had the fixed up patch for a week, tested even.

The testing was with this simple printk patch. I just watched
the journal blocks go out. Things like kernel untars, etc.

Joel

From: Joel Becker <[email protected]>
Date: Thu, 11 Sep 2008 15:53:07 -0700
Subject: [PATCH] ocfs2: Use the new jbd_journal_set_commit_trigger() to printk.

Just a placeholder trigger set that prints

Signed-off-by: Joel Becker <[email protected]>
---
fs/ocfs2/journal.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 81e4067..f56666b 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -357,6 +357,51 @@ bail:
return status;
}

+struct ocfs2_triggers {
+ struct jbd2_buffer_trigger_type ot_triggers;
+ int ot_offset;
+};
+
+static inline struct ocfs2_triggers *to_ocfs2_trigger(struct jbd2_buffer_trigger_type *triggers)
+{
+ return container_of(triggers, struct ocfs2_triggers, ot_triggers);
+}
+
+static void ocfs2_commit_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh,
+ void *data, size_t size)
+{
+ struct ocfs2_triggers *ot = to_ocfs2_trigger(triggers);
+
+ mlog(ML_NOTICE, "bh = 0x%lx, data = 0x%lx, size = %u, offset = %d\n",
+ (unsigned long)bh, (unsigned long)data, size, ot->ot_offset);
+}
+
+static void ocfs2_abort_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh)
+{
+ struct ocfs2_triggers *ot = to_ocfs2_trigger(triggers);
+
+ mlog(ML_NOTICE, "bh = 0x%lx, offset = %d\n", (unsigned long)bh, ot->ot_offset);
+}
+
+static struct ocfs2_triggers inode_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = 10, /* Garbage */
+};
+
+static struct ocfs2_triggers other_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = 20, /* Different garbage */
+};
+
+
int ocfs2_journal_access(handle_t *handle,
struct inode *inode,
struct buffer_head *bh,
@@ -406,6 +451,12 @@ int ocfs2_journal_access(handle_t *handle,
status = -EINVAL;
mlog(ML_ERROR, "Uknown access type!\n");
}
+ if (!status)
+ jbd2_journal_set_triggers(bh,
+ bh->b_blocknr ==
+ OCFS2_I(inode)->ip_blkno ?
+ &inode_triggers.ot_triggers :
+ &other_triggers.ot_triggers);
mutex_unlock(&OCFS2_I(inode)->ip_io_mutex);

if (status < 0)
--
1.5.6.5

--

"In a crisis, don't hide behind anything or anybody. They're going
to find you anyway."
- Paul "Bear" Bryant

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-10-17 12:28:43

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Thu, Oct 16, 2008 at 12:40:11PM -0700, Joel Becker wrote:
> Actually, I've had the fixed up patch for a week, tested even.
> You caught me this morning as I was rebasing against Linus. Sorry I
> didn't send it out sooner. The only difference from your version, that
> I can tell, is that I unconditionally call the commit trigger function -
> it handles checking for empty b_trigger, etc.
>

Looks good to me.

Acked-by: "Theodore Ts'o" <[email protected]>

Given that OCFS2 is going to be the first user of this patch, I guess
it would make sense for you to push it to Linus in an ocfs2 branch?
Or would you like me to send it to Linus?

- Ted

2008-10-17 17:11:23

by Mark Fasheh

[permalink] [raw]
Subject: Re: [Ocfs2-devel] [PATCH] [RFC] jbd2: Add buffer triggers

On Fri, Oct 17, 2008 at 08:28:40AM -0400, Theodore Tso wrote:
> Looks good to me.
>
> Acked-by: "Theodore Ts'o" <[email protected]>

Excellet, thanks for the review.

>
> Given that OCFS2 is going to be the first user of this patch, I guess
> it would make sense for you to push it to Linus in an ocfs2 branch?
> Or would you like me to send it to Linus?

I'll pick it up in ocfs2.git.

Thanks, Mark


--
Mark Fasheh