2008-12-25 18:05:09

by Mark Fasheh

Subject: [git patches] Ocfs2 patches for merge window, batch 3/3

Hi,

This is the 3rd and final batch of Ocfs2 patches intended for the merge
window. The 2nd batch was sent out previously:

http://lkml.org/lkml/2008/12/22/213

This batch includes some more xattr fixes, some dlm fixes from Sunil and
metadata checksumming support from Joel.

With all the other prep patches, the checksumming patches become pretty
straightforward. Mostly it's a matter of finding the right spot in our disk
structures. For most blocks, we just take an unused field. Directory data
gets a hidden structure at the end of every block. We'll actually be making
use of this structure in the future for directory indexing support. The
checksum code stores an ECC in each 64 bit field, so single bit errors can
be corrected as those blocks are read.
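
The read-side behaviour described above (check the CRC, fall back to the ECC for a single-bit repair, then re-check the CRC) can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: every name ending in _sketch, plus code_bit(), check_compute() and check_validate(), is made up for this example, the check values are kept outside the block so no zeroing of an inline field is needed, and bits are numbered little-endian within each byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32 over the reflected polynomial 0xEDB88320. The caller
 * supplies the seed and no final inversion is applied. */
static uint32_t crc32_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		int k;

		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* 0-based data bit -> 1-based Hamming code position (skips powers of two). */
static unsigned int code_bit(unsigned int i)
{
	unsigned int b = i + 1, p;

	for (p = 0; (1u << p) < (b + 1); p++)
		b++;
	return b;
}

/* ECC = XOR of the code positions of every set data bit. */
static uint32_t ecc_sketch(const uint8_t *data, size_t len)
{
	uint32_t ecc = 0;
	unsigned int i;

	for (i = 0; i < len * 8; i++)
		if ((data[i / 8] >> (i % 8)) & 1)
			ecc ^= code_bit(i);
	return ecc;
}

/* Hypothetical out-of-band check record (not the on-disk layout). */
struct check_sketch {
	uint32_t crc;
	uint16_t ecc;
};

static void check_compute(const uint8_t *data, size_t len,
			  struct check_sketch *c)
{
	c->crc = crc32_sketch(~0u, data, len);
	c->ecc = (uint16_t)ecc_sketch(data, len);
}

/* Returns 0 if the block is good, possibly after a one-bit repair;
 * returns -1 if the damage could not be repaired. */
static int check_validate(uint8_t *data, size_t len,
			  const struct check_sketch *stored)
{
	uint32_t fix;
	unsigned int i;

	if (crc32_sketch(~0u, data, len) == stored->crc)
		return 0;		/* fast path: nothing to fix */

	/* The syndrome names the code position of a single flipped bit. */
	fix = ecc_sketch(data, len) ^ stored->ecc;
	for (i = 0; i < len * 8; i++) {
		if (code_bit(i) == fix) {
			data[i / 8] ^= (uint8_t)(1u << (i % 8));
			break;
		}
	}

	/* Trust the repair only if the CRC now matches. */
	return crc32_sketch(~0u, data, len) == stored->crc ? 0 : -1;
}
```

A caller would run check_compute() before a block is written and check_validate() after it is read back. A syndrome that is zero or a power of two matches no data bit, so the buffer is left alone in that case and the final CRC comparison decides the result.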

Checking on read is done with the per-metadata-type callbacks we previously
added. Write checks are done via a new jbd2 mechanism - 'buffer triggers',
which is implemented in the 1st patch of this series. The first version of
the buffer triggers patch got Ted's sign off, but we made some changes and
sent another version to the ext4 list. The version sent here is the latest
one. If I got my git send-email command right, Ted and the Ext4 list should
be CC'd.
--Mark

Please pull from 'upstream-round3' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-round3

to receive the following updates:

fs/jbd2/commit.c | 9 +
fs/jbd2/journal.c | 19 ++
fs/jbd2/transaction.c | 47 ++++
fs/ocfs2/Makefile | 1 +
fs/ocfs2/alloc.c | 315 ++++++++++++++++------------
fs/ocfs2/alloc.h | 9 +-
fs/ocfs2/aops.c | 8 +-
fs/ocfs2/blockcheck.c | 477 ++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/blockcheck.h | 82 +++++++
fs/ocfs2/dir.c | 282 ++++++++++++++++++++++---
fs/ocfs2/dir.h | 2 +
fs/ocfs2/dlm/dlmast.c | 52 +++---
fs/ocfs2/dlm/dlmcommon.h | 3 +
fs/ocfs2/dlm/dlmdebug.c | 53 ++---
fs/ocfs2/dlm/dlmdomain.c | 1 +
fs/ocfs2/dlm/dlmmaster.c | 42 ++++-
fs/ocfs2/dlm/dlmthread.c | 3 +-
fs/ocfs2/file.c | 16 +-
fs/ocfs2/inode.c | 33 +++-
fs/ocfs2/journal.c | 188 ++++++++++++++++-
fs/ocfs2/journal.h | 32 +++-
fs/ocfs2/localalloc.c | 18 +-
fs/ocfs2/namei.c | 38 ++--
fs/ocfs2/ocfs2.h | 15 ++
fs/ocfs2/ocfs2_fs.h | 87 +++++++-
fs/ocfs2/quota_global.c | 25 ++-
fs/ocfs2/quota_local.c | 18 +-
fs/ocfs2/resize.c | 16 +-
fs/ocfs2/suballoc.c | 87 ++++++---
fs/ocfs2/super.c | 11 +
fs/ocfs2/xattr.c | 312 ++++++++++++++++++----------
fs/ocfs2/xattr.h | 14 ++
include/linux/jbd2.h | 31 +++
include/linux/journal-head.h | 8 +
34 files changed, 1910 insertions(+), 444 deletions(-)
create mode 100644 fs/ocfs2/blockcheck.c
create mode 100644 fs/ocfs2/blockcheck.h

Joel Becker (23):
jbd2: Add buffer triggers
ocfs2: Add the on-disk structures for metadata checksums.
ocfs2: Add the underlying blockcheck code.
ocfs2: Add a validation hook for quota block reads.
ocfs2: block read meta ecc.
ocfs2: Add journal_access functions with jbd2 triggers.
ocfs2: Wrap up the common use cases of ocfs2_new_path().
ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.
ocfs2: Add ecc and checksums to ocfs2 xattr buckets.
ocfs2: Create ocfs2_xattr_value_buf.
ocfs2: Pull ocfs2_xattr_value_buf up from __ocfs2_remove_xattr_range().
ocfs2: Pull ocfs2_xattr_value_buf up into ocfs2_xattr_value_truncate().
ocfs2: Pass ocfs2_xattr_value_buf into ocfs2_xattr_value_truncate().
ocfs2: Pass value buf to ocfs2_xattr_update_entry().
ocfs2: Use ocfs2_xattr_value_buf in ocfs2_xattr_set_entry().
ocfs2: Pass value buf to ocfs2_remove_value_outside().
ocfs2: Use proper journal_access function in xattr.c
ocfs2: Checksum and ECC for directory blocks.
ocfs2: Validate superblock with checksum and ecc.
ocfs2: Enable metadata checksums.
ocfs2: Don't hand-code xor in ocfs2_hamming_encode().
ocfs2: Another hamming code optimization.
ocfs2: One more hamming code optimization.

Mark Fasheh (1):
ocfs2: Add directory block trailers.

Sunil Mushran (5):
ocfs2/dlm: Fix a race between migrate request and exit domain
ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler()
ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating
ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list
ocfs2/dlm: Fix race during lockres mastery

Tao Ma (3):
ocfs2/xattr: Remove extend_trans call and add its credits from the beginning
ocfs2/xattr: Always updating ctime during xattr set.
ocfs2/xattr: fix credits calculation during index create

Tiger Yang (3):
ocfs2: calculate and reserve credits for xattr value in mknod
ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle
ocfs2: Add xattr support checking in init_security


2008-12-25 18:05:37

by Mark Fasheh

Subject: [PATCH 01/35] jbd2: Add buffer triggers

From: Joel Becker <[email protected]>

Filesystems often need to do a compute-intensive operation on some
metadata. If this operation is repeated many times, it can be very
expensive. It would be much nicer if the operation could be performed
once before a buffer goes to disk.

This adds triggers to jbd2 buffer heads. Just before writing a metadata
buffer to the journal, jbd2 will optionally call a commit trigger associated
with the buffer. If the journal is aborted, an abort trigger will be
called on any dirty buffers as they are dropped from pending
transactions.

ocfs2 will use this feature.

Initially I tried to come up with a more generic trigger that could be
used for non-buffer-related events like transaction completion. It
doesn't tie nicely, because the information a buffer trigger needs
(specific to a journal_head) isn't the same as what a transaction
trigger needs (specific to a transaction_t or perhaps journal_t). So I
implemented a buffer trigger set, with the understanding that
journal/transaction wide triggers should be implemented separately.

There is only one trigger set allowed per buffer. I can't think of any
reason to attach more than one set. Contrast this with a journal or
transaction in which multiple places may want to watch the entire
transaction separately.

The trigger sets are considered statically allocated from the jbd2
perspective. ocfs2 will just have one trigger set per block type,
setting the same set on every bh of the same type.
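
As a rough userspace model of the mechanism described above, the sketch below shows one statically allocated trigger set for a block type and a commit trigger that stamps a check value into the buffer just before it would be written. Everything here is an assumption for illustration: struct buffer_head is a two-field stand-in, struct trigger_type only copies the callback shape of struct jbd2_buffer_trigger_type, and struct fs_triggers, toy_sum(), fs_commit(), inode_triggers and fire_commit() are hypothetical names. The toy checksum stands in for the real crc32/ecc work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel type; only the fields used here. */
struct buffer_head {
	char *b_data;
	size_t b_size;
};

/* Same callback shape as the patch's struct jbd2_buffer_trigger_type. */
struct trigger_type {
	void (*t_commit)(struct trigger_type *type, struct buffer_head *bh,
			 void *mapped_data, size_t size);
	void (*t_abort)(struct trigger_type *type, struct buffer_head *bh);
};

/* Hypothetical per-block-type wrapper: records where the check lives. */
struct fs_triggers {
	struct trigger_type base;	/* must stay the first member */
	size_t check_offset;
};

/* Toy checksum standing in for the expensive per-block computation. */
static uint32_t toy_sum(const unsigned char *p, size_t n)
{
	uint32_t s = 0;

	while (n--)
		s = s * 31u + *p++;
	return s;
}

/* Commit trigger: zero the check field, then compute and store it. */
static void fs_commit(struct trigger_type *type, struct buffer_head *bh,
		      void *mapped_data, size_t size)
{
	struct fs_triggers *ft = (struct fs_triggers *)type;
	unsigned char *data = mapped_data;
	uint32_t sum;

	(void)bh;
	memset(data + ft->check_offset, 0, sizeof(sum));
	sum = toy_sum(data, size);
	memcpy(data + ft->check_offset, &sum, sizeof(sum));
}

/* One static set per block type, shared by every buffer of that type. */
static struct fs_triggers inode_triggers = {
	.base = { .t_commit = fs_commit },
	.check_offset = 0x80,
};

/* Mirrors jbd2_buffer_commit_trigger(): tolerate a missing set or hook. */
static void fire_commit(struct trigger_type *t, struct buffer_head *bh,
			void *mapped)
{
	if (!t || !t->t_commit)
		return;
	t->t_commit(t, bh, mapped, bh->b_size);
}
```

In this model, calling fire_commit(&inode_triggers.base, &bh, bh.b_data) once at commit time replaces recomputing the check on every modification. Kernel code would more likely recover the wrapper with container_of() than with the cast used here.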

Signed-off-by: Joel Becker <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/jbd2/commit.c | 9 ++++++++
fs/jbd2/journal.c | 19 +++++++++++++++++
fs/jbd2/transaction.c | 47 ++++++++++++++++++++++++++++++++++++++++++
include/linux/jbd2.h | 31 +++++++++++++++++++++++++++
include/linux/journal-head.h | 8 +++++++
5 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index ebc667b..c8a1bac 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -509,6 +509,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (is_journal_aborted(journal)) {
clear_buffer_jbddirty(jh2bh(jh));
JBUFFER_TRACE(jh, "journal is aborting: refile");
+ jbd2_buffer_abort_trigger(jh,
+ jh->b_frozen_data ?
+ jh->b_frozen_triggers :
+ jh->b_triggers);
jbd2_journal_refile_buffer(journal, jh);
/* If that was the last one, we need to clean up
* any descriptor buffers which may have been
@@ -844,6 +848,9 @@ restart_loop:
* data.
*
* Otherwise, we can just throw away the frozen data now.
+ *
+ * We also know that the frozen data has already fired
+ * its triggers if they exist, so we can clear that too.
*/
if (jh->b_committed_data) {
jbd2_free(jh->b_committed_data, bh->b_size);
@@ -851,10 +858,12 @@ restart_loop:
if (jh->b_frozen_data) {
jh->b_committed_data = jh->b_frozen_data;
jh->b_frozen_data = NULL;
+ jh->b_frozen_triggers = NULL;
}
} else if (jh->b_frozen_data) {
jbd2_free(jh->b_frozen_data, bh->b_size);
jh->b_frozen_data = NULL;
+ jh->b_frozen_triggers = NULL;
}

spin_lock(&journal->j_list_lock);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e70d657..f6bff9d 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -50,6 +50,7 @@ EXPORT_SYMBOL(jbd2_journal_unlock_updates);
EXPORT_SYMBOL(jbd2_journal_get_write_access);
EXPORT_SYMBOL(jbd2_journal_get_create_access);
EXPORT_SYMBOL(jbd2_journal_get_undo_access);
+EXPORT_SYMBOL(jbd2_journal_set_triggers);
EXPORT_SYMBOL(jbd2_journal_dirty_metadata);
EXPORT_SYMBOL(jbd2_journal_release_buffer);
EXPORT_SYMBOL(jbd2_journal_forget);
@@ -290,6 +291,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
struct page *new_page;
unsigned int new_offset;
struct buffer_head *bh_in = jh2bh(jh_in);
+ struct jbd2_buffer_trigger_type *triggers;

/*
* The buffer really shouldn't be locked: only the current committing
@@ -314,13 +316,23 @@ repeat:
done_copy_out = 1;
new_page = virt_to_page(jh_in->b_frozen_data);
new_offset = offset_in_page(jh_in->b_frozen_data);
+ triggers = jh_in->b_frozen_triggers;
} else {
new_page = jh2bh(jh_in)->b_page;
new_offset = offset_in_page(jh2bh(jh_in)->b_data);
+ triggers = jh_in->b_triggers;
}

mapped_data = kmap_atomic(new_page, KM_USER0);
/*
+ * Fire any commit trigger. Do this before checking for escaping,
+ * as the trigger may modify the magic offset. If a copy-out
+ * happens afterwards, it will have the correct data in the buffer.
+ */
+ jbd2_buffer_commit_trigger(jh_in, mapped_data + new_offset,
+ triggers);
+
+ /*
* Check for escaping
*/
if (*((__be32 *)(mapped_data + new_offset)) ==
@@ -352,6 +364,13 @@ repeat:
new_page = virt_to_page(tmp);
new_offset = offset_in_page(tmp);
done_copy_out = 1;
+
+ /*
+ * This isn't strictly necessary, as we're using frozen
+ * data for the escaping, but it keeps consistency with
+ * b_frozen_data usage.
+ */
+ jh_in->b_frozen_triggers = jh_in->b_triggers;
}

/*
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 39b7805..4f925a4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -741,6 +741,12 @@ done:
source = kmap_atomic(page, KM_USER0);
memcpy(jh->b_frozen_data, source+offset, jh2bh(jh)->b_size);
kunmap_atomic(source, KM_USER0);
+
+ /*
+ * Now that the frozen data is saved off, we need to store
+ * any matching triggers.
+ */
+ jh->b_frozen_triggers = jh->b_triggers;
}
jbd_unlock_bh_state(bh);

@@ -944,6 +950,47 @@ out:
}

/**
+ * void jbd2_journal_set_triggers() - Add triggers for commit writeout
+ * @bh: buffer to trigger on
+ * @type: struct jbd2_buffer_trigger_type containing the trigger(s).
+ *
+ * Set any triggers on this journal_head. This is always safe, because
+ * triggers for a committing buffer will be saved off, and triggers for
+ * a running transaction will match the buffer in that transaction.
+ *
+ * Call with NULL to clear the triggers.
+ */
+void jbd2_journal_set_triggers(struct buffer_head *bh,
+ struct jbd2_buffer_trigger_type *type)
+{
+ struct journal_head *jh = bh2jh(bh);
+
+ jh->b_triggers = type;
+}
+
+void jbd2_buffer_commit_trigger(struct journal_head *jh, void *mapped_data,
+ struct jbd2_buffer_trigger_type *triggers)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!triggers || !triggers->t_commit)
+ return;
+
+ triggers->t_commit(triggers, bh, mapped_data, bh->b_size);
+}
+
+void jbd2_buffer_abort_trigger(struct journal_head *jh,
+ struct jbd2_buffer_trigger_type *triggers)
+{
+ if (!triggers || !triggers->t_abort)
+ return;
+
+ triggers->t_abort(triggers, jh2bh(jh));
+}
+
+
+
+/**
* int jbd2_journal_dirty_metadata() - mark a buffer as containing dirty metadata
* @handle: transaction to add buffer to.
* @bh: buffer to mark
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f366457..3445647 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1008,6 +1008,35 @@ int __jbd2_journal_clean_checkpoint_list(journal_t *journal);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);

+
+/*
+ * Triggers
+ */
+
+struct jbd2_buffer_trigger_type {
+ /*
+ * Fired just before a buffer is written to the journal.
+ * mapped_data is a mapped buffer that is the frozen data for
+ * commit.
+ */
+ void (*t_commit)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh, void *mapped_data,
+ size_t size);
+
+ /*
+ * Fired during journal abort for dirty buffers that will not be
+ * committed.
+ */
+ void (*t_abort)(struct jbd2_buffer_trigger_type *type,
+ struct buffer_head *bh);
+};
+
+extern void jbd2_buffer_commit_trigger(struct journal_head *jh,
+ void *mapped_data,
+ struct jbd2_buffer_trigger_type *triggers);
+extern void jbd2_buffer_abort_trigger(struct journal_head *jh,
+ struct jbd2_buffer_trigger_type *triggers);
+
/* Buffer IO */
extern int
jbd2_journal_write_metadata_buffer(transaction_t *transaction,
@@ -1046,6 +1075,8 @@ extern int jbd2_journal_extend (handle_t *, int nblocks);
extern int jbd2_journal_get_write_access(handle_t *, struct buffer_head *);
extern int jbd2_journal_get_create_access (handle_t *, struct buffer_head *);
extern int jbd2_journal_get_undo_access(handle_t *, struct buffer_head *);
+void jbd2_journal_set_triggers(struct buffer_head *,
+ struct jbd2_buffer_trigger_type *type);
extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index bb70ebb..525aac3 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -12,6 +12,8 @@

typedef unsigned int tid_t; /* Unique transaction ID */
typedef struct transaction_s transaction_t; /* Compound transaction type */
+
+
struct buffer_head;

struct journal_head {
@@ -87,6 +89,12 @@ struct journal_head {
* [j_list_lock]
*/
struct journal_head *b_cpnext, *b_cpprev;
+
+ /* Trigger type */
+ struct jbd2_buffer_trigger_type *b_triggers;
+
+ /* Trigger type for the committing transaction's frozen data */
+ struct jbd2_buffer_trigger_type *b_frozen_triggers;
};

#endif /* JOURNAL_HEAD_H_INCLUDED */
--
1.5.6

2008-12-25 18:05:57

by Mark Fasheh

Subject: [PATCH 02/35] ocfs2: Add the on-disk structures for metadata checksums.

From: Joel Becker <[email protected]>

Define struct ocfs2_block_check, an 8-byte structure containing a 32-bit
crc32_le and a 16-bit hamming code ecc. This will be used for metadata
checksums. Add the structure to free spaces in the various metadata
structures.

Add the OCFS2_FEATURE_INCOMPAT_META_ECC bit.
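
To make the layout concrete, here is a small userspace mirror of the structure and the feature bit. It is a sketch, not the kernel definition: struct block_check_sketch, META_ECC_SKETCH and has_meta_ecc() are names invented for this example, and plain fixed-width integers stand in for the __le32/__le16 on-disk types, so byte order is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the 8-byte check structure. */
struct block_check_sketch {
	uint32_t bc_crc32e;	/* CRC32 of the block */
	uint16_t bc_ecc;	/* Hamming parity vector */
	uint16_t bc_reserved1;	/* pads the structure to 8 bytes */
};

/* The structure replaces a single 64-bit reserved field, so its size
 * and field offsets must not drift. */
_Static_assert(sizeof(struct block_check_sketch) == 8,
	       "check structure must be exactly 8 bytes");
_Static_assert(offsetof(struct block_check_sketch, bc_ecc) == 4,
	       "ecc must follow the 32-bit crc");

/* Same value as OCFS2_FEATURE_INCOMPAT_META_ECC in the patch. */
#define META_ECC_SKETCH 0x0800

/* Nonzero if the incompat feature mask advertises metadata checksums. */
static int has_meta_ecc(uint32_t incompat)
{
	return (incompat & META_ECC_SKETCH) != 0;
}
```

Because the structure is exactly 8 bytes, it can take the place of one __le64 reserved field without moving any neighbouring field, which is what the hunks in this patch do.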

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/ocfs2_fs.h | 55 ++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 359732e..290fa26 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -149,6 +149,9 @@
/* Support for extended attributes */
#define OCFS2_FEATURE_INCOMPAT_XATTR 0x0200

+/* Metadata checksum and error correction */
+#define OCFS2_FEATURE_INCOMPAT_META_ECC 0x0800
+
/*
* backup superblock flag is used to indicate that this volume
* has backup superblocks.
@@ -427,6 +430,22 @@ static unsigned char ocfs2_type_by_mode[S_IFMT >> S_SHIFT] = {
#define OCFS2_RAW_SB(dinode) (&((dinode)->id2.i_super))

/*
+ * Block checking structure. This is used in metadata to validate the
+ * contents. If OCFS2_FEATURE_INCOMPAT_META_ECC is not set, it is all
+ * zeros.
+ */
+struct ocfs2_block_check {
+/*00*/ __le32 bc_crc32e; /* 802.3 Ethernet II CRC32 */
+ __le16 bc_ecc; /* Single-error-correction parity vector.
+ This is a simple Hamming code dependent
+ on the blocksize. OCFS2's maximum
+ blocksize, 4K, requires 16 parity bits,
+ so we fit in __le16. */
+ __le16 bc_reserved1;
+/*08*/
+};
+
+/*
* On disk extent record for OCFS2
* It describes a range of clusters on disk.
*
@@ -513,7 +532,7 @@ struct ocfs2_truncate_log {
struct ocfs2_extent_block
{
/*00*/ __u8 h_signature[8]; /* Signature for verification */
- __le64 h_reserved1;
+ struct ocfs2_block_check h_check; /* Error checking */
/*10*/ __le16 h_suballoc_slot; /* Slot suballocator this
extent_header belongs to */
__le16 h_suballoc_bit; /* Bit offset in suballocator
@@ -683,7 +702,8 @@ struct ocfs2_dinode {
was set in i_flags */
__le16 i_dyn_features;
__le64 i_xattr_loc;
-/*80*/ __le64 i_reserved2[7];
+/*80*/ struct ocfs2_block_check i_check; /* Error checking */
+/*88*/ __le64 i_reserved2[6];
/*B8*/ union {
__le64 i_pad1; /* Generic way to refer to this
64bit union */
@@ -750,7 +770,8 @@ struct ocfs2_group_desc
/*20*/ __le64 bg_parent_dinode; /* dinode which owns me, in
blocks */
__le64 bg_blkno; /* Offset on disk, in blocks */
-/*30*/ __le64 bg_reserved2[2];
+/*30*/ struct ocfs2_block_check bg_check; /* Error checking */
+ __le64 bg_reserved2;
/*40*/ __u8 bg_bitmap[0];
};

@@ -793,7 +814,12 @@ struct ocfs2_xattr_header {
in this extent record,
only valid in the first
bucket. */
- __le64 xh_csum;
+ struct ocfs2_block_check xh_check; /* Error checking
+ (Note, this is only
+ used for xattr
+ buckets. A block uses
+ xb_check and sets
+ this field to zero.) */
struct ocfs2_xattr_entry xh_entries[0]; /* xattr entry list. */
};

@@ -844,7 +870,7 @@ struct ocfs2_xattr_block {
block group */
__le32 xb_fs_generation; /* Must match super block */
/*10*/ __le64 xb_blkno; /* Offset on disk, in blocks */
- __le64 xb_csum;
+ struct ocfs2_block_check xb_check; /* Error checking */
/*20*/ __le16 xb_flags; /* Indicates whether this block contains
real xattr or a xattr tree. */
__le16 xb_reserved0;
@@ -988,6 +1014,25 @@ struct ocfs2_local_disk_dqblk {
/*10*/ __le64 dqb_inodemod; /* Change in the amount of used inodes */
};

+
+/*
+ * The quota trailer lives at the end of each quota block.
+ */
+
+struct ocfs2_disk_dqtrailer {
+/*00*/ struct ocfs2_block_check dq_check; /* Error checking */
+/*08*/ /* Cannot be larger than OCFS2_QBLK_RESERVED_SPACE */
+};
+
+static inline struct ocfs2_disk_dqtrailer *ocfs2_block_dqtrailer(int blocksize,
+ void *buf)
+{
+ char *ptr = buf;
+ ptr += blocksize - OCFS2_QBLK_RESERVED_SPACE;
+
+ return (struct ocfs2_disk_dqtrailer *)ptr;
+}
+
#ifdef __KERNEL__
static inline int ocfs2_fast_symlink_chars(struct super_block *sb)
{
--
1.5.6

2008-12-25 18:06:24

by Mark Fasheh

Subject: [PATCH 03/35] ocfs2: Add the underlying blockcheck code.

From: Joel Becker <[email protected]>

This is the code that computes crc32 and ecc for ocfs2 metadata blocks.
There are high-level functions that check whether the filesystem has the
ecc feature, mid-level functions that work on a single block or array of
buffer_heads, and the low-level ecc hamming code that can handle
multiple buffers like crc32_le().
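
The low-level idea can be sketched in userspace as follows: the parity word is the XOR of the Hamming code positions of every set data bit, so it can be accumulated hunk by hunk as long as each hunk is given its bit offset. This is an illustrative sketch under assumptions, not the patch's code. The names parity_bits(), code_bit() and hamming_encode() are local to the example, bits are numbered little-endian within each byte, and the per-parity-bit inner loop is folded into a single XOR, which should be equivalent whenever every code position fits within the parity width.

```c
#include <stdint.h>

/* Smallest p with d + p + 1 <= 2^p: parity bits needed for d data bits. */
static unsigned int parity_bits(unsigned int d)
{
	unsigned int p;

	for (p = 1; p < 32; p++)
		if (d + p + 1 <= (1u << p))
			return p;
	return 0;
}

/* Map a 0-based data bit to its 1-based code position. Power-of-two
 * positions are reserved for parity, so they are skipped. */
static unsigned int code_bit(unsigned int i)
{
	unsigned int b = i + 1, p;

	for (p = 0; (1u << p) < (b + 1); p++)
		b++;
	return b;
}

/* Fold one hunk of d data bits into the running parity. nr is the bit
 * offset of this hunk within the whole block. */
static uint32_t hamming_encode(uint32_t parity, const uint8_t *data,
			       unsigned int d, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < d; i++)
		if ((data[i / 8] >> (i % 8)) & 1)
			parity ^= code_bit(nr + i);
	return parity;
}
```

Encoding two 512-byte halves with offsets 0 and 512 * 8 gives the same parity as encoding the 1024 bytes in one call. If a single data bit later flips, XORing the old and new parity yields that bit's code position, which is the value a fix routine would search for.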

It's not hooked up to the filesystem yet.

Signed-off-by: Joel Becker <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/Makefile | 1 +
fs/ocfs2/blockcheck.c | 480 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/blockcheck.h | 82 +++++++++
fs/ocfs2/ocfs2.h | 8 +
4 files changed, 571 insertions(+), 0 deletions(-)
create mode 100644 fs/ocfs2/blockcheck.c
create mode 100644 fs/ocfs2/blockcheck.h

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index 7e4b361..0159607 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_OCFS2_FS_USERSPACE_CLUSTER) += ocfs2_stack_user.o
ocfs2-objs := \
alloc.o \
aops.o \
+ blockcheck.o \
buffer_head_io.o \
dcache.o \
dir.o \
diff --git a/fs/ocfs2/blockcheck.c b/fs/ocfs2/blockcheck.c
new file mode 100644
index 0000000..2bf3d7f
--- /dev/null
+++ b/fs/ocfs2/blockcheck.c
@@ -0,0 +1,480 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * blockcheck.c
+ *
+ * Checksum and ECC codes for the OCFS2 userspace library.
+ *
+ * Copyright (C) 2006, 2008 Oracle. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/crc32.h>
+#include <linux/buffer_head.h>
+#include <linux/bitops.h>
+#include <asm/byteorder.h>
+
+#include "ocfs2.h"
+
+#include "blockcheck.h"
+
+
+
+/*
+ * We use the following conventions:
+ *
+ * d = # data bits
+ * p = # parity bits
+ * c = # total code bits (d + p)
+ */
+static int calc_parity_bits(unsigned int d)
+{
+ unsigned int p;
+
+ /*
+ * Bits required for Single Error Correction is as follows:
+ *
+ * d + p + 1 <= 2^p
+ *
+ * We're restricting ourselves to 31 bits of parity, that should be
+ * sufficient.
+ */
+ for (p = 1; p < 32; p++)
+ {
+ if ((d + p + 1) <= (1 << p))
+ return p;
+ }
+
+ return 0;
+}
+
+/*
+ * Calculate the bit offset in the hamming code buffer based on the bit's
+ * offset in the data buffer. Since the hamming code reserves all
+ * power-of-two bits for parity, the data bit number and the code bit
+ * number are offset by all the parity bits beforehand.
+ *
+ * Recall that bit numbers in hamming code are 1-based. This function
+ * takes the 0-based data bit from the caller.
+ *
+ * An example. Take bit 1 of the data buffer. 1 is a power of two (2^0),
+ * so it's a parity bit. 2 is a power of two (2^1), so it's a parity bit.
+ * 3 is not a power of two. So bit 1 of the data buffer ends up as bit 3
+ * in the code buffer.
+ */
+static unsigned int calc_code_bit(unsigned int i)
+{
+ unsigned int b, p;
+
+ /*
+ * Data bits are 0-based, but we're talking code bits, which
+ * are 1-based.
+ */
+ b = i + 1;
+
+ /*
+ * For every power of two below our bit number, bump our bit.
+ *
+ * We compare with (b + 1) because we have to compare with what b
+ * would be _if_ it were bumped up by the parity bit. Capice?
+ */
+ for (p = 0; (1 << p) < (b + 1); p++)
+ b++;
+
+ return b;
+}
+
+/*
+ * This is the low level encoder function. It can be called across
+ * multiple hunks just like the crc32 code. 'd' is the number of bits
+ * _in_this_hunk_. nr is the bit offset of this hunk. So, if you had
+ * two 512B buffers, you would do it like so:
+ *
+ * parity = ocfs2_hamming_encode(0, buf1, 512 * 8, 0);
+ * parity = ocfs2_hamming_encode(parity, buf2, 512 * 8, 512 * 8);
+ *
+ * If you just have one buffer, use ocfs2_hamming_encode_block().
+ */
+u32 ocfs2_hamming_encode(u32 parity, void *data, unsigned int d, unsigned int nr)
+{
+ unsigned int p = calc_parity_bits(nr + d);
+ unsigned int i, j, b;
+
+ BUG_ON(!p);
+
+ /*
+ * b is the hamming code bit number. Hamming code specifies a
+ * 1-based array, but C uses 0-based. So 'i' is for C, and 'b' is
+ * for the algorithm.
+ *
+ * The i++ in the for loop is so that the start offset passed
+ * to ocfs2_find_next_bit() is one greater than the previously
+ * found bit.
+ */
+ for (i = 0; (i = ocfs2_find_next_bit(data, d, i)) < d; i++)
+ {
+ /*
+ * i is the offset in this hunk, nr + i is the total bit
+ * offset.
+ */
+ b = calc_code_bit(nr + i);
+
+ for (j = 0; j < p; j++)
+ {
+ /*
+ * Data bits in the resultant code are checked by
+ * parity bits that are part of the bit number
+ * representation. Huh?
+ *
+ * <wikipedia href="http://en.wikipedia.org/wiki/Hamming_code">
+ * In other words, the parity bit at position 2^k
+ * checks bits in positions having bit k set in
+ * their binary representation. Conversely, for
+ * instance, bit 13, i.e. 1101(2), is checked by
+ * bits 1000(2) = 8, 0100(2)=4 and 0001(2) = 1.
+ * </wikipedia>
+ *
+ * Note that 'k' is the _code_ bit number. 'b' in
+ * our loop.
+ */
+ if (b & (1 << j))
+ parity ^= (1 << j);
+ }
+ }
+
+ /* While the data buffer was treated as little endian, the
+ * return value is in host endian. */
+ return parity;
+}
+
+u32 ocfs2_hamming_encode_block(void *data, unsigned int blocksize)
+{
+ return ocfs2_hamming_encode(0, data, blocksize * 8, 0);
+}
+
+/*
+ * Like ocfs2_hamming_encode(), this can handle hunks. nr is the bit
+ * offset of the current hunk. If the bit to be fixed is not part of the
+ * current hunk, this does nothing.
+ *
+ * If you only have one hunk, use ocfs2_hamming_fix_block().
+ */
+void ocfs2_hamming_fix(void *data, unsigned int d, unsigned int nr,
+ unsigned int fix)
+{
+ unsigned int p = calc_parity_bits(nr + d);
+ unsigned int i, b;
+
+ BUG_ON(!p);
+
+ /*
+ * If the bit to fix has an hweight of 1, it's a parity bit. One
+ * busted parity bit is its own error. Nothing to do here.
+ */
+ if (hweight32(fix) == 1)
+ return;
+
+ /*
+ * nr + d is the bit right past the data hunk we're looking at.
+ * If fix is after that, nothing to do
+ */
+ if (fix >= calc_code_bit(nr + d))
+ return;
+
+ /*
+ * nr is the offset in the data hunk we're starting at. Let's
+ * start b at the offset in the code buffer. See hamming_encode()
+ * for a more detailed description of 'b'.
+ */
+ b = calc_code_bit(nr);
+ /* If the fix is before this hunk, nothing to do */
+ if (fix < b)
+ return;
+
+ for (i = 0; i < d; i++, b++)
+ {
+ /* Skip past parity bits */
+ while (hweight32(b) == 1)
+ b++;
+
+ /*
+ * i is the offset in this data hunk.
+ * nr + i is the offset in the total data buffer.
+ * b is the offset in the total code buffer.
+ *
+ * Thus, when b == fix, bit i in the current hunk needs
+ * fixing.
+ */
+ if (b == fix)
+ {
+ if (ocfs2_test_bit(i, data))
+ ocfs2_clear_bit(i, data);
+ else
+ ocfs2_set_bit(i, data);
+ break;
+ }
+ }
+}
+
+void ocfs2_hamming_fix_block(void *data, unsigned int blocksize,
+ unsigned int fix)
+{
+ ocfs2_hamming_fix(data, blocksize * 8, 0, fix);
+}
+
+/*
+ * This function generates check information for a block.
+ * data is the block to be checked. bc is a pointer to the
+ * ocfs2_block_check structure describing the crc32 and the ecc.
+ *
+ * bc should be a pointer inside data, as the function will
+ * take care of zeroing it before calculating the check information. If
+ * bc does not point inside data, the caller must make sure any inline
+ * ocfs2_block_check structures are zeroed.
+ *
+ * The data buffer must be in on-disk endian (little endian for ocfs2).
+ * bc will be filled with little-endian values and will be ready to go to
+ * disk.
+ */
+void ocfs2_block_check_compute(void *data, size_t blocksize,
+ struct ocfs2_block_check *bc)
+{
+ u32 crc;
+ u32 ecc;
+
+ memset(bc, 0, sizeof(struct ocfs2_block_check));
+
+ crc = crc32_le(~0, data, blocksize);
+ ecc = ocfs2_hamming_encode_block(data, blocksize);
+
+ /*
+ * No ecc'd ocfs2 structure is larger than 4K, so ecc will be no
+ * larger than 16 bits.
+ */
+ BUG_ON(ecc > USHORT_MAX);
+
+ bc->bc_crc32e = cpu_to_le32(crc);
+ bc->bc_ecc = cpu_to_le16((u16)ecc);
+}
+
+/*
+ * This function validates existing check information. Like _compute,
+ * the function will take care of zeroing bc before calculating check codes.
+ * If bc is not a pointer inside data, the caller must have zeroed any
+ * inline ocfs2_block_check structures.
+ *
+ * Again, the data passed in should be the on-disk endian.
+ */
+int ocfs2_block_check_validate(void *data, size_t blocksize,
+ struct ocfs2_block_check *bc)
+{
+ int rc = 0;
+ struct ocfs2_block_check check;
+ u32 crc, ecc;
+
+ check.bc_crc32e = le32_to_cpu(bc->bc_crc32e);
+ check.bc_ecc = le16_to_cpu(bc->bc_ecc);
+
+ memset(bc, 0, sizeof(struct ocfs2_block_check));
+
+ /* Fast path - if the crc32 validates, we're good to go */
+ crc = crc32_le(~0, data, blocksize);
+ if (crc == check.bc_crc32e)
+ goto out;
+
+ /* Ok, try ECC fixups */
+ ecc = ocfs2_hamming_encode_block(data, blocksize);
+ ocfs2_hamming_fix_block(data, blocksize, ecc ^ check.bc_ecc);
+
+ /* And check the crc32 again */
+ crc = crc32_le(~0, data, blocksize);
+ if (crc == check.bc_crc32e)
+ goto out;
+
+ rc = -EIO;
+
+out:
+ bc->bc_crc32e = cpu_to_le32(check.bc_crc32e);
+ bc->bc_ecc = cpu_to_le16(check.bc_ecc);
+
+ return rc;
+}
+
+/*
+ * This function generates check information for a list of buffer_heads.
+ * bhs is the blocks to be checked. bc is a pointer to the
+ * ocfs2_block_check structure describing the crc32 and the ecc.
+ *
+ * bc should be a pointer inside data, as the function will
+ * take care of zeroing it before calculating the check information. If
+ * bc does not point inside data, the caller must make sure any inline
+ * ocfs2_block_check structures are zeroed.
+ *
+ * The data buffer must be in on-disk endian (little endian for ocfs2).
+ * bc will be filled with little-endian values and will be ready to go to
+ * disk.
+ */
+void ocfs2_block_check_compute_bhs(struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc)
+{
+ int i;
+ u32 crc, ecc;
+
+ BUG_ON(nr < 0);
+
+ if (!nr)
+ return;
+
+ memset(bc, 0, sizeof(struct ocfs2_block_check));
+
+ for (i = 0, crc = ~0, ecc = 0; i < nr; i++) {
+ crc = crc32_le(crc, bhs[i]->b_data, bhs[i]->b_size);
+ /*
+ * The number of bits in a buffer is obviously b_size*8.
+ * The offset of this buffer is b_size*i, so the bit offset
+ * of this buffer is b_size*8*i.
+ */
+ ecc = (u16)ocfs2_hamming_encode(ecc, bhs[i]->b_data,
+ bhs[i]->b_size * 8,
+ bhs[i]->b_size * 8 * i);
+ }
+
+ /*
+ * No ecc'd ocfs2 structure is larger than 4K, so ecc will be no
+ * larger than 16 bits.
+ */
+ BUG_ON(ecc > USHORT_MAX);
+
+ bc->bc_crc32e = cpu_to_le32(crc);
+ bc->bc_ecc = cpu_to_le16((u16)ecc);
+}
+
+/*
+ * This function validates existing check information on a list of
+ * buffer_heads. Like _compute_bhs, the function will take care of
+ * zeroing bc before calculating check codes. If bc is not a pointer
+ * inside data, the caller must have zeroed any inline
+ * ocfs2_block_check structures.
+ *
+ * Again, the data passed in should be the on-disk endian.
+ */
+int ocfs2_block_check_validate_bhs(struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc)
+{
+ int i, rc = 0;
+ struct ocfs2_block_check check;
+ u32 crc, ecc, fix;
+
+ BUG_ON(nr < 0);
+
+ if (!nr)
+ return 0;
+
+ check.bc_crc32e = le32_to_cpu(bc->bc_crc32e);
+ check.bc_ecc = le16_to_cpu(bc->bc_ecc);
+
+ memset(bc, 0, sizeof(struct ocfs2_block_check));
+
+ /* Fast path - if the crc32 validates, we're good to go */
+ for (i = 0, crc = ~0; i < nr; i++)
+ crc = crc32_le(crc, bhs[i]->b_data, bhs[i]->b_size);
+ if (crc == check.bc_crc32e)
+ goto out;
+
+ mlog(ML_ERROR,
+ "CRC32 failed: stored: %u, computed %u. Applying ECC.\n",
+ (unsigned int)check.bc_crc32e, (unsigned int)crc);
+
+ /* Ok, try ECC fixups */
+ for (i = 0, ecc = 0; i < nr; i++) {
+ /*
+ * The number of bits in a buffer is obviously b_size*8.
+ * The offset of this buffer is b_size*i, so the bit offset
+ * of this buffer is b_size*8*i.
+ */
+ ecc = (u16)ocfs2_hamming_encode(ecc, bhs[i]->b_data,
+ bhs[i]->b_size * 8,
+ bhs[i]->b_size * 8 * i);
+ }
+ fix = ecc ^ check.bc_ecc;
+ for (i = 0; i < nr; i++) {
+ /*
+ * Try the fix against each buffer. It will only affect
+ * one of them.
+ */
+ ocfs2_hamming_fix(bhs[i]->b_data, bhs[i]->b_size * 8,
+ bhs[i]->b_size * 8 * i, fix);
+ }
+
+ /* And check the crc32 again */
+ for (i = 0, crc = ~0; i < nr; i++)
+ crc = crc32_le(crc, bhs[i]->b_data, bhs[i]->b_size);
+ if (crc == check.bc_crc32e)
+ goto out;
+
+ mlog(ML_ERROR, "Fixed CRC32 failed: stored: %u, computed %u\n",
+ (unsigned int)check.bc_crc32e, (unsigned int)crc);
+
+ rc = -EIO;
+
+out:
+ bc->bc_crc32e = cpu_to_le32(check.bc_crc32e);
+ bc->bc_ecc = cpu_to_le16(check.bc_ecc);
+
+ return rc;
+}
+
+/*
+ * These are the main API. They check the superblock flag before
+ * calling the underlying operations.
+ *
+ * They expect the buffer(s) to be in disk format.
+ */
+void ocfs2_compute_meta_ecc(struct super_block *sb, void *data,
+ struct ocfs2_block_check *bc)
+{
+ if (ocfs2_meta_ecc(OCFS2_SB(sb)))
+ ocfs2_block_check_compute(data, sb->s_blocksize, bc);
+}
+
+int ocfs2_validate_meta_ecc(struct super_block *sb, void *data,
+ struct ocfs2_block_check *bc)
+{
+ int rc = 0;
+
+ if (ocfs2_meta_ecc(OCFS2_SB(sb)))
+ rc = ocfs2_block_check_validate(data, sb->s_blocksize, bc);
+
+ return rc;
+}
+
+void ocfs2_compute_meta_ecc_bhs(struct super_block *sb,
+ struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc)
+{
+ if (ocfs2_meta_ecc(OCFS2_SB(sb)))
+ ocfs2_block_check_compute_bhs(bhs, nr, bc);
+}
+
+int ocfs2_validate_meta_ecc_bhs(struct super_block *sb,
+ struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc)
+{
+ int rc = 0;
+
+ if (ocfs2_meta_ecc(OCFS2_SB(sb)))
+ rc = ocfs2_block_check_validate_bhs(bhs, nr, bc);
+
+ return rc;
+}
+
diff --git a/fs/ocfs2/blockcheck.h b/fs/ocfs2/blockcheck.h
new file mode 100644
index 0000000..70ec3fe
--- /dev/null
+++ b/fs/ocfs2/blockcheck.h
@@ -0,0 +1,82 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * blockcheck.h
+ *
+ * Checksum and ECC codes for the OCFS2 userspace library.
+ *
+ * Copyright (C) 2004, 2008 Oracle. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef OCFS2_BLOCKCHECK_H
+#define OCFS2_BLOCKCHECK_H
+
+
+/* High level block API */
+void ocfs2_compute_meta_ecc(struct super_block *sb, void *data,
+ struct ocfs2_block_check *bc);
+int ocfs2_validate_meta_ecc(struct super_block *sb, void *data,
+ struct ocfs2_block_check *bc);
+void ocfs2_compute_meta_ecc_bhs(struct super_block *sb,
+ struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc);
+int ocfs2_validate_meta_ecc_bhs(struct super_block *sb,
+ struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc);
+
+/* Lower level API */
+void ocfs2_block_check_compute(void *data, size_t blocksize,
+ struct ocfs2_block_check *bc);
+int ocfs2_block_check_validate(void *data, size_t blocksize,
+ struct ocfs2_block_check *bc);
+void ocfs2_block_check_compute_bhs(struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc);
+int ocfs2_block_check_validate_bhs(struct buffer_head **bhs, int nr,
+ struct ocfs2_block_check *bc);
+
+/*
+ * Hamming code functions
+ */
+
+/*
+ * Encoding hamming code parity bits for a buffer.
+ *
+ * This is the low level encoder function. It can be called across
+ * multiple hunks just like the crc32 code. 'd' is the number of bits
+ * _in_this_hunk_. nr is the bit offset of this hunk. So, if you had
+ * two 512B buffers, you would do it like so:
+ *
+ * parity = ocfs2_hamming_encode(0, buf1, 512 * 8, 0);
+ * parity = ocfs2_hamming_encode(parity, buf2, 512 * 8, 512 * 8);
+ *
+ * If you just have one buffer, use ocfs2_hamming_encode_block().
+ */
+u32 ocfs2_hamming_encode(u32 parity, void *data, unsigned int d,
+ unsigned int nr);
+/*
+ * Fix a buffer with a bit error. The 'fix' is the original parity
+ * xor'd with the parity calculated now.
+ *
+ * Like ocfs2_hamming_encode(), this can handle hunks. nr is the bit
+ * offset of the current hunk. If bit to be fixed is not part of the
+ * current hunk, this does nothing.
+ *
+ * If you only have one buffer, use ocfs2_hamming_fix_block().
+ */
+void ocfs2_hamming_fix(void *data, unsigned int d, unsigned int nr,
+ unsigned int fix);
+
+/* Convenience wrappers for a single buffer of data */
+extern u32 ocfs2_hamming_encode_block(void *data, unsigned int blocksize);
+extern void ocfs2_hamming_fix_block(void *data, unsigned int blocksize,
+ unsigned int fix);
+#endif
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 5c77798..2bb389f 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -382,6 +382,13 @@ static inline int ocfs2_supports_xattr(struct ocfs2_super *osb)
return 0;
}

+static inline int ocfs2_meta_ecc(struct ocfs2_super *osb)
+{
+ if (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_META_ECC)
+ return 1;
+ return 0;
+}
+
/* set / clear functions because cluster events can make these happen
* in parallel so we want the transitions to be atomic. this also
* means that any future flags osb_flags must be protected by spinlock
@@ -615,5 +622,6 @@ static inline s16 ocfs2_get_inode_steal_slot(struct ocfs2_super *osb)
#define ocfs2_clear_bit ext2_clear_bit
#define ocfs2_test_bit ext2_test_bit
#define ocfs2_find_next_zero_bit ext2_find_next_zero_bit
+#define ocfs2_find_next_bit ext2_find_next_bit
#endif /* OCFS2_H */

--
1.5.6
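
The hunk-wise encode/fix interface described in the blockcheck.h comments above can be illustrated with a much simpler locator code. This is a sketch under stated assumptions, not the ocfs2 Hamming implementation: it XORs together the 1-based global index of every set bit, so a single flipped bit shows up as its own index in stored ^ computed. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified single-bit locator (NOT the ocfs2 Hamming code).
 * 'd' is the number of bits in this hunk, 'nr' is the bit offset of
 * the hunk, and 'parity' carries the result of earlier hunks.
 */
static uint32_t demo_locator_encode(uint32_t parity, const unsigned char *data,
				    unsigned int d, unsigned int nr)
{
	unsigned int i;

	assert(data != NULL);

	for (i = 0; i < d; i++)
		if (data[i / 8] & (1u << (i % 8)))
			parity ^= nr + i + 1;	/* 1-based global bit index */

	return parity;
}

/*
 * Flip the bit named by 'fix' if it lies inside this hunk.
 * A fix of 0 means "no error"; a fix outside the hunk is ignored.
 */
static void demo_locator_fix(unsigned char *data, unsigned int d,
			     unsigned int nr, uint32_t fix)
{
	unsigned int i;

	assert(data != NULL);

	if (fix <= nr || fix > nr + d)
		return;

	i = fix - 1 - nr;
	data[i / 8] ^= (unsigned char)(1u << (i % 8));
}
```

As with the functions in the patch, the fix can be offered to every hunk in turn and only the hunk containing the damaged bit is modified.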

2008-12-25 18:06:42

by Mark Fasheh

Subject: [PATCH 04/35] ocfs2: Add a validation hook for quota block reads.

From: Joel Becker <[email protected]>

Add a currently-returns-success hook for quota block reads. We'll be
adding checks to this.
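
The pattern here is a read helper that takes an optional validation callback. A minimal userspace sketch of that shape, assuming hypothetical demo_* names and a fake in-memory "read" (none of this is the ocfs2 API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a buffer_head: a block number plus its data. */
struct demo_block {
	unsigned long long blocknr;
	unsigned char data[512];
};

/* A validation hook returns 0 on success or a negative error code. */
typedef int (*demo_validate_fn)(const struct demo_block *blk);

/* Placeholder hook: accepts every block, as the quota hook does for now. */
static int demo_validate_quota_block(const struct demo_block *blk)
{
	(void)blk;
	return 0;
}

/*
 * Pretend to read a block, then hand it to the optional hook.
 * Passing NULL skips validation, as the old call site did.
 */
static int demo_read_block(struct demo_block *blk, unsigned long long blocknr,
			   demo_validate_fn validate)
{
	assert(blk != NULL);

	blk->blocknr = blocknr;
	memset(blk->data, 0, sizeof(blk->data));	/* stand-in for disk I/O */

	return validate ? validate(blk) : 0;
}
```

Wiring the hook in first and filling in the checks later keeps each patch small and keeps the call sites stable.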

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/quota_global.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c
index 4b38a2e..5b05f94 100644
--- a/fs/ocfs2/quota_global.c
+++ b/fs/ocfs2/quota_global.c
@@ -85,13 +85,25 @@ struct qtree_fmt_operations ocfs2_global_ops = {
.is_id = ocfs2_global_is_id,
};

+static int ocfs2_validate_quota_block(struct super_block *sb,
+ struct buffer_head *bh)
+{
+ struct ocfs2_disk_dqtrailer *dqt = ocfs2_dq_trailer(sb, bh->b_data);
+
+ mlog(0, "Validating quota block %llu\n",
+ (unsigned long long)bh->b_blocknr);
+
+ return 0;
+}
+
int ocfs2_read_quota_block(struct inode *inode, u64 v_block,
struct buffer_head **bh)
{
int rc = 0;
struct buffer_head *tmp = *bh;

- rc = ocfs2_read_virt_blocks(inode, v_block, 1, &tmp, 0, NULL);
+ rc = ocfs2_read_virt_blocks(inode, v_block, 1, &tmp, 0,
+ ocfs2_validate_quota_block);
if (rc)
mlog_errno(rc);

--
1.5.6

2008-12-25 18:06:58

by Mark Fasheh

Subject: [PATCH 05/35] ocfs2: block read meta ecc.

From: Joel Becker <[email protected]>

Add block check calls to the read_block validate functions. This is
almost all of the read-side checking of metaecc. xattr buckets are not checked
yet. Writes are also unchecked, and so a read-write mount will quickly fail.
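
The ordering used by these validate functions (checksum first, returned as a block-local error; structural checks afterwards, treated as fatal) can be sketched in userspace. This is an illustration only: the block layout, the demo_* names, and the host-endian CRC storage are assumptions, and the ECC repair step is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE	64
#define DEMO_SIG	"DEMOBLK"	/* 7 characters plus NUL */
#define DEMO_SIG_LEN	8
#define DEMO_CRC_OFF	8		/* hypothetical offset of the stored CRC */

/* Bitwise little-endian CRC32 with no final inversion. */
static uint32_t demo_crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
	}
	return crc;
}

/* Compute the CRC with the check field zeroed, then store it. */
static void demo_check_compute(unsigned char *blk)
{
	uint32_t crc;

	memset(blk + DEMO_CRC_OFF, 0, sizeof(crc));
	crc = demo_crc32_le(~0u, blk, DEMO_BLOCK_SIZE);
	memcpy(blk + DEMO_CRC_OFF, &crc, sizeof(crc));
}

/* Checksum first (-EIO), then the signature check (-EINVAL). */
static int demo_validate_block(unsigned char *blk)
{
	uint32_t stored, crc;

	assert(blk != NULL);

	memcpy(&stored, blk + DEMO_CRC_OFF, sizeof(stored));
	memset(blk + DEMO_CRC_OFF, 0, sizeof(stored));
	crc = demo_crc32_le(~0u, blk, DEMO_BLOCK_SIZE);
	memcpy(blk + DEMO_CRC_OFF, &stored, sizeof(stored));	/* restore */

	if (crc != stored)
		return -EIO;	/* damage is local to this block */
	if (memcmp(blk, DEMO_SIG, DEMO_SIG_LEN) != 0)
		return -EINVAL;	/* structurally wrong block */
	return 0;
}
```

Zeroing the check field before computing and restoring it afterwards mirrors what the validate functions in the series do, so the stored value never feeds into its own checksum.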

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 17 +++++++++++++++++
fs/ocfs2/blockcheck.c | 9 +++++++++
fs/ocfs2/inode.c | 18 +++++++++++++++++-
fs/ocfs2/quota_global.c | 13 +++++++++++--
fs/ocfs2/suballoc.c | 31 ++++++++++++++++++++++++++++++-
fs/ocfs2/xattr.c | 17 +++++++++++++++++
6 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 84a7bd4..6b27f74 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -37,6 +37,7 @@

#include "alloc.h"
#include "aops.h"
+#include "blockcheck.h"
#include "dlmglue.h"
#include "extent_map.h"
#include "inode.h"
@@ -682,12 +683,28 @@ struct ocfs2_merge_ctxt {
static int ocfs2_validate_extent_block(struct super_block *sb,
struct buffer_head *bh)
{
+ int rc;
struct ocfs2_extent_block *eb =
(struct ocfs2_extent_block *)bh->b_data;

mlog(0, "Validating extent block %llu\n",
(unsigned long long)bh->b_blocknr);

+ BUG_ON(!buffer_uptodate(bh));
+
+ /*
+ * If the ecc fails, we return the error but otherwise
+ * leave the filesystem running. We know any error is
+ * local to this block.
+ */
+ rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &eb->h_check);
+ if (rc)
+ return rc;
+
+ /*
+ * Errors after here are fatal.
+ */
+
if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
ocfs2_error(sb,
"Extent block #%llu has bad signature %.*s",
diff --git a/fs/ocfs2/blockcheck.c b/fs/ocfs2/blockcheck.c
index 2bf3d7f..2ce6ae5 100644
--- a/fs/ocfs2/blockcheck.c
+++ b/fs/ocfs2/blockcheck.c
@@ -24,6 +24,8 @@
#include <linux/bitops.h>
#include <asm/byteorder.h>

+#include <cluster/masklog.h>
+
#include "ocfs2.h"

#include "blockcheck.h"
@@ -292,6 +294,10 @@ int ocfs2_block_check_validate(void *data, size_t blocksize,
if (crc == check.bc_crc32e)
goto out;

+ mlog(ML_ERROR,
+ "CRC32 failed: stored: %u, computed %u. Applying ECC.\n",
+ (unsigned int)check.bc_crc32e, (unsigned int)crc);
+
/* Ok, try ECC fixups */
ecc = ocfs2_hamming_encode_block(data, blocksize);
ocfs2_hamming_fix_block(data, blocksize, ecc ^ check.bc_ecc);
@@ -301,6 +307,9 @@ int ocfs2_block_check_validate(void *data, size_t blocksize,
if (crc == check.bc_crc32e)
goto out;

+ mlog(ML_ERROR, "Fixed CRC32 failed: stored: %u, computed %u\n",
+ (unsigned int)check.bc_crc32e, (unsigned int)crc);
+
rc = -EIO;

out:
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 288512c..9370b65 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -38,6 +38,7 @@
#include "ocfs2.h"

#include "alloc.h"
+#include "blockcheck.h"
#include "dlmglue.h"
#include "extent_map.h"
#include "file.h"
@@ -1262,7 +1263,7 @@ void ocfs2_refresh_inode(struct inode *inode,
int ocfs2_validate_inode_block(struct super_block *sb,
struct buffer_head *bh)
{
- int rc = -EINVAL;
+ int rc;
struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data;

mlog(0, "Validating dinode %llu\n",
@@ -1270,6 +1271,21 @@ int ocfs2_validate_inode_block(struct super_block *sb,

BUG_ON(!buffer_uptodate(bh));

+ /*
+ * If the ecc fails, we return the error but otherwise
+ * leave the filesystem running. We know any error is
+ * local to this block.
+ */
+ rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &di->i_check);
+ if (rc)
+ goto bail;
+
+ /*
+ * Errors after here are fatal.
+ */
+
+ rc = -EINVAL;
+
if (!OCFS2_IS_VALID_DINODE(di)) {
ocfs2_error(sb, "Invalid dinode #%llu: signature = %.*s\n",
(unsigned long long)bh->b_blocknr, 7,
diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c
index 5b05f94..d338438 100644
--- a/fs/ocfs2/quota_global.c
+++ b/fs/ocfs2/quota_global.c
@@ -16,6 +16,7 @@
#include "ocfs2_fs.h"
#include "ocfs2.h"
#include "alloc.h"
+#include "blockcheck.h"
#include "inode.h"
#include "journal.h"
#include "file.h"
@@ -88,12 +89,20 @@ struct qtree_fmt_operations ocfs2_global_ops = {
static int ocfs2_validate_quota_block(struct super_block *sb,
struct buffer_head *bh)
{
- struct ocfs2_disk_dqtrailer *dqt = ocfs2_dq_trailer(sb, bh->b_data);
+ struct ocfs2_disk_dqtrailer *dqt =
+ ocfs2_block_dqtrailer(sb->s_blocksize, bh->b_data);

mlog(0, "Validating quota block %llu\n",
(unsigned long long)bh->b_blocknr);

- return 0;
+ BUG_ON(!buffer_uptodate(bh));
+
+ /*
+ * If the ecc fails, we return the error but otherwise
+ * leave the filesystem running. We know any error is
+ * local to this block.
+ */
+ return ocfs2_validate_meta_ecc(sb, bh->b_data, &dqt->dq_check);
}

int ocfs2_read_quota_block(struct inode *inode, u64 v_block,
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 226fe21..7875576 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -35,6 +35,7 @@
#include "ocfs2.h"

#include "alloc.h"
+#include "blockcheck.h"
#include "dlmglue.h"
#include "inode.h"
#include "journal.h"
@@ -250,8 +251,18 @@ int ocfs2_check_group_descriptor(struct super_block *sb,
struct buffer_head *bh)
{
int rc;
+ struct ocfs2_group_desc *gd = (struct ocfs2_group_desc *)bh->b_data;
+
+ BUG_ON(!buffer_uptodate(bh));

- rc = ocfs2_validate_gd_self(sb, bh, 1);
+ /*
+ * If the ecc fails, we return the error but otherwise
+ * leave the filesystem running. We know any error is
+ * local to this block.
+ */
+ rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &gd->bg_check);
+ if (!rc)
+ rc = ocfs2_validate_gd_self(sb, bh, 1);
if (!rc)
rc = ocfs2_validate_gd_parent(sb, di, bh, 1);

@@ -261,9 +272,27 @@ int ocfs2_check_group_descriptor(struct super_block *sb,
static int ocfs2_validate_group_descriptor(struct super_block *sb,
struct buffer_head *bh)
{
+ int rc;
+ struct ocfs2_group_desc *gd = (struct ocfs2_group_desc *)bh->b_data;
+
mlog(0, "Validating group descriptor %llu\n",
(unsigned long long)bh->b_blocknr);

+ BUG_ON(!buffer_uptodate(bh));
+
+ /*
+ * If the ecc fails, we return the error but otherwise
+ * leave the filesystem running. We know any error is
+ * local to this block.
+ */
+ rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &gd->bg_check);
+ if (rc)
+ return rc;
+
+ /*
+ * Errors after here are fatal.
+ */
+
return ocfs2_validate_gd_self(sb, bh, 0);
}

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index dfc51c3..bc822d6 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -42,6 +42,7 @@

#include "ocfs2.h"
#include "alloc.h"
+#include "blockcheck.h"
#include "dlmglue.h"
#include "file.h"
#include "symlink.h"
@@ -322,12 +323,28 @@ static void ocfs2_xattr_bucket_copy_data(struct ocfs2_xattr_bucket *dest,
static int ocfs2_validate_xattr_block(struct super_block *sb,
struct buffer_head *bh)
{
+ int rc;
struct ocfs2_xattr_block *xb =
(struct ocfs2_xattr_block *)bh->b_data;

mlog(0, "Validating xattr block %llu\n",
(unsigned long long)bh->b_blocknr);

+ BUG_ON(!buffer_uptodate(bh));
+
+ /*
+ * If the ecc fails, we return the error but otherwise
+ * leave the filesystem running. We know any error is
+ * local to this block.
+ */
+ rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &xb->xb_check);
+ if (rc)
+ return rc;
+
+ /*
+ * Errors after here are fatal
+ */
+
if (!OCFS2_IS_VALID_XATTR_BLOCK(xb)) {
ocfs2_error(sb,
"Extended attribute block #%llu has bad "
--
1.5.6

2008-12-25 18:07:25

by Mark Fasheh

Subject: [PATCH 06/35] ocfs2: Add journal_access functions with jbd2 triggers.

From: Joel Becker <[email protected]>

We create wrappers for ocfs2_journal_access() that are specific to the
type of metadata block. This allows us to associate jbd2 commit
triggers with the block. The triggers will compute metadata ecc in a
future commit.
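
The per-type trigger idea (a generic callback struct embedded in a wrapper that records where the check field lives) can be sketched as follows. The demo_* names, the struct layout, and the byte-sum "checksum" are hypothetical stand-ins, not the jbd2 or ocfs2 definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic trigger type, as a journal layer might define it. */
struct demo_trigger_type {
	void (*t_commit)(struct demo_trigger_type *type,
			 void *data, size_t size);
};

/* Filesystem wrapper: the generic part plus the check field offset. */
struct demo_triggers {
	struct demo_trigger_type dt_triggers;
	size_t dt_offset;
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical metadata block with an embedded check field. */
struct demo_dinode {
	char signature[8];
	uint32_t check;
	char payload[48];
};

/* Trivial byte sum standing in for the real CRC32 and ECC. */
static uint32_t demo_sum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

/* One commit callback serves every type; only the offset differs. */
static void demo_commit_trigger(struct demo_trigger_type *type,
				void *data, size_t size)
{
	struct demo_triggers *dt =
		demo_container_of(type, struct demo_triggers, dt_triggers);
	unsigned char *field = (unsigned char *)data + dt->dt_offset;
	uint32_t sum;

	assert(dt->dt_offset + sizeof(sum) <= size);

	memset(field, 0, sizeof(sum));	/* exclude the old value */
	sum = demo_sum(data, size);
	memcpy(field, &sum, sizeof(sum));
}

static struct demo_triggers demo_di_triggers = {
	.dt_triggers = { .t_commit = demo_commit_trigger },
	.dt_offset = offsetof(struct demo_dinode, check),
};
```

Because the callback only receives the generic struct, recovering the wrapper with a container_of-style macro is what lets a single function handle every fixed-offset block type.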

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/journal.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/ocfs2/journal.h | 31 +++++++++--
2 files changed, 181 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 302f114..2daa584 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -35,6 +35,7 @@
#include "ocfs2.h"

#include "alloc.h"
+#include "blockcheck.h"
#include "dir.h"
#include "dlmglue.h"
#include "extent_map.h"
@@ -369,10 +370,110 @@ bail:
return status;
}

-int ocfs2_journal_access(handle_t *handle,
- struct inode *inode,
- struct buffer_head *bh,
- int type)
+struct ocfs2_triggers {
+ struct jbd2_buffer_trigger_type ot_triggers;
+ int ot_offset;
+};
+
+static inline struct ocfs2_triggers *to_ocfs2_trigger(struct jbd2_buffer_trigger_type *triggers)
+{
+ return container_of(triggers, struct ocfs2_triggers, ot_triggers);
+}
+
+static void ocfs2_commit_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh,
+ void *data, size_t size)
+{
+ struct ocfs2_triggers *ot = to_ocfs2_trigger(triggers);
+
+ /*
+ * We aren't guaranteed to have the superblock here, so we
+ * must unconditionally compute the ecc data.
+ * __ocfs2_journal_access() will only set the triggers if
+ * metaecc is enabled.
+ */
+ ocfs2_block_check_compute(data, size, data + ot->ot_offset);
+}
+
+/*
+ * Quota blocks have their own trigger because the struct ocfs2_block_check
+ * offset depends on the blocksize.
+ */
+static void ocfs2_dq_commit_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh,
+ void *data, size_t size)
+{
+ struct ocfs2_disk_dqtrailer *dqt =
+ ocfs2_block_dqtrailer(size, data);
+
+ /*
+ * We aren't guaranteed to have the superblock here, so we
+ * must unconditionally compute the ecc data.
+ * __ocfs2_journal_access() will only set the triggers if
+ * metaecc is enabled.
+ */
+ ocfs2_block_check_compute(data, size, &dqt->dq_check);
+}
+
+static void ocfs2_abort_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh)
+{
+ mlog(ML_ERROR,
+ "ocfs2_abort_trigger called by JBD2. bh = 0x%lx, "
+ "bh->b_blocknr = %llu\n",
+ (unsigned long)bh,
+ (unsigned long long)bh->b_blocknr);
+
+ /* We aren't guaranteed to have the superblock here - but if we
+ * don't, it'll just crash. */
+ ocfs2_error(bh->b_assoc_map->host->i_sb,
+ "JBD2 has aborted our journal, ocfs2 cannot continue\n");
+}
+
+static struct ocfs2_triggers di_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = offsetof(struct ocfs2_dinode, i_check),
+};
+
+static struct ocfs2_triggers eb_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = offsetof(struct ocfs2_extent_block, h_check),
+};
+
+static struct ocfs2_triggers gd_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = offsetof(struct ocfs2_group_desc, bg_check),
+};
+
+static struct ocfs2_triggers xb_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+ .ot_offset = offsetof(struct ocfs2_xattr_block, xb_check),
+};
+
+static struct ocfs2_triggers dq_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_dq_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+};
+
+static int __ocfs2_journal_access(handle_t *handle,
+ struct inode *inode,
+ struct buffer_head *bh,
+ struct ocfs2_triggers *triggers,
+ int type)
{
int status;

@@ -418,6 +519,8 @@ int ocfs2_journal_access(handle_t *handle,
status = -EINVAL;
mlog(ML_ERROR, "Uknown access type!\n");
}
+ if (!status && ocfs2_meta_ecc(OCFS2_SB(inode->i_sb)) && triggers)
+ jbd2_journal_set_triggers(bh, &triggers->ot_triggers);
mutex_unlock(&OCFS2_I(inode)->ip_io_mutex);

if (status < 0)
@@ -428,6 +531,54 @@ int ocfs2_journal_access(handle_t *handle,
return status;
}

+int ocfs2_journal_access_di(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type)
+{
+ return __ocfs2_journal_access(handle, inode, bh, &di_triggers,
+ type);
+}
+
+int ocfs2_journal_access_eb(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type)
+{
+ return __ocfs2_journal_access(handle, inode, bh, &eb_triggers,
+ type);
+}
+
+int ocfs2_journal_access_gd(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type)
+{
+ return __ocfs2_journal_access(handle, inode, bh, &gd_triggers,
+ type);
+}
+
+int ocfs2_journal_access_db(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type)
+{
+ /* Right now, nothing for dirblocks */
+ return __ocfs2_journal_access(handle, inode, bh, NULL, type);
+}
+
+int ocfs2_journal_access_xb(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type)
+{
+ return __ocfs2_journal_access(handle, inode, bh, &xb_triggers,
+ type);
+}
+
+int ocfs2_journal_access_dq(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type)
+{
+ return __ocfs2_journal_access(handle, inode, bh, &dq_triggers,
+ type);
+}
+
+int ocfs2_journal_access(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type)
+{
+ return __ocfs2_journal_access(handle, inode, bh, NULL, type);
+}
+
int ocfs2_journal_dirty(handle_t *handle,
struct buffer_head *bh)
{
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 37013bf..bca370d 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -212,9 +212,12 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
* ocfs2_extend_trans - Extend a handle by nblocks credits. This may
* commit the handle to disk in the process, but will
* not release any locks taken during the transaction.
- * ocfs2_journal_access - Notify the handle that we want to journal this
+ * ocfs2_journal_access* - Notify the handle that we want to journal this
* buffer. Will have to call ocfs2_journal_dirty once
* we've actually dirtied it. Type is one of . or .
+ * Always call the specific flavor of
+ * ocfs2_journal_access_*() unless you intend to
+ * manage the checksum by hand.
* ocfs2_journal_dirty - Mark a journalled buffer as having dirty data.
* ocfs2_jbd2_file_inode - Mark an inode so that its data goes out before
* the current handle commits.
@@ -244,10 +247,28 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks);
#define OCFS2_JOURNAL_ACCESS_WRITE 1
#define OCFS2_JOURNAL_ACCESS_UNDO 2

-int ocfs2_journal_access(handle_t *handle,
- struct inode *inode,
- struct buffer_head *bh,
- int type);
+/* ocfs2_inode */
+int ocfs2_journal_access_di(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+/* ocfs2_extent_block */
+int ocfs2_journal_access_eb(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+/* ocfs2_group_desc */
+int ocfs2_journal_access_gd(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+/* ocfs2_xattr_block */
+int ocfs2_journal_access_xb(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+/* quota blocks */
+int ocfs2_journal_access_dq(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+/* dirblock */
+int ocfs2_journal_access_db(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+/* Anything that has no ecc */
+int ocfs2_journal_access(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+
/*
* A word about the journal_access/journal_dirty "dance". It is
* entirely legal to journal_access a buffer more than once (as long
--
1.5.6

2008-12-25 18:07:47

by Mark Fasheh

Subject: [PATCH 07/35] ocfs2: Wrap up the common use cases of ocfs2_new_path().

From: Joel Becker <[email protected]>

The majority of ocfs2_new_path() calls are:

ocfs2_new_path(path_root_bh(otherpath),
path_root_el(otherpath));

Let's call that ocfs2_new_path_from_path(). The rest do similar things
from struct ocfs2_extent_tree. Let's call those
ocfs2_new_path_from_et(). This will make the next change easier.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 48 ++++++++++++++++++++++++------------------------
1 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 6b27f74..c22ff49 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -532,6 +532,16 @@ static struct ocfs2_path *ocfs2_new_path(struct buffer_head *root_bh,
return path;
}

+static struct ocfs2_path *ocfs2_new_path_from_path(struct ocfs2_path *path)
+{
+ return ocfs2_new_path(path_root_bh(path), path_root_el(path));
+}
+
+static struct ocfs2_path *ocfs2_new_path_from_et(struct ocfs2_extent_tree *et)
+{
+ return ocfs2_new_path(et->et_root_bh, et->et_root_el);
+}
+
/*
* Convenience function to journal all components in a path.
*/
@@ -2150,8 +2160,7 @@ static int ocfs2_rotate_tree_right(struct inode *inode,

*ret_left_path = NULL;

- left_path = ocfs2_new_path(path_root_bh(right_path),
- path_root_el(right_path));
+ left_path = ocfs2_new_path_from_path(right_path);
if (!left_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -2692,8 +2701,7 @@ static int __ocfs2_rotate_tree_left(struct inode *inode,
goto out;
}

- left_path = ocfs2_new_path(path_root_bh(path),
- path_root_el(path));
+ left_path = ocfs2_new_path_from_path(path);
if (!left_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -2702,8 +2710,7 @@ static int __ocfs2_rotate_tree_left(struct inode *inode,

ocfs2_cp_path(left_path, path);

- right_path = ocfs2_new_path(path_root_bh(path),
- path_root_el(path));
+ right_path = ocfs2_new_path_from_path(path);
if (!right_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -2833,8 +2840,7 @@ static int ocfs2_remove_rightmost_path(struct inode *inode, handle_t *handle,
* We have a path to the left of this one - it needs
* an update too.
*/
- left_path = ocfs2_new_path(path_root_bh(path),
- path_root_el(path));
+ left_path = ocfs2_new_path_from_path(path);
if (!left_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -3075,8 +3081,7 @@ static int ocfs2_get_right_path(struct inode *inode,
/* This function shouldn't be called for the rightmost leaf. */
BUG_ON(right_cpos == 0);

- right_path = ocfs2_new_path(path_root_bh(left_path),
- path_root_el(left_path));
+ right_path = ocfs2_new_path_from_path(left_path);
if (!right_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -3247,8 +3252,7 @@ static int ocfs2_get_left_path(struct inode *inode,
/* This function shouldn't be called for the leftmost leaf. */
BUG_ON(left_cpos == 0);

- left_path = ocfs2_new_path(path_root_bh(right_path),
- path_root_el(right_path));
+ left_path = ocfs2_new_path_from_path(right_path);
if (!left_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -3780,8 +3784,7 @@ static int ocfs2_append_rec_to_path(struct inode *inode, handle_t *handle,
* leftmost leaf.
*/
if (left_cpos) {
- left_path = ocfs2_new_path(path_root_bh(right_path),
- path_root_el(right_path));
+ left_path = ocfs2_new_path_from_path(right_path);
if (!left_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -4018,7 +4021,7 @@ static int ocfs2_do_insert_extent(struct inode *inode,
goto out_update_clusters;
}

- right_path = ocfs2_new_path(et->et_root_bh, et->et_root_el);
+ right_path = ocfs2_new_path_from_et(et);
if (!right_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -4130,8 +4133,7 @@ ocfs2_figure_merge_contig_type(struct inode *inode, struct ocfs2_path *path,
goto out;

if (left_cpos != 0) {
- left_path = ocfs2_new_path(path_root_bh(path),
- path_root_el(path));
+ left_path = ocfs2_new_path_from_path(path);
if (!left_path)
goto out;

@@ -4187,8 +4189,7 @@ ocfs2_figure_merge_contig_type(struct inode *inode, struct ocfs2_path *path,
if (right_cpos == 0)
goto out;

- right_path = ocfs2_new_path(path_root_bh(path),
- path_root_el(path));
+ right_path = ocfs2_new_path_from_path(path);
if (!right_path)
goto out;

@@ -4381,7 +4382,7 @@ static int ocfs2_figure_insert_type(struct inode *inode,
return 0;
}

- path = ocfs2_new_path(et->et_root_bh, et->et_root_el);
+ path = ocfs2_new_path_from_et(et);
if (!path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -4910,7 +4911,7 @@ int ocfs2_mark_extent_written(struct inode *inode,
if (et->et_ops == &ocfs2_dinode_et_ops)
ocfs2_extent_map_trunc(inode, 0);

- left_path = ocfs2_new_path(et->et_root_bh, et->et_root_el);
+ left_path = ocfs2_new_path_from_et(et);
if (!left_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -5082,8 +5083,7 @@ static int ocfs2_truncate_rec(struct inode *inode, handle_t *handle,
}

if (left_cpos && le16_to_cpu(el->l_next_free_rec) > 1) {
- left_path = ocfs2_new_path(path_root_bh(path),
- path_root_el(path));
+ left_path = ocfs2_new_path_from_path(path);
if (!left_path) {
ret = -ENOMEM;
mlog_errno(ret);
@@ -5192,7 +5192,7 @@ int ocfs2_remove_extent(struct inode *inode,

ocfs2_extent_map_trunc(inode, 0);

- path = ocfs2_new_path(et->et_root_bh, et->et_root_el);
+ path = ocfs2_new_path_from_et(et);
if (!path) {
ret = -ENOMEM;
mlog_errno(ret);
--
1.5.6

2008-12-25 18:08:09

by Mark Fasheh

Subject: [PATCH 08/35] ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.

From: Joel Becker <[email protected]>

The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out. This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks. It is not safe to use
extended attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root. Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal. We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions, let's typedef them
in ocfs2.h.
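
The dispatch rule for paths (the root uses whatever access function was recorded for it, every deeper node is an extent block) can be sketched with a function-pointer typedef. This is a userspace illustration with hypothetical demo_* names; the functions only report which flavor was chosen instead of touching a journal.

```c
#include <assert.h>
#include <stddef.h>

enum demo_kind { DEMO_PLAIN, DEMO_DINODE, DEMO_EXTENT_BLOCK };

/* One typedef shared by every journal-access flavor. */
typedef enum demo_kind (*demo_access_func)(int blocknr);

static enum demo_kind demo_access(int blocknr)
{
	(void)blocknr;
	return DEMO_PLAIN;		/* no checksum handling */
}

static enum demo_kind demo_access_di(int blocknr)
{
	(void)blocknr;
	return DEMO_DINODE;
}

static enum demo_kind demo_access_eb(int blocknr)
{
	(void)blocknr;
	return DEMO_EXTENT_BLOCK;
}

#define DEMO_MAX_PATH_DEPTH 5

struct demo_path {
	int tree_depth;
	demo_access_func root_access;	/* flavor for node[0] only */
	int node[DEMO_MAX_PATH_DEPTH];
};

/* Pick the access function for the node at depth idx and call it. */
static enum demo_kind demo_path_access(const struct demo_path *path, int idx)
{
	demo_access_func access = path->root_access;

	assert(idx >= 0 && idx <= path->tree_depth);

	if (!access)
		access = demo_access;	/* root of unknown type */
	if (idx)
		access = demo_access_eb;	/* non-root nodes */

	return access(path->node[idx]);
}
```

Storing the function pointer alongside the root keeps the tree code independent of whether the root is an inode, an xattr block, or something without a checksum.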

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 233 +++++++++++++++++++++++++++-------------------
fs/ocfs2/alloc.h | 5 +-
fs/ocfs2/aops.c | 8 +-
fs/ocfs2/dir.c | 48 ++++++----
fs/ocfs2/file.c | 16 ++--
fs/ocfs2/inode.c | 17 ++--
fs/ocfs2/journal.c | 2 +
fs/ocfs2/journal.h | 3 +-
fs/ocfs2/localalloc.c | 18 ++--
fs/ocfs2/namei.c | 38 ++++----
fs/ocfs2/ocfs2.h | 4 +
fs/ocfs2/quota_global.c | 2 +-
fs/ocfs2/quota_local.c | 18 ++--
fs/ocfs2/resize.c | 16 ++--
fs/ocfs2/suballoc.c | 58 ++++++------
15 files changed, 280 insertions(+), 206 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index c22ff49..6e58fd5 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -298,11 +298,13 @@ static struct ocfs2_extent_tree_operations ocfs2_xattr_tree_et_ops = {
static void __ocfs2_init_extent_tree(struct ocfs2_extent_tree *et,
struct inode *inode,
struct buffer_head *bh,
+ ocfs2_journal_access_func access,
void *obj,
struct ocfs2_extent_tree_operations *ops)
{
et->et_ops = ops;
et->et_root_bh = bh;
+ et->et_root_journal_access = access;
if (!obj)
obj = (void *)bh->b_data;
et->et_object = obj;
@@ -318,15 +320,16 @@ void ocfs2_init_dinode_extent_tree(struct ocfs2_extent_tree *et,
struct inode *inode,
struct buffer_head *bh)
{
- __ocfs2_init_extent_tree(et, inode, bh, NULL, &ocfs2_dinode_et_ops);
+ __ocfs2_init_extent_tree(et, inode, bh, ocfs2_journal_access_di,
+ NULL, &ocfs2_dinode_et_ops);
}

void ocfs2_init_xattr_tree_extent_tree(struct ocfs2_extent_tree *et,
struct inode *inode,
struct buffer_head *bh)
{
- __ocfs2_init_extent_tree(et, inode, bh, NULL,
- &ocfs2_xattr_tree_et_ops);
+ __ocfs2_init_extent_tree(et, inode, bh, ocfs2_journal_access_xb,
+ NULL, &ocfs2_xattr_tree_et_ops);
}

void ocfs2_init_xattr_value_extent_tree(struct ocfs2_extent_tree *et,
@@ -334,7 +337,7 @@ void ocfs2_init_xattr_value_extent_tree(struct ocfs2_extent_tree *et,
struct buffer_head *bh,
struct ocfs2_xattr_value_root *xv)
{
- __ocfs2_init_extent_tree(et, inode, bh, xv,
+ __ocfs2_init_extent_tree(et, inode, bh, ocfs2_journal_access, xv,
&ocfs2_xattr_value_et_ops);
}

@@ -356,6 +359,15 @@ static inline void ocfs2_et_update_clusters(struct inode *inode,
et->et_ops->eo_update_clusters(inode, et, clusters);
}

+static inline int ocfs2_et_root_journal_access(handle_t *handle,
+ struct inode *inode,
+ struct ocfs2_extent_tree *et,
+ int type)
+{
+ return et->et_root_journal_access(handle, inode, et->et_root_bh,
+ type);
+}
+
static inline int ocfs2_et_insert_check(struct inode *inode,
struct ocfs2_extent_tree *et,
struct ocfs2_extent_rec *rec)
@@ -396,12 +408,14 @@ struct ocfs2_path_item {
#define OCFS2_MAX_PATH_DEPTH 5

struct ocfs2_path {
- int p_tree_depth;
- struct ocfs2_path_item p_node[OCFS2_MAX_PATH_DEPTH];
+ int p_tree_depth;
+ ocfs2_journal_access_func p_root_access;
+ struct ocfs2_path_item p_node[OCFS2_MAX_PATH_DEPTH];
};

#define path_root_bh(_path) ((_path)->p_node[0].bh)
#define path_root_el(_path) ((_path)->p_node[0].el)
+#define path_root_access(_path)((_path)->p_root_access)
#define path_leaf_bh(_path) ((_path)->p_node[(_path)->p_tree_depth].bh)
#define path_leaf_el(_path) ((_path)->p_node[(_path)->p_tree_depth].el)
#define path_num_items(_path) ((_path)->p_tree_depth + 1)
@@ -434,6 +448,8 @@ static void ocfs2_reinit_path(struct ocfs2_path *path, int keep_root)
*/
if (keep_root)
depth = le16_to_cpu(path_root_el(path)->l_tree_depth);
+ else
+ path_root_access(path) = NULL;

path->p_tree_depth = depth;
}
@@ -459,6 +475,7 @@ static void ocfs2_cp_path(struct ocfs2_path *dest, struct ocfs2_path *src)

BUG_ON(path_root_bh(dest) != path_root_bh(src));
BUG_ON(path_root_el(dest) != path_root_el(src));
+ BUG_ON(path_root_access(dest) != path_root_access(src));

ocfs2_reinit_path(dest, 1);

@@ -480,6 +497,7 @@ static void ocfs2_mv_path(struct ocfs2_path *dest, struct ocfs2_path *src)
int i;

BUG_ON(path_root_bh(dest) != path_root_bh(src));
+ BUG_ON(path_root_access(dest) != path_root_access(src));

for(i = 1; i < OCFS2_MAX_PATH_DEPTH; i++) {
brelse(dest->p_node[i].bh);
@@ -515,7 +533,8 @@ static inline void ocfs2_path_insert_eb(struct ocfs2_path *path, int index,
}

static struct ocfs2_path *ocfs2_new_path(struct buffer_head *root_bh,
- struct ocfs2_extent_list *root_el)
+ struct ocfs2_extent_list *root_el,
+ ocfs2_journal_access_func access)
{
struct ocfs2_path *path;

@@ -527,6 +546,7 @@ static struct ocfs2_path *ocfs2_new_path(struct buffer_head *root_bh,
get_bh(root_bh);
path_root_bh(path) = root_bh;
path_root_el(path) = root_el;
+ path_root_access(path) = access;
}

return path;
@@ -534,12 +554,38 @@ static struct ocfs2_path *ocfs2_new_path(struct buffer_head *root_bh,

static struct ocfs2_path *ocfs2_new_path_from_path(struct ocfs2_path *path)
{
- return ocfs2_new_path(path_root_bh(path), path_root_el(path));
+ return ocfs2_new_path(path_root_bh(path), path_root_el(path),
+ path_root_access(path));
}

static struct ocfs2_path *ocfs2_new_path_from_et(struct ocfs2_extent_tree *et)
{
- return ocfs2_new_path(et->et_root_bh, et->et_root_el);
+ return ocfs2_new_path(et->et_root_bh, et->et_root_el,
+ et->et_root_journal_access);
+}
+
+/*
+ * Journal the buffer at depth idx. Every idx > 0 is an extent block;
+ * idx 0 is the root, journaled with the path's root_access function.
+ *
+ * I don't like the way this function's name looks next to
+ * ocfs2_journal_access_path(), but I don't have a better one.
+ */
+static int ocfs2_path_bh_journal_access(handle_t *handle,
+ struct inode *inode,
+ struct ocfs2_path *path,
+ int idx)
+{
+ ocfs2_journal_access_func access = path_root_access(path);
+
+ if (!access)
+ access = ocfs2_journal_access;
+
+ if (idx)
+ access = ocfs2_journal_access_eb;
+
+ return access(handle, inode, path->p_node[idx].bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
}

/*
@@ -554,8 +600,7 @@ static int ocfs2_journal_access_path(struct inode *inode, handle_t *handle,
goto out;

for(i = 0; i < path_num_items(path); i++) {
- ret = ocfs2_journal_access(handle, inode, path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, path, i);
if (ret < 0) {
mlog_errno(ret);
goto out;
@@ -708,8 +753,11 @@ static int ocfs2_validate_extent_block(struct super_block *sb,
* local to this block.
*/
rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &eb->h_check);
- if (rc)
+ if (rc) {
+ mlog(ML_ERROR, "Checksum failed for extent block %llu\n",
+ (unsigned long long)bh->b_blocknr);
return rc;
+ }

/*
* Errors after here are fatal.
@@ -842,8 +890,8 @@ static int ocfs2_create_new_meta_bhs(struct ocfs2_super *osb,
}
ocfs2_set_new_buffer_uptodate(inode, bhs[i]);

- status = ocfs2_journal_access(handle, inode, bhs[i],
- OCFS2_JOURNAL_ACCESS_CREATE);
+ status = ocfs2_journal_access_eb(handle, inode, bhs[i],
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -986,8 +1034,8 @@ static int ocfs2_add_branch(struct ocfs2_super *osb,
BUG_ON(!OCFS2_IS_VALID_EXTENT_BLOCK(eb));
eb_el = &eb->h_list;

- status = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ status = ocfs2_journal_access_eb(handle, inode, bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1026,21 +1074,21 @@ static int ocfs2_add_branch(struct ocfs2_super *osb,
* journal_dirty erroring as it won't unless we've aborted the
* handle (in which case we would never be here) so reserving
* the write with journal_access is all we need to do. */
- status = ocfs2_journal_access(handle, inode, *last_eb_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_eb(handle, inode, *last_eb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
}
- status = ocfs2_journal_access(handle, inode, et->et_root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_et_root_journal_access(handle, inode, et,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
}
if (eb_bh) {
- status = ocfs2_journal_access(handle, inode, eb_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_eb(handle, inode, eb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1129,8 +1177,8 @@ static int ocfs2_shift_tree_depth(struct ocfs2_super *osb,
eb_el = &eb->h_list;
root_el = et->et_root_el;

- status = ocfs2_journal_access(handle, inode, new_eb_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ status = ocfs2_journal_access_eb(handle, inode, new_eb_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1148,8 +1196,8 @@ static int ocfs2_shift_tree_depth(struct ocfs2_super *osb,
goto bail;
}

- status = ocfs2_journal_access(handle, inode, et->et_root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_et_root_journal_access(handle, inode, et,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1918,25 +1966,23 @@ static int ocfs2_rotate_subtree_right(struct inode *inode,
root_bh = left_path->p_node[subtree_index].bh;
BUG_ON(root_bh != right_path->p_node[subtree_index].bh);

- ret = ocfs2_journal_access(handle, inode, root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, right_path,
+ subtree_index);
if (ret) {
mlog_errno(ret);
goto out;
}

for(i = subtree_index + 1; i < path_num_items(right_path); i++) {
- ret = ocfs2_journal_access(handle, inode,
- right_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ right_path, i);
if (ret) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_journal_access(handle, inode,
- left_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ left_path, i);
if (ret) {
mlog_errno(ret);
goto out;
@@ -2455,9 +2501,9 @@ static int ocfs2_rotate_subtree_left(struct inode *inode, handle_t *handle,
return -EAGAIN;

if (le16_to_cpu(right_leaf_el->l_next_free_rec) > 1) {
- ret = ocfs2_journal_access(handle, inode,
- path_leaf_bh(right_path),
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_eb(handle, inode,
+ path_leaf_bh(right_path),
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -2474,8 +2520,8 @@ static int ocfs2_rotate_subtree_left(struct inode *inode, handle_t *handle,
* We have to update i_last_eb_blk during the meta
* data delete.
*/
- ret = ocfs2_journal_access(handle, inode, et_root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_et_root_journal_access(handle, inode, et,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -2490,25 +2536,23 @@ static int ocfs2_rotate_subtree_left(struct inode *inode, handle_t *handle,
*/
BUG_ON(right_has_empty && !del_right_subtree);

- ret = ocfs2_journal_access(handle, inode, root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, right_path,
+ subtree_index);
if (ret) {
mlog_errno(ret);
goto out;
}

for(i = subtree_index + 1; i < path_num_items(right_path); i++) {
- ret = ocfs2_journal_access(handle, inode,
- right_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ right_path, i);
if (ret) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_journal_access(handle, inode,
- left_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ left_path, i);
if (ret) {
mlog_errno(ret);
goto out;
@@ -2653,16 +2697,17 @@ out:

static int ocfs2_rotate_rightmost_leaf_left(struct inode *inode,
handle_t *handle,
- struct buffer_head *bh,
- struct ocfs2_extent_list *el)
+ struct ocfs2_path *path)
{
int ret;
+ struct buffer_head *bh = path_leaf_bh(path);
+ struct ocfs2_extent_list *el = path_leaf_el(path);

if (!ocfs2_is_empty_extent(&el->l_recs[0]))
return 0;

- ret = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, path,
+ path_num_items(path) - 1);
if (ret) {
mlog_errno(ret);
goto out;
@@ -2744,9 +2789,8 @@ static int __ocfs2_rotate_tree_left(struct inode *inode,
* Caller might still want to make changes to the
* tree root, so re-add it to the journal here.
*/
- ret = ocfs2_journal_access(handle, inode,
- path_root_bh(left_path),
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ left_path, 0);
if (ret) {
mlog_errno(ret);
goto out;
@@ -2929,8 +2973,7 @@ rightmost_no_delete:
* it up front.
*/
ret = ocfs2_rotate_rightmost_leaf_left(inode, handle,
- path_leaf_bh(path),
- path_leaf_el(path));
+ path);
if (ret)
mlog_errno(ret);
goto out;
@@ -3164,8 +3207,8 @@ static int ocfs2_merge_rec_right(struct inode *inode,
root_bh = left_path->p_node[subtree_index].bh;
BUG_ON(root_bh != right_path->p_node[subtree_index].bh);

- ret = ocfs2_journal_access(handle, inode, root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, right_path,
+ subtree_index);
if (ret) {
mlog_errno(ret);
goto out;
@@ -3173,17 +3216,15 @@ static int ocfs2_merge_rec_right(struct inode *inode,

for (i = subtree_index + 1;
i < path_num_items(right_path); i++) {
- ret = ocfs2_journal_access(handle, inode,
- right_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ right_path, i);
if (ret) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_journal_access(handle, inode,
- left_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ left_path, i);
if (ret) {
mlog_errno(ret);
goto out;
@@ -3195,8 +3236,8 @@ static int ocfs2_merge_rec_right(struct inode *inode,
right_rec = &el->l_recs[index + 1];
}

- ret = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, left_path,
+ path_num_items(left_path) - 1);
if (ret) {
mlog_errno(ret);
goto out;
@@ -3335,8 +3376,8 @@ static int ocfs2_merge_rec_left(struct inode *inode,
root_bh = left_path->p_node[subtree_index].bh;
BUG_ON(root_bh != right_path->p_node[subtree_index].bh);

- ret = ocfs2_journal_access(handle, inode, root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, right_path,
+ subtree_index);
if (ret) {
mlog_errno(ret);
goto out;
@@ -3344,17 +3385,15 @@ static int ocfs2_merge_rec_left(struct inode *inode,

for (i = subtree_index + 1;
i < path_num_items(right_path); i++) {
- ret = ocfs2_journal_access(handle, inode,
- right_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ right_path, i);
if (ret) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_journal_access(handle, inode,
- left_path->p_node[i].bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode,
+ left_path, i);
if (ret) {
mlog_errno(ret);
goto out;
@@ -3366,8 +3405,8 @@ static int ocfs2_merge_rec_left(struct inode *inode,
has_empty_extent = 1;
}

- ret = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_path_bh_journal_access(handle, inode, left_path,
+ path_num_items(left_path) - 1);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4009,8 +4048,8 @@ static int ocfs2_do_insert_extent(struct inode *inode,

el = et->et_root_el;

- ret = ocfs2_journal_access(handle, inode, et->et_root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_et_root_journal_access(handle, inode, et,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4071,8 +4110,8 @@ static int ocfs2_do_insert_extent(struct inode *inode,
* ocfs2_rotate_tree_right() might have extended the
* transaction without re-journaling our tree root.
*/
- ret = ocfs2_journal_access(handle, inode, et->et_root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_et_root_journal_access(handle, inode, et,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4593,9 +4632,9 @@ int ocfs2_add_clusters_in_btree(struct ocfs2_super *osb,

BUG_ON(num_bits > clusters_to_add);

- /* reserve our write early -- insert_extent may update the inode */
- status = ocfs2_journal_access(handle, inode, et->et_root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ /* reserve our write early -- insert_extent may update the tree root */
+ status = ocfs2_et_root_journal_access(handle, inode, et,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -5347,8 +5386,8 @@ int ocfs2_remove_btree_range(struct inode *inode,
goto out;
}

- ret = ocfs2_journal_access(handle, inode, et->et_root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_et_root_journal_access(handle, inode, et,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -5461,8 +5500,8 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
goto bail;
}

- status = ocfs2_journal_access(handle, tl_inode, tl_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, tl_inode, tl_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -5523,8 +5562,8 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
while (i >= 0) {
/* Caller has given us at least enough credits to
* update the truncate log dinode */
- status = ocfs2_journal_access(handle, tl_inode, tl_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, tl_inode, tl_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -5780,6 +5819,7 @@ int ocfs2_begin_truncate_log_recovery(struct ocfs2_super *osb,
* tl_used. */
tl->tl_used = 0;

+ ocfs2_compute_meta_ecc(osb->sb, tl_bh->b_data, &di->i_check);
status = ocfs2_write_block(osb, tl_bh, tl_inode);
if (status < 0) {
mlog_errno(status);
@@ -6546,8 +6586,8 @@ static int ocfs2_do_truncate(struct ocfs2_super *osb,
}

if (last_eb_bh) {
- status = ocfs2_journal_access(handle, inode, last_eb_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_eb(handle, inode, last_eb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -6908,8 +6948,8 @@ int ocfs2_convert_inline_data_to_extents(struct inode *inode,
goto out_unlock;
}

- ret = ocfs2_journal_access(handle, inode, di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -7043,7 +7083,8 @@ int ocfs2_commit_truncate(struct ocfs2_super *osb,
new_highest_cpos = ocfs2_clusters_for_bytes(osb->sb,
i_size_read(inode));

- path = ocfs2_new_path(fe_bh, &di->id2.i_list);
+ path = ocfs2_new_path(fe_bh, &di->id2.i_list,
+ ocfs2_journal_access_di);
if (!path) {
status = -ENOMEM;
mlog_errno(status);
@@ -7276,8 +7317,8 @@ int ocfs2_truncate_inline(struct inode *inode, struct buffer_head *di_bh,
goto out;
}

- ret = ocfs2_journal_access(handle, inode, di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out_commit;
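Editorial note: the per-depth dispatch that ocfs2_path_bh_journal_access()
adds in the alloc.c changes above can be sketched in user space as below.
This is only an illustration under stated assumptions, not the kernel code:
every mock_* name is a hypothetical stand-in (the real functions take a
handle_t, an inode, a buffer_head and an access type), and the "journaling"
here merely records which helper was chosen.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_PATH_DEPTH 5

/* Hypothetical tags recording which access helper touched a buffer. */
enum {
	MOCK_BY_NONE,
	MOCK_BY_GENERIC,
	MOCK_BY_DINODE,
	MOCK_BY_EXTENT_BLOCK
};

/* Hypothetical stand-in for struct buffer_head. */
struct mock_bh {
	int journaled_by;
};

/* Mirrors the idea of ocfs2_journal_access_func, with fewer arguments. */
typedef int (*mock_access_func)(struct mock_bh *bh);

static int mock_access_generic(struct mock_bh *bh)
{
	bh->journaled_by = MOCK_BY_GENERIC;
	return 0;
}

static int mock_access_dinode(struct mock_bh *bh)
{
	bh->journaled_by = MOCK_BY_DINODE;
	return 0;
}

static int mock_access_extent_block(struct mock_bh *bh)
{
	bh->journaled_by = MOCK_BY_EXTENT_BLOCK;
	return 0;
}

/* Hypothetical stand-in for struct ocfs2_path. */
struct mock_path {
	int tree_depth;
	mock_access_func root_access;
	struct mock_bh *node[MOCK_MAX_PATH_DEPTH];
};

/*
 * Pick the access helper for the buffer at depth idx:
 * - idx == 0 is the tree root, so use the root's own helper, falling
 *   back to the generic one when none was recorded;
 * - idx > 0 is always an extent block.
 */
static int mock_path_bh_access(struct mock_path *path, int idx)
{
	mock_access_func access = path->root_access;

	assert(idx >= 0 && idx <= path->tree_depth);

	if (!access)
		access = mock_access_generic;

	if (idx)
		access = mock_access_extent_block;

	return access(path->node[idx]);
}
```

The point of carrying the root's helper in the path is that only the root
block varies in type (a dinode in the ocfs2_commit_truncate() hunk, whatever
the extent tree supplies otherwise); everything below it is an extent block,
so callers that walk a path need no per-caller type knowledge.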
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 59d37d1..4b6fea2 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -45,7 +45,9 @@
*
* ocfs2_extent_tree contains info for the root of the b-tree, it must have a
* root ocfs2_extent_list and a root_bh so that they can be used in the b-tree
- * functions.
+ * functions. With metadata ecc, each type of metadata has its own
+ * journal_access function, so the tree must also carry the
+ * root_journal_access function for its root block.
* ocfs2_extent_tree_operations abstract the normal operations we do for
* the root of extent b-tree.
*/
@@ -54,6 +56,7 @@ struct ocfs2_extent_tree {
struct ocfs2_extent_tree_operations *et_ops;
struct buffer_head *et_root_bh;
struct ocfs2_extent_list *et_root_el;
+ ocfs2_journal_access_func et_root_journal_access;
void *et_object;
unsigned int et_max_leaf_clusters;
};
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 6b647ec..a067a6c 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1512,8 +1512,8 @@ static int ocfs2_write_begin_inline(struct address_space *mapping,
goto out;
}

- ret = ocfs2_journal_access(handle, inode, wc->w_di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, wc->w_di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
ocfs2_commit_trans(osb, handle);

@@ -1740,8 +1740,8 @@ int ocfs2_write_begin_nolock(struct address_space *mapping,
* We don't want this to fail in ocfs2_write_end(), so do it
* here.
*/
- ret = ocfs2_journal_access(handle, inode, wc->w_di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, wc->w_di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out_quota;
diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index 3708fe4..45e4e03 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -378,14 +378,18 @@ int ocfs2_update_entry(struct inode *dir, handle_t *handle,
struct inode *new_entry_inode)
{
int ret;
+ ocfs2_journal_access_func access = ocfs2_journal_access_db;

/*
* The same code works fine for both inline-data and extent
- * based directories, so no need to split this up.
+ * based directories, so no need to split this up. The only
+ * difference is the journal_access function.
*/

- ret = ocfs2_journal_access(handle, dir, de_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ if (OCFS2_I(dir)->ip_dyn_features & OCFS2_INLINE_DATA_FL)
+ access = ocfs2_journal_access_di;
+
+ ret = access(handle, dir, de_bh, OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -407,9 +411,13 @@ static int __ocfs2_delete_entry(handle_t *handle, struct inode *dir,
{
struct ocfs2_dir_entry *de, *pde;
int i, status = -ENOENT;
+ ocfs2_journal_access_func access = ocfs2_journal_access_db;

mlog_entry("(0x%p, 0x%p, 0x%p, 0x%p)\n", handle, dir, de_del, bh);

+ if (OCFS2_I(dir)->ip_dyn_features & OCFS2_INLINE_DATA_FL)
+ access = ocfs2_journal_access_di;
+
i = 0;
pde = NULL;
de = (struct ocfs2_dir_entry *) first_de;
@@ -420,8 +428,8 @@ static int __ocfs2_delete_entry(handle_t *handle, struct inode *dir,
goto bail;
}
if (de == de_del) {
- status = ocfs2_journal_access(handle, dir, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = access(handle, dir, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
status = -EIO;
mlog_errno(status);
@@ -581,8 +589,14 @@ int __ocfs2_add_entry(handle_t *handle,
goto bail;
}

- status = ocfs2_journal_access(handle, dir, insert_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ if (insert_bh == parent_fe_bh)
+ status = ocfs2_journal_access_di(handle, dir,
+ insert_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ else
+ status = ocfs2_journal_access_db(handle, dir,
+ insert_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
/* By now the buffer is marked for journaling */
offset += le16_to_cpu(de->rec_len);
if (le64_to_cpu(de->inode)) {
@@ -1081,8 +1095,8 @@ static int ocfs2_fill_new_dir_id(struct ocfs2_super *osb,
struct ocfs2_inline_data *data = &di->id2.i_data;
unsigned int size = le16_to_cpu(data->id_count);

- ret = ocfs2_journal_access(handle, inode, di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -1129,8 +1143,8 @@ static int ocfs2_fill_new_dir_el(struct ocfs2_super *osb,

ocfs2_set_new_buffer_uptodate(inode, new_bh);

- status = ocfs2_journal_access(handle, inode, new_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ status = ocfs2_journal_access_db(handle, inode, new_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1292,8 +1306,8 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,

ocfs2_set_new_buffer_uptodate(dir, dirdata_bh);

- ret = ocfs2_journal_access(handle, dir, dirdata_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ ret = ocfs2_journal_access_db(handle, dir, dirdata_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -1319,8 +1333,8 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,
* We let the later dirent insert modify c/mtime - to the user
* the data hasn't changed.
*/
- ret = ocfs2_journal_access(handle, dir, di_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ ret = ocfs2_journal_access_di(handle, dir, di_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -1583,8 +1597,8 @@ do_extend:

ocfs2_set_new_buffer_uptodate(dir, new_bh);

- status = ocfs2_journal_access(handle, dir, new_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ status = ocfs2_journal_access_db(handle, dir, new_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (status < 0) {
mlog_errno(status);
goto bail;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 9374d37..e8f795f 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -256,8 +256,8 @@ int ocfs2_update_inode_atime(struct inode *inode,
goto out;
}

- ret = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -353,8 +353,8 @@ static int ocfs2_orphan_for_truncate(struct ocfs2_super *osb,
goto out;
}

- status = ocfs2_journal_access(handle, inode, fe_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, inode, fe_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out_commit;
@@ -590,8 +590,8 @@ restarted_transaction:
/* reserve a write to the file entry early on - that we if we
* run out of credits in the allocation path, we can still
* update i_size. */
- status = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, inode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -1121,8 +1121,8 @@ static int __ocfs2_write_remove_suid(struct inode *inode,
goto out;
}

- ret = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto out_trans;
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 9370b65..229e707 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -537,8 +537,8 @@ static int ocfs2_truncate_for_delete(struct ocfs2_super *osb,
goto out;
}

- status = ocfs2_journal_access(handle, inode, fe_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, inode, fe_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out;
@@ -621,8 +621,8 @@ static int ocfs2_remove_inode(struct inode *inode,
}

/* set the inodes dtime */
- status = ocfs2_journal_access(handle, inode, di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, inode, di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail_commit;
@@ -1190,8 +1190,8 @@ int ocfs2_mark_inode_dirty(handle_t *handle,
mlog_entry("(inode %llu)\n",
(unsigned long long)OCFS2_I(inode)->ip_blkno);

- status = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, inode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -1277,8 +1277,11 @@ int ocfs2_validate_inode_block(struct super_block *sb,
* local to this block.
*/
rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &di->i_check);
- if (rc)
+ if (rc) {
+ mlog(ML_ERROR, "Checksum failed for dinode %llu\n",
+ (unsigned long long)bh->b_blocknr);
goto bail;
+ }

/*
* Errors after here are fatal.
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 2daa584..3b54dba 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -752,6 +752,7 @@ static int ocfs2_journal_toggle_dirty(struct ocfs2_super *osb,
if (replayed)
ocfs2_bump_recovery_generation(fe);

+ ocfs2_compute_meta_ecc(osb->sb, bh->b_data, &fe->i_check);
status = ocfs2_write_block(osb, bh, journal->j_inode);
if (status < 0)
mlog_errno(status);
@@ -1486,6 +1487,7 @@ static int ocfs2_replay_journal(struct ocfs2_super *osb,
osb->slot_recovery_generations[slot_num] =
ocfs2_get_recovery_generation(fe);

+ ocfs2_compute_meta_ecc(osb->sb, bh->b_data, &fe->i_check);
status = ocfs2_write_block(osb, bh, inode);
if (status < 0)
mlog_errno(status);
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index bca370d..3c3532e 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -247,9 +247,10 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks);
#define OCFS2_JOURNAL_ACCESS_WRITE 1
#define OCFS2_JOURNAL_ACCESS_UNDO 2

+
/* ocfs2_inode */
int ocfs2_journal_access_di(handle_t *handle, struct inode *inode,
- struct buffer_head *bh, int type);
+ struct buffer_head *bh, int type);
/* ocfs2_extent_block */
int ocfs2_journal_access_eb(handle_t *handle, struct inode *inode,
struct buffer_head *bh, int type);
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 19cfb1b..ec70cdb 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -36,6 +36,7 @@
#include "ocfs2.h"

#include "alloc.h"
+#include "blockcheck.h"
#include "dlmglue.h"
#include "inode.h"
#include "journal.h"
@@ -382,8 +383,8 @@ void ocfs2_shutdown_local_alloc(struct ocfs2_super *osb)
}
memcpy(alloc_copy, alloc, bh->b_size);

- status = ocfs2_journal_access(handle, local_alloc_inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, local_alloc_inode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out_commit;
@@ -476,6 +477,7 @@ int ocfs2_begin_local_alloc_recovery(struct ocfs2_super *osb,
alloc = (struct ocfs2_dinode *) alloc_bh->b_data;
ocfs2_clear_local_alloc(alloc);

+ ocfs2_compute_meta_ecc(osb->sb, alloc_bh->b_data, &alloc->i_check);
status = ocfs2_write_block(osb, alloc_bh, inode);
if (status < 0)
mlog_errno(status);
@@ -762,9 +764,9 @@ int ocfs2_claim_local_alloc_bits(struct ocfs2_super *osb,
* delete bits from it! */
*num_bits = bits_wanted;

- status = ocfs2_journal_access(handle, local_alloc_inode,
- osb->local_alloc_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, local_alloc_inode,
+ osb->local_alloc_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1240,9 +1242,9 @@ static int ocfs2_local_alloc_slide_window(struct ocfs2_super *osb,
}
memcpy(alloc_copy, alloc, osb->local_alloc_bh->b_size);

- status = ocfs2_journal_access(handle, local_alloc_inode,
- osb->local_alloc_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, local_alloc_inode,
+ osb->local_alloc_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 02c8026..7587d24 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -361,8 +361,8 @@ static int ocfs2_mknod(struct inode *dir,
goto leave;
}

- status = ocfs2_journal_access(handle, dir, parent_fe_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, dir, parent_fe_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -493,8 +493,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
}
ocfs2_set_new_buffer_uptodate(inode, *new_fe_bh);

- status = ocfs2_journal_access(handle, inode, *new_fe_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ status = ocfs2_journal_access_di(handle, inode, *new_fe_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -664,8 +664,8 @@ static int ocfs2_link(struct dentry *old_dentry,
goto out_unlock_inode;
}

- err = ocfs2_journal_access(handle, inode, fe_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ err = ocfs2_journal_access_di(handle, inode, fe_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (err < 0) {
mlog_errno(err);
goto out_commit;
@@ -851,8 +851,8 @@ static int ocfs2_unlink(struct inode *dir,
goto leave;
}

- status = ocfs2_journal_access(handle, inode, fe_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, inode, fe_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -1265,8 +1265,8 @@ static int ocfs2_rename(struct inode *old_dir,
goto bail;
}
}
- status = ocfs2_journal_access(handle, new_inode, newfe_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, new_inode, newfe_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1312,8 +1312,8 @@ static int ocfs2_rename(struct inode *old_dir,
old_inode->i_ctime = CURRENT_TIME;
mark_inode_dirty(old_inode);

- status = ocfs2_journal_access(handle, old_inode, old_inode_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, old_inode, old_inode_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status >= 0) {
old_di = (struct ocfs2_dinode *) old_inode_bh->b_data;

@@ -1389,9 +1389,9 @@ static int ocfs2_rename(struct inode *old_dir,
(int)old_dir_nlink, old_dir->i_nlink);
} else {
struct ocfs2_dinode *fe;
- status = ocfs2_journal_access(handle, old_dir,
- old_dir_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, old_dir,
+ old_dir_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
fe = (struct ocfs2_dinode *) old_dir_bh->b_data;
fe->i_links_count = cpu_to_le16(old_dir->i_nlink);
status = ocfs2_journal_dirty(handle, old_dir_bh);
@@ -1898,8 +1898,8 @@ static int ocfs2_orphan_add(struct ocfs2_super *osb,
goto leave;
}

- status = ocfs2_journal_access(handle, orphan_dir_inode, orphan_dir_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, orphan_dir_inode, orphan_dir_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -1986,8 +1986,8 @@ int ocfs2_orphan_del(struct ocfs2_super *osb,
goto leave;
}

- status = ocfs2_journal_access(handle,orphan_dir_inode, orphan_dir_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, orphan_dir_inode, orphan_dir_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 2bb389f..bad87d0 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -339,6 +339,10 @@ struct ocfs2_super

#define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info)

+/* Useful typedef for passing around journal access functions */
+typedef int (*ocfs2_journal_access_func)(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, int type);
+
static inline int ocfs2_should_order_data(struct inode *inode)
{
if (!S_ISREG(inode->i_mode))
diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c
index d338438..f0c616c 100644
--- a/fs/ocfs2/quota_global.c
+++ b/fs/ocfs2/quota_global.c
@@ -242,7 +242,7 @@ ssize_t ocfs2_quota_write(struct super_block *sb, int type,
set_buffer_uptodate(bh);
unlock_buffer(bh);
ocfs2_set_buffer_uptodate(gqinode, bh);
- err = ocfs2_journal_access(handle, gqinode, bh, ja_type);
+ err = ocfs2_journal_access_dq(handle, gqinode, bh, ja_type);
if (err < 0) {
brelse(bh);
goto out;
diff --git a/fs/ocfs2/quota_local.c b/fs/ocfs2/quota_local.c
index 5353c42..a5f6e2a 100644
--- a/fs/ocfs2/quota_local.c
+++ b/fs/ocfs2/quota_local.c
@@ -106,8 +106,8 @@ static int ocfs2_modify_bh(struct inode *inode, struct buffer_head *bh,
mlog_errno(status);
return status;
}
- status = ocfs2_journal_access(handle, inode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_dq(handle, inode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
ocfs2_commit_trans(OCFS2_SB(sb), handle);
@@ -506,7 +506,7 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode,
goto out_commit;
}
/* Release local quota file entry */
- status = ocfs2_journal_access(handle, lqinode,
+ status = ocfs2_journal_access_dq(handle, lqinode,
qbh, OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
@@ -614,8 +614,8 @@ int ocfs2_finish_quota_recovery(struct ocfs2_super *osb,
mlog_errno(status);
goto out_bh;
}
- status = ocfs2_journal_access(handle, lqinode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_dq(handle, lqinode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out_trans;
@@ -981,8 +981,8 @@ static struct ocfs2_quota_chunk *ocfs2_local_quota_add_chunk(
goto out;
}

- status = ocfs2_journal_access(handle, lqinode, bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_dq(handle, lqinode, bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out_trans;
@@ -1074,7 +1074,7 @@ static struct ocfs2_quota_chunk *ocfs2_extend_local_quota_file(
mlog_errno(status);
goto out;
}
- status = ocfs2_journal_access(handle, lqinode, chunk->qc_headerbh,
+ status = ocfs2_journal_access_dq(handle, lqinode, chunk->qc_headerbh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
@@ -1207,7 +1207,7 @@ static int ocfs2_local_release_dquot(struct dquot *dquot)
goto out;
}

- status = ocfs2_journal_access(handle, sb_dqopt(sb)->files[type],
+ status = ocfs2_journal_access_dq(handle, sb_dqopt(sb)->files[type],
od->dq_chunk->qc_headerbh, OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c
index 867de3e..424adaa 100644
--- a/fs/ocfs2/resize.c
+++ b/fs/ocfs2/resize.c
@@ -106,8 +106,8 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle,
mlog_entry("(new_clusters=%d, first_new_cluster = %u)\n",
new_clusters, first_new_cluster);

- ret = ocfs2_journal_access(handle, bm_inode, group_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_gd(handle, bm_inode, group_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto out;
@@ -141,8 +141,8 @@ static int ocfs2_update_last_group_and_inode(handle_t *handle,
}

/* update the inode accordingly. */
- ret = ocfs2_journal_access(handle, bm_inode, bm_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, bm_inode, bm_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto out_rollback;
@@ -536,8 +536,8 @@ int ocfs2_group_add(struct inode *inode, struct ocfs2_new_group_input *input)
cl = &fe->id2.i_chain;
cr = &cl->cl_recs[input->chain];

- ret = ocfs2_journal_access(handle, main_bm_inode, group_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_gd(handle, main_bm_inode, group_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto out_commit;
@@ -552,8 +552,8 @@ int ocfs2_group_add(struct inode *inode, struct ocfs2_new_group_input *input)
goto out_commit;
}

- ret = ocfs2_journal_access(handle, main_bm_inode, main_bm_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, main_bm_inode, main_bm_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto out_commit;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 7875576..a696286 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -261,7 +261,11 @@ int ocfs2_check_group_descriptor(struct super_block *sb,
* local to this block.
*/
rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &gd->bg_check);
- if (!rc)
+ if (rc) {
+ mlog(ML_ERROR,
+ "Checksum failed for group descriptor %llu\n",
+ (unsigned long long)bh->b_blocknr);
+ } else
rc = ocfs2_validate_gd_self(sb, bh, 1);
if (!rc)
rc = ocfs2_validate_gd_parent(sb, di, bh, 1);
@@ -343,10 +347,10 @@ static int ocfs2_block_group_fill(handle_t *handle,
goto bail;
}

- status = ocfs2_journal_access(handle,
- alloc_inode,
- bg_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ status = ocfs2_journal_access_gd(handle,
+ alloc_inode,
+ bg_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -476,8 +480,8 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,

bg = (struct ocfs2_group_desc *) bg_bh->b_data;

- status = ocfs2_journal_access(handle, alloc_inode,
- bh, OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, alloc_inode,
+ bh, OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -986,10 +990,10 @@ static inline int ocfs2_block_group_set_bits(handle_t *handle,
if (ocfs2_is_cluster_bitmap(alloc_inode))
journal_type = OCFS2_JOURNAL_ACCESS_UNDO;

- status = ocfs2_journal_access(handle,
- alloc_inode,
- group_bh,
- journal_type);
+ status = ocfs2_journal_access_gd(handle,
+ alloc_inode,
+ group_bh,
+ journal_type);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1060,8 +1064,8 @@ static int ocfs2_relink_block_group(handle_t *handle,
bg_ptr = le64_to_cpu(bg->bg_next_group);
prev_bg_ptr = le64_to_cpu(prev_bg->bg_next_group);

- status = ocfs2_journal_access(handle, alloc_inode, prev_bg_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_gd(handle, alloc_inode, prev_bg_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out_rollback;
@@ -1075,8 +1079,8 @@ static int ocfs2_relink_block_group(handle_t *handle,
goto out_rollback;
}

- status = ocfs2_journal_access(handle, alloc_inode, bg_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_gd(handle, alloc_inode, bg_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out_rollback;
@@ -1090,8 +1094,8 @@ static int ocfs2_relink_block_group(handle_t *handle,
goto out_rollback;
}

- status = ocfs2_journal_access(handle, alloc_inode, fe_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, alloc_inode, fe_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto out_rollback;
@@ -1242,8 +1246,8 @@ static int ocfs2_alloc_dinode_update_counts(struct inode *inode,
struct ocfs2_dinode *di = (struct ocfs2_dinode *) di_bh->b_data;
struct ocfs2_chain_list *cl = (struct ocfs2_chain_list *) &di->id2.i_chain;

- ret = ocfs2_journal_access(handle, inode, di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto out;
@@ -1414,10 +1418,10 @@ static int ocfs2_search_chain(struct ocfs2_alloc_context *ac,

/* Ok, claim our bits now: set the info on dinode, chainlist
* and then the group */
- status = ocfs2_journal_access(handle,
- alloc_inode,
- ac->ac_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle,
+ alloc_inode,
+ ac->ac_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1824,8 +1828,8 @@ static inline int ocfs2_block_group_clear_bits(handle_t *handle,
if (ocfs2_is_cluster_bitmap(alloc_inode))
journal_type = OCFS2_JOURNAL_ACCESS_UNDO;

- status = ocfs2_journal_access(handle, alloc_inode, group_bh,
- journal_type);
+ status = ocfs2_journal_access_gd(handle, alloc_inode, group_bh,
+ journal_type);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1900,8 +1904,8 @@ int ocfs2_free_suballoc_bits(handle_t *handle,
goto bail;
}

- status = ocfs2_journal_access(handle, alloc_inode, alloc_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = ocfs2_journal_access_di(handle, alloc_inode, alloc_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto bail;
--
1.5.6

2008-12-25 18:08:30

by Mark Fasheh

Subject: [PATCH 09/35] ocfs2: Add ecc and checksums to ocfs2 xattr buckets.

From: Joel Becker <[email protected]>

The xattr bucket can span multiple blocks on disk. We have wrappers
for this structure in the code. We use the new multi-block ecc calls to
calculate and validate the bucket.
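
As a rough user-space illustration of the idea (not the ocfs2 code), a multi-block check can be computed by treating every block of the bucket as one stream, with the check field in the first block zeroed while the checksum runs. All names below (toy_block, toy_check, toy_crc32 and so on) are hypothetical, the block size is arbitrary, and only a plain CRC-32 is shown; the ECC part that lets single-bit errors be corrected is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 512	/* illustrative size, not an ocfs2 constant */

/* Hypothetical stand-in for the on-disk check field. */
struct toy_check {
	uint32_t crc;
};

/* One block of the bucket; the check field only matters in block 0. */
struct toy_block {
	struct toy_check check;
	uint8_t payload[TOY_BLOCK_SIZE - sizeof(struct toy_check)];
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), chainable across calls. */
static uint32_t toy_crc32(uint32_t crc, const uint8_t *data, size_t len)
{
	crc = ~crc;
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Checksum all blocks as one unit, with the check field read as zero. */
static uint32_t toy_bucket_crc(struct toy_block *blocks[], int nr,
			       struct toy_check *check)
{
	struct toy_check saved = *check;
	uint32_t crc = 0;

	assert(nr > 0);
	memset(check, 0, sizeof(*check));
	for (int i = 0; i < nr; i++)
		crc = toy_crc32(crc, (const uint8_t *)blocks[i],
				sizeof(*blocks[i]));
	*check = saved;		/* leave the stored value untouched */
	return crc;
}

/* Called before the blocks are written out. */
static void toy_compute_bucket_check(struct toy_block *blocks[], int nr,
				     struct toy_check *check)
{
	check->crc = toy_bucket_crc(blocks, nr, check);
}

/* Called after the blocks are read; returns 0 on match, -1 otherwise. */
static int toy_validate_bucket_check(struct toy_block *blocks[], int nr,
				     struct toy_check *check)
{
	return toy_bucket_crc(blocks, nr, check) == check->crc ? 0 : -1;
}
```

The compute step belongs on the write side and the validate step on the read side, which is the same split the patch makes between the journal-dirty and read paths of the bucket.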

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index bc822d6..7c2f4c9 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -273,6 +273,15 @@ static int ocfs2_read_xattr_bucket(struct ocfs2_xattr_bucket *bucket,
rc = ocfs2_read_blocks(bucket->bu_inode, xb_blkno,
bucket->bu_blocks, bucket->bu_bhs, 0,
NULL);
+ if (!rc) {
+ rc = ocfs2_validate_meta_ecc_bhs(bucket->bu_inode->i_sb,
+ bucket->bu_bhs,
+ bucket->bu_blocks,
+ &bucket_xh(bucket)->xh_check);
+ if (rc)
+ mlog_errno(rc);
+ }
+
if (rc)
ocfs2_xattr_bucket_relse(bucket);
return rc;
@@ -301,6 +310,10 @@ static void ocfs2_xattr_bucket_journal_dirty(handle_t *handle,
{
int i;

+ ocfs2_compute_meta_ecc_bhs(bucket->bu_inode->i_sb,
+ bucket->bu_bhs, bucket->bu_blocks,
+ &bucket_xh(bucket)->xh_check);
+
for (i = 0; i < bucket->bu_blocks; i++)
ocfs2_journal_dirty(handle, bucket->bu_bhs[i]);
}
--
1.5.6

2008-12-25 18:08:49

by Mark Fasheh

Subject: [PATCH 10/35] ocfs2: Create ocfs2_xattr_value_buf.

From: Joel Becker <[email protected]>

When an ocfs2 extended attribute is large enough to require its own
allocation tree, we root it with an ocfs2_xattr_value_root. However,
these roots can be a part of inodes, xattr blocks, or xattr buckets.
Thus, they need a different journal access function for each container.

We wrap the bh, its journal access function, and the value root (xv) in
a structure called ocfs2_xattr_value_buf. This is a package that can
be passed around. In this first pass, we simply pass it to the
extent tree code.
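
The pattern can be sketched outside the kernel as follows. This is a minimal illustration under stated assumptions, not the ocfs2 code: the toy_ types and accessors are hypothetical stand-ins for buffer_head, the journal access functions, and ocfs2_xattr_value_root, and the accessor signature is simplified to a single argument.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins; none of these are the kernel's real types. */
enum { TOY_NONE, TOY_INODE, TOY_XATTR_BLOCK };

struct toy_bh {
	int journaled_as;	/* records which accessor ran */
};

struct toy_value_root {
	uint32_t clusters;
};

typedef int (*toy_access_func)(struct toy_bh *bh);

/* One accessor per container type (hypothetical). */
static int toy_access_di(struct toy_bh *bh)
{
	bh->journaled_as = TOY_INODE;
	return 0;
}

static int toy_access_xb(struct toy_bh *bh)
{
	bh->journaled_as = TOY_XATTR_BLOCK;
	return 0;
}

/* The package: buffer, its accessor, and the value root inside it. */
struct toy_value_buf {
	struct toy_bh *vb_bh;
	toy_access_func vb_access;
	struct toy_value_root *vb_xv;
};

/* The callee never needs to know which container holds the root. */
static int toy_add_clusters(struct toy_value_buf *vb, uint32_t n)
{
	int ret;

	assert(vb && vb->vb_bh && vb->vb_access && vb->vb_xv);
	ret = vb->vb_access(vb->vb_bh);
	if (ret < 0)
		return ret;
	vb->vb_xv->clusters += n;
	return 0;
}
```

Only the caller that located the value root fills in the accessor; everything below it just calls vb->vb_access() on vb->vb_bh.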

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 25 +++++++++++--------------
fs/ocfs2/alloc.h | 4 ++--
fs/ocfs2/xattr.c | 34 ++++++++++++++++++++++------------
fs/ocfs2/xattr.h | 14 ++++++++++++++
4 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 6e58fd5..874c0bd 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -48,6 +48,7 @@
#include "file.h"
#include "super.h"
#include "uptodate.h"
+#include "xattr.h"

#include "buffer_head_io.h"

@@ -207,36 +208,33 @@ static void ocfs2_dinode_fill_root_el(struct ocfs2_extent_tree *et)

static void ocfs2_xattr_value_fill_root_el(struct ocfs2_extent_tree *et)
{
- struct ocfs2_xattr_value_root *xv = et->et_object;
+ struct ocfs2_xattr_value_buf *vb = et->et_object;

- et->et_root_el = &xv->xr_list;
+ et->et_root_el = &vb->vb_xv->xr_list;
}

static void ocfs2_xattr_value_set_last_eb_blk(struct ocfs2_extent_tree *et,
u64 blkno)
{
- struct ocfs2_xattr_value_root *xv =
- (struct ocfs2_xattr_value_root *)et->et_object;
+ struct ocfs2_xattr_value_buf *vb = et->et_object;

- xv->xr_last_eb_blk = cpu_to_le64(blkno);
+ vb->vb_xv->xr_last_eb_blk = cpu_to_le64(blkno);
}

static u64 ocfs2_xattr_value_get_last_eb_blk(struct ocfs2_extent_tree *et)
{
- struct ocfs2_xattr_value_root *xv =
- (struct ocfs2_xattr_value_root *) et->et_object;
+ struct ocfs2_xattr_value_buf *vb = et->et_object;

- return le64_to_cpu(xv->xr_last_eb_blk);
+ return le64_to_cpu(vb->vb_xv->xr_last_eb_blk);
}

static void ocfs2_xattr_value_update_clusters(struct inode *inode,
struct ocfs2_extent_tree *et,
u32 clusters)
{
- struct ocfs2_xattr_value_root *xv =
- (struct ocfs2_xattr_value_root *)et->et_object;
+ struct ocfs2_xattr_value_buf *vb = et->et_object;

- le32_add_cpu(&xv->xr_clusters, clusters);
+ le32_add_cpu(&vb->vb_xv->xr_clusters, clusters);
}

static struct ocfs2_extent_tree_operations ocfs2_xattr_value_et_ops = {
@@ -334,10 +332,9 @@ void ocfs2_init_xattr_tree_extent_tree(struct ocfs2_extent_tree *et,

void ocfs2_init_xattr_value_extent_tree(struct ocfs2_extent_tree *et,
struct inode *inode,
- struct buffer_head *bh,
- struct ocfs2_xattr_value_root *xv)
+ struct ocfs2_xattr_value_buf *vb)
{
- __ocfs2_init_extent_tree(et, inode, bh, ocfs2_journal_access, xv,
+ __ocfs2_init_extent_tree(et, inode, vb->vb_bh, vb->vb_access, vb,
&ocfs2_xattr_value_et_ops);
}

diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 4b6fea2..cceff5c 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -71,10 +71,10 @@ void ocfs2_init_dinode_extent_tree(struct ocfs2_extent_tree *et,
void ocfs2_init_xattr_tree_extent_tree(struct ocfs2_extent_tree *et,
struct inode *inode,
struct buffer_head *bh);
+struct ocfs2_xattr_value_buf;
void ocfs2_init_xattr_value_extent_tree(struct ocfs2_extent_tree *et,
struct inode *inode,
- struct buffer_head *bh,
- struct ocfs2_xattr_value_root *xv);
+ struct ocfs2_xattr_value_buf *vb);

/*
* Read an extent block into *bh. If *bh is NULL, a bh will be
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 7c2f4c9..123d378 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -581,21 +581,26 @@ static int ocfs2_xattr_extend_allocation(struct inode *inode,
handle_t *handle = ctxt->handle;
enum ocfs2_alloc_restarted why;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- u32 prev_clusters, logical_start = le32_to_cpu(xv->xr_clusters);
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = xattr_bh,
+ .vb_xv = xv,
+ .vb_access = ocfs2_journal_access,
+ };
+ u32 prev_clusters, logical_start = le32_to_cpu(vb.vb_xv->xr_clusters);
struct ocfs2_extent_tree et;

mlog(0, "(clusters_to_add for xattr= %u)\n", clusters_to_add);

- ocfs2_init_xattr_value_extent_tree(&et, inode, xattr_bh, xv);
+ ocfs2_init_xattr_value_extent_tree(&et, inode, &vb);

- status = ocfs2_journal_access(handle, inode, xattr_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ status = vb.vb_access(handle, inode, vb.vb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
}

- prev_clusters = le32_to_cpu(xv->xr_clusters);
+ prev_clusters = le32_to_cpu(vb.vb_xv->xr_clusters);
status = ocfs2_add_clusters_in_btree(osb,
inode,
&logical_start,
@@ -611,13 +616,13 @@ static int ocfs2_xattr_extend_allocation(struct inode *inode,
goto leave;
}

- status = ocfs2_journal_dirty(handle, xattr_bh);
+ status = ocfs2_journal_dirty(handle, vb.vb_bh);
if (status < 0) {
mlog_errno(status);
goto leave;
}

- clusters_to_add -= le32_to_cpu(xv->xr_clusters) - prev_clusters;
+ clusters_to_add -= le32_to_cpu(vb.vb_xv->xr_clusters) - prev_clusters;

/*
* We should have already allocated enough space before the transaction,
@@ -640,11 +645,16 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
handle_t *handle = ctxt->handle;
struct ocfs2_extent_tree et;
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = root_bh,
+ .vb_xv = xv,
+ .vb_access = ocfs2_journal_access,
+ };

- ocfs2_init_xattr_value_extent_tree(&et, inode, root_bh, xv);
+ ocfs2_init_xattr_value_extent_tree(&et, inode, &vb);

- ret = ocfs2_journal_access(handle, inode, root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = vb.vb_access(handle, inode, vb.vb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -657,9 +667,9 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
goto out;
}

- le32_add_cpu(&xv->xr_clusters, -len);
+ le32_add_cpu(&vb.vb_xv->xr_clusters, -len);

- ret = ocfs2_journal_dirty(handle, root_bh);
+ ret = ocfs2_journal_dirty(handle, vb.vb_bh);
if (ret) {
mlog_errno(ret);
goto out;
diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
index 9a67e7d..5a1ebc7 100644
--- a/fs/ocfs2/xattr.h
+++ b/fs/ocfs2/xattr.h
@@ -70,4 +70,18 @@ int ocfs2_calc_xattr_init(struct inode *, struct buffer_head *,
int, struct ocfs2_security_xattr_info *,
int *, int *, struct ocfs2_alloc_context **);

+/*
+ * xattrs can live inside an inode, as part of an external xattr block,
+ * or inside an xattr bucket, which is the leaf of a tree rooted in an
+ * xattr block. Some of the xattr calls, especially the value setting
+ * functions, want to treat each of these locations as equal. Let's wrap
+ * them in a structure that we can pass around instead of raw buffer_heads.
+ */
+struct ocfs2_xattr_value_buf {
+ struct buffer_head *vb_bh;
+ ocfs2_journal_access_func vb_access;
+ struct ocfs2_xattr_value_root *vb_xv;
+};
+
+
#endif /* OCFS2_XATTR_H */
--
1.5.6

2008-12-25 18:09:09

by Mark Fasheh

Subject: [PATCH 11/35] ocfs2: Pull ocfs2_xattr_value_buf up from __ocfs2_remove_xattr_range().

From: Joel Becker <[email protected]>

Place an ocfs2_xattr_value_buf in ocfs2_xattr_shrink_size() and pass
it down to __ocfs2_remove_xattr_range().

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 28 ++++++++++++++--------------
1 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 123d378..3b059cf 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -636,8 +636,7 @@ leave:
}

static int __ocfs2_remove_xattr_range(struct inode *inode,
- struct buffer_head *root_bh,
- struct ocfs2_xattr_value_root *xv,
+ struct ocfs2_xattr_value_buf *vb,
u32 cpos, u32 phys_cpos, u32 len,
struct ocfs2_xattr_set_ctxt *ctxt)
{
@@ -645,16 +644,11 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
handle_t *handle = ctxt->handle;
struct ocfs2_extent_tree et;
- struct ocfs2_xattr_value_buf vb = {
- .vb_bh = root_bh,
- .vb_xv = xv,
- .vb_access = ocfs2_journal_access,
- };

- ocfs2_init_xattr_value_extent_tree(&et, inode, &vb);
+ ocfs2_init_xattr_value_extent_tree(&et, inode, vb);

- ret = vb.vb_access(handle, inode, vb.vb_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = vb->vb_access(handle, inode, vb->vb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -667,9 +661,9 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
goto out;
}

- le32_add_cpu(&vb.vb_xv->xr_clusters, -len);
+ le32_add_cpu(&vb->vb_xv->xr_clusters, -len);

- ret = ocfs2_journal_dirty(handle, vb.vb_bh);
+ ret = ocfs2_journal_dirty(handle, vb->vb_bh);
if (ret) {
mlog_errno(ret);
goto out;
@@ -693,6 +687,11 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,
int ret = 0;
u32 trunc_len, cpos, phys_cpos, alloc_size;
u64 block;
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = root_bh,
+ .vb_xv = xv,
+ .vb_access = ocfs2_journal_access,
+ };

if (old_clusters <= new_clusters)
return 0;
@@ -701,7 +700,8 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,
trunc_len = old_clusters - new_clusters;
while (trunc_len) {
ret = ocfs2_xattr_get_clusters(inode, cpos, &phys_cpos,
- &alloc_size, &xv->xr_list);
+ &alloc_size,
+ &vb.vb_xv->xr_list);
if (ret) {
mlog_errno(ret);
goto out;
@@ -710,7 +710,7 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,
if (alloc_size > trunc_len)
alloc_size = trunc_len;

- ret = __ocfs2_remove_xattr_range(inode, root_bh, xv, cpos,
+ ret = __ocfs2_remove_xattr_range(inode, &vb, cpos,
phys_cpos, alloc_size,
ctxt);
if (ret) {
--
1.5.6

2008-12-25 18:09:47

by Mark Fasheh

Subject: [PATCH 13/35] ocfs2: Pass ocfs2_xattr_value_buf into ocfs2_xattr_value_truncate().

From: Joel Becker <[email protected]>

The callers of ocfs2_xattr_value_truncate() now pass in
ocfs2_xattr_value_bufs. These callers are the ones that calculated the
xv location, so they are the right starting point.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 66 +++++++++++++++++++++++++++--------------------------
1 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 4ce8019..409f9ee 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -718,19 +718,13 @@ out:
}

static int ocfs2_xattr_value_truncate(struct inode *inode,
- struct buffer_head *root_bh,
- struct ocfs2_xattr_value_root *xv,
+ struct ocfs2_xattr_value_buf *vb,
int len,
struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret;
u32 new_clusters = ocfs2_clusters_for_bytes(inode->i_sb, len);
- u32 old_clusters = le32_to_cpu(xv->xr_clusters);
- struct ocfs2_xattr_value_buf vb = {
- .vb_bh = root_bh,
- .vb_xv = xv,
- .vb_access = ocfs2_journal_access,
- };
+ u32 old_clusters = le32_to_cpu(vb->vb_xv->xr_clusters);

if (new_clusters == old_clusters)
return 0;
@@ -738,11 +732,11 @@ static int ocfs2_xattr_value_truncate(struct inode *inode,
if (new_clusters > old_clusters)
ret = ocfs2_xattr_extend_allocation(inode,
new_clusters - old_clusters,
- &vb, ctxt);
+ vb, ctxt);
else
ret = ocfs2_xattr_shrink_size(inode,
old_clusters, new_clusters,
- &vb, ctxt);
+ vb, ctxt);

return ret;
}
@@ -1330,6 +1324,10 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
struct ocfs2_xattr_value_root *xv = NULL;
size_t size = OCFS2_XATTR_SIZE(name_len) + OCFS2_XATTR_ROOT_SIZE;
int ret = 0;
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = xs->xattr_bh,
+ .vb_access = ocfs2_journal_access
+ };

memset(val, 0, size);
memcpy(val, xi->name, name_len);
@@ -1340,9 +1338,9 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
xv->xr_list.l_tree_depth = 0;
xv->xr_list.l_count = cpu_to_le16(1);
xv->xr_list.l_next_free_rec = 0;
+ vb.vb_xv = xv;

- ret = ocfs2_xattr_value_truncate(inode, xs->xattr_bh, xv,
- xi->value_len, ctxt);
+ ret = ocfs2_xattr_value_truncate(inode, &vb, xi->value_len, ctxt);
if (ret < 0) {
mlog_errno(ret);
return ret;
@@ -1352,7 +1350,7 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
mlog_errno(ret);
return ret;
}
- ret = __ocfs2_xattr_set_value_outside(inode, ctxt->handle, xv,
+ ret = __ocfs2_xattr_set_value_outside(inode, ctxt->handle, vb.vb_xv,
xi->value, xi->value_len);
if (ret < 0)
mlog_errno(ret);
@@ -1550,9 +1548,12 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
goto out;
} else if (!ocfs2_xattr_is_local(xs->here)) {
/* For existing xattr which has value outside */
- struct ocfs2_xattr_value_root *xv = NULL;
- xv = (struct ocfs2_xattr_value_root *)(val +
- OCFS2_XATTR_SIZE(name_len));
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = xs->xattr_bh,
+ .vb_xv = (struct ocfs2_xattr_value_root *)
+ (val + OCFS2_XATTR_SIZE(name_len)),
+ .vb_access = ocfs2_journal_access,
+ };

if (xi->value_len > OCFS2_XATTR_INLINE_SIZE) {
/*
@@ -1561,8 +1562,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
* then set new value with set_value_outside().
*/
ret = ocfs2_xattr_value_truncate(inode,
- xs->xattr_bh,
- xv,
+ &vb,
xi->value_len,
ctxt);
if (ret < 0) {
@@ -1582,7 +1582,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,

ret = __ocfs2_xattr_set_value_outside(inode,
handle,
- xv,
+ vb.vb_xv,
xi->value,
xi->value_len);
if (ret < 0)
@@ -1594,8 +1594,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
* just trucate old value to zero.
*/
ret = ocfs2_xattr_value_truncate(inode,
- xs->xattr_bh,
- xv,
+ &vb,
0,
ctxt);
if (ret < 0)
@@ -1714,15 +1713,17 @@ static int ocfs2_remove_value_outside(struct inode*inode,
struct ocfs2_xattr_entry *entry = &header->xh_entries[i];

if (!ocfs2_xattr_is_local(entry)) {
- struct ocfs2_xattr_value_root *xv;
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = bh,
+ .vb_access = ocfs2_journal_access,
+ };
void *val;

val = (void *)header +
le16_to_cpu(entry->xe_name_offset);
- xv = (struct ocfs2_xattr_value_root *)
+ vb.vb_xv = (struct ocfs2_xattr_value_root *)
(val + OCFS2_XATTR_SIZE(entry->xe_name_len));
- ret = ocfs2_xattr_value_truncate(inode, bh, xv,
- 0, &ctxt);
+ ret = ocfs2_xattr_value_truncate(inode, &vb, 0, &ctxt);
if (ret < 0) {
mlog_errno(ret);
break;
@@ -4651,11 +4652,12 @@ static int ocfs2_xattr_bucket_value_truncate(struct inode *inode,
{
int ret, offset;
u64 value_blk;
- struct buffer_head *value_bh = NULL;
- struct ocfs2_xattr_value_root *xv;
struct ocfs2_xattr_entry *xe;
struct ocfs2_xattr_header *xh = bucket_xh(bucket);
size_t blocksize = inode->i_sb->s_blocksize;
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_access = ocfs2_journal_access,
+ };

xe = &xh->xh_entries[xe_off];

@@ -4669,11 +4671,11 @@ static int ocfs2_xattr_bucket_value_truncate(struct inode *inode,
/* We don't allow ocfs2_xattr_value to be stored in different block. */
BUG_ON(value_blk != (offset + OCFS2_XATTR_ROOT_SIZE - 1) / blocksize);

- value_bh = bucket->bu_bhs[value_blk];
- BUG_ON(!value_bh);
+ vb.vb_bh = bucket->bu_bhs[value_blk];
+ BUG_ON(!vb.vb_bh);

- xv = (struct ocfs2_xattr_value_root *)
- (value_bh->b_data + offset % blocksize);
+ vb.vb_xv = (struct ocfs2_xattr_value_root *)
+ (vb.vb_bh->b_data + offset % blocksize);

ret = ocfs2_xattr_bucket_journal_access(ctxt->handle, bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
@@ -4691,7 +4693,7 @@ static int ocfs2_xattr_bucket_value_truncate(struct inode *inode,
*/
mlog(0, "truncate %u in xattr bucket %llu to %d bytes.\n",
xe_off, (unsigned long long)bucket_blkno(bucket), len);
- ret = ocfs2_xattr_value_truncate(inode, value_bh, xv, len, ctxt);
+ ret = ocfs2_xattr_value_truncate(inode, &vb, len, ctxt);
if (ret) {
mlog_errno(ret);
goto out_dirty;
--
1.5.6

2008-12-25 18:09:32

by Mark Fasheh

Subject: [PATCH 12/35] ocfs2: Pull ocfs2_xattr_value_buf up into ocfs2_xattr_value_truncate().

From: Joel Becker <[email protected]>

Place an ocfs2_xattr_value_buf in ocfs2_xattr_value_truncate() and pass
it down to ocfs2_xattr_shrink_size(). We can also pass it into
ocfs2_xattr_extend_allocation(), replacing its local ocfs2_xattr_value_buf.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 41 +++++++++++++++++------------------------
1 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3b059cf..4ce8019 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -573,34 +573,28 @@ int ocfs2_calc_xattr_init(struct inode *dir,

static int ocfs2_xattr_extend_allocation(struct inode *inode,
u32 clusters_to_add,
- struct buffer_head *xattr_bh,
- struct ocfs2_xattr_value_root *xv,
+ struct ocfs2_xattr_value_buf *vb,
struct ocfs2_xattr_set_ctxt *ctxt)
{
int status = 0;
handle_t *handle = ctxt->handle;
enum ocfs2_alloc_restarted why;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- struct ocfs2_xattr_value_buf vb = {
- .vb_bh = xattr_bh,
- .vb_xv = xv,
- .vb_access = ocfs2_journal_access,
- };
- u32 prev_clusters, logical_start = le32_to_cpu(vb.vb_xv->xr_clusters);
+ u32 prev_clusters, logical_start = le32_to_cpu(vb->vb_xv->xr_clusters);
struct ocfs2_extent_tree et;

mlog(0, "(clusters_to_add for xattr= %u)\n", clusters_to_add);

- ocfs2_init_xattr_value_extent_tree(&et, inode, &vb);
+ ocfs2_init_xattr_value_extent_tree(&et, inode, vb);

- status = vb.vb_access(handle, inode, vb.vb_bh,
+ status = vb->vb_access(handle, inode, vb->vb_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
mlog_errno(status);
goto leave;
}

- prev_clusters = le32_to_cpu(vb.vb_xv->xr_clusters);
+ prev_clusters = le32_to_cpu(vb->vb_xv->xr_clusters);
status = ocfs2_add_clusters_in_btree(osb,
inode,
&logical_start,
@@ -616,13 +610,13 @@ static int ocfs2_xattr_extend_allocation(struct inode *inode,
goto leave;
}

- status = ocfs2_journal_dirty(handle, vb.vb_bh);
+ status = ocfs2_journal_dirty(handle, vb->vb_bh);
if (status < 0) {
mlog_errno(status);
goto leave;
}

- clusters_to_add -= le32_to_cpu(vb.vb_xv->xr_clusters) - prev_clusters;
+ clusters_to_add -= le32_to_cpu(vb->vb_xv->xr_clusters) - prev_clusters;

/*
* We should have already allocated enough space before the transaction,
@@ -680,18 +674,12 @@ out:
static int ocfs2_xattr_shrink_size(struct inode *inode,
u32 old_clusters,
u32 new_clusters,
- struct buffer_head *root_bh,
- struct ocfs2_xattr_value_root *xv,
+ struct ocfs2_xattr_value_buf *vb,
struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret = 0;
u32 trunc_len, cpos, phys_cpos, alloc_size;
u64 block;
- struct ocfs2_xattr_value_buf vb = {
- .vb_bh = root_bh,
- .vb_xv = xv,
- .vb_access = ocfs2_journal_access,
- };

if (old_clusters <= new_clusters)
return 0;
@@ -701,7 +689,7 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,
while (trunc_len) {
ret = ocfs2_xattr_get_clusters(inode, cpos, &phys_cpos,
&alloc_size,
- &vb.vb_xv->xr_list);
+ &vb->vb_xv->xr_list);
if (ret) {
mlog_errno(ret);
goto out;
@@ -710,7 +698,7 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,
if (alloc_size > trunc_len)
alloc_size = trunc_len;

- ret = __ocfs2_remove_xattr_range(inode, &vb, cpos,
+ ret = __ocfs2_remove_xattr_range(inode, vb, cpos,
phys_cpos, alloc_size,
ctxt);
if (ret) {
@@ -738,6 +726,11 @@ static int ocfs2_xattr_value_truncate(struct inode *inode,
int ret;
u32 new_clusters = ocfs2_clusters_for_bytes(inode->i_sb, len);
u32 old_clusters = le32_to_cpu(xv->xr_clusters);
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = root_bh,
+ .vb_xv = xv,
+ .vb_access = ocfs2_journal_access,
+ };

if (new_clusters == old_clusters)
return 0;
@@ -745,11 +738,11 @@ static int ocfs2_xattr_value_truncate(struct inode *inode,
if (new_clusters > old_clusters)
ret = ocfs2_xattr_extend_allocation(inode,
new_clusters - old_clusters,
- root_bh, xv, ctxt);
+ &vb, ctxt);
else
ret = ocfs2_xattr_shrink_size(inode,
old_clusters, new_clusters,
- root_bh, xv, ctxt);
+ &vb, ctxt);

return ret;
}
--
1.5.6

2008-12-25 18:10:13

by Mark Fasheh

Subject: [PATCH 14/35] ocfs2: Pass value buf to ocfs2_xattr_update_entry().

From: Joel Becker <[email protected]>

ocfs2_xattr_update_entry() updates the entry portion of an xattr buffer.
This can be part of multiple metadata block types, so pass the buffer in
via an ocfs2_xattr_value_buf.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 409f9ee..6a05612 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1282,12 +1282,13 @@ static int ocfs2_xattr_update_entry(struct inode *inode,
handle_t *handle,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_value_buf *vb,
size_t offs)
{
int ret;

- ret = ocfs2_journal_access(handle, inode, xs->xattr_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = vb->vb_access(handle, inode, vb->vb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -1301,7 +1302,7 @@ static int ocfs2_xattr_update_entry(struct inode *inode,
ocfs2_xattr_set_local(xs->here, 0);
ocfs2_xattr_hash_entry(inode, xs->header, xs->here);

- ret = ocfs2_journal_dirty(handle, xs->xattr_bh);
+ ret = ocfs2_journal_dirty(handle, vb->vb_bh);
if (ret < 0)
mlog_errno(ret);
out:
@@ -1345,7 +1346,7 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
mlog_errno(ret);
return ret;
}
- ret = ocfs2_xattr_update_entry(inode, ctxt->handle, xi, xs, offs);
+ ret = ocfs2_xattr_update_entry(inode, ctxt->handle, xi, xs, &vb, offs);
if (ret < 0) {
mlog_errno(ret);
return ret;
@@ -1574,6 +1575,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
handle,
xi,
xs,
+ &vb,
offs);
if (ret < 0) {
mlog_errno(ret);
--
1.5.6

2008-12-25 18:10:47

by Mark Fasheh

Subject: [PATCH 15/35] ocfs2: Use ocfs2_xattr_value_buf in ocfs2_xattr_set_entry().

From: Joel Becker <[email protected]>

ocfs2_xattr_set_entry() is the function that knows what type of block it
is setting into. This is what we wanted from ocfs2_xattr_value_buf.
Plus, moving the value buf up into ocfs2_xattr_set_entry() allows us to
pass it into ocfs2_xattr_set_value_outside() and ocfs2_xattr_cleanup().
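
The selection step can be sketched like this, assuming a simplified stand-in for the search state. The flag value, the toy_search structure, and toy_pick_container() are hypothetical; the real function sets a journal access callback rather than returning an enum.

```c
#include <assert.h>

#define TOY_INLINE_XATTR_FL 0x1	/* hypothetical flag value */

enum toy_container { TOY_CONTAINER_INODE, TOY_CONTAINER_XATTR_BLOCK };

struct toy_search {
	const void *inode_bh;	/* buffer holding the inode */
	const void *xattr_bh;	/* buffer holding the xattr being set */
};

/* Decide which kind of metadata block the xattr lives in. */
static enum toy_container toy_pick_container(const struct toy_search *xs,
					     unsigned int flag)
{
	if (flag & TOY_INLINE_XATTR_FL) {
		/* Inline xattrs share the inode's buffer. */
		assert(xs->xattr_bh == xs->inode_bh);
		return TOY_CONTAINER_INODE;
	}
	/* External xattrs must live in a separate block. */
	assert(xs->xattr_bh != xs->inode_bh);
	return TOY_CONTAINER_XATTR_BLOCK;
}
```

Making the choice once, at the top of the call chain, means the helpers below it only ever see the resulting value buf.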

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 53 +++++++++++++++++++++++++++++------------------------
1 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 6a05612..c08b5e8 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1252,6 +1252,7 @@ static int ocfs2_xattr_cleanup(struct inode *inode,
handle_t *handle,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_value_buf *vb,
size_t offs)
{
int ret = 0;
@@ -1259,8 +1260,8 @@ static int ocfs2_xattr_cleanup(struct inode *inode,
void *val = xs->base + offs;
size_t size = OCFS2_XATTR_SIZE(name_len) + OCFS2_XATTR_ROOT_SIZE;

- ret = ocfs2_journal_access(handle, inode, xs->xattr_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = vb->vb_access(handle, inode, vb->vb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -1271,7 +1272,7 @@ static int ocfs2_xattr_cleanup(struct inode *inode,
memset((void *)xs->here, 0, sizeof(struct ocfs2_xattr_entry));
memset(val, 0, size);

- ret = ocfs2_journal_dirty(handle, xs->xattr_bh);
+ ret = ocfs2_journal_dirty(handle, vb->vb_bh);
if (ret < 0)
mlog_errno(ret);
out:
@@ -1318,6 +1319,7 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
struct ocfs2_xattr_set_ctxt *ctxt,
+ struct ocfs2_xattr_value_buf *vb,
size_t offs)
{
size_t name_len = strlen(xi->name);
@@ -1325,10 +1327,6 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
struct ocfs2_xattr_value_root *xv = NULL;
size_t size = OCFS2_XATTR_SIZE(name_len) + OCFS2_XATTR_ROOT_SIZE;
int ret = 0;
- struct ocfs2_xattr_value_buf vb = {
- .vb_bh = xs->xattr_bh,
- .vb_access = ocfs2_journal_access
- };

memset(val, 0, size);
memcpy(val, xi->name, name_len);
@@ -1339,19 +1337,19 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
xv->xr_list.l_tree_depth = 0;
xv->xr_list.l_count = cpu_to_le16(1);
xv->xr_list.l_next_free_rec = 0;
- vb.vb_xv = xv;
+ vb->vb_xv = xv;

- ret = ocfs2_xattr_value_truncate(inode, &vb, xi->value_len, ctxt);
+ ret = ocfs2_xattr_value_truncate(inode, vb, xi->value_len, ctxt);
if (ret < 0) {
mlog_errno(ret);
return ret;
}
- ret = ocfs2_xattr_update_entry(inode, ctxt->handle, xi, xs, &vb, offs);
+ ret = ocfs2_xattr_update_entry(inode, ctxt->handle, xi, xs, vb, offs);
if (ret < 0) {
mlog_errno(ret);
return ret;
}
- ret = __ocfs2_xattr_set_value_outside(inode, ctxt->handle, vb.vb_xv,
+ ret = __ocfs2_xattr_set_value_outside(inode, ctxt->handle, vb->vb_xv,
xi->value, xi->value_len);
if (ret < 0)
mlog_errno(ret);
@@ -1488,6 +1486,16 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
.value = xi->value,
.value_len = xi->value_len,
};
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = xs->xattr_bh,
+ .vb_access = ocfs2_journal_access_di,
+ };
+
+ if (!(flag & OCFS2_INLINE_XATTR_FL)) {
+ BUG_ON(xs->xattr_bh == xs->inode_bh);
+ vb.vb_access = ocfs2_journal_access_xb;
+ } else
+ BUG_ON(xs->xattr_bh != xs->inode_bh);

/* Compute min_offs, last and free space. */
last = xs->header->xh_entries;
@@ -1543,18 +1551,14 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
if (ocfs2_xattr_is_local(xs->here) && size == size_l) {
/* Replace existing local xattr with tree root */
ret = ocfs2_xattr_set_value_outside(inode, xi, xs,
- ctxt, offs);
+ ctxt, &vb, offs);
if (ret < 0)
mlog_errno(ret);
goto out;
} else if (!ocfs2_xattr_is_local(xs->here)) {
/* For existing xattr which has value outside */
- struct ocfs2_xattr_value_buf vb = {
- .vb_bh = xs->xattr_bh,
- .vb_xv = (struct ocfs2_xattr_value_root *)
- (val + OCFS2_XATTR_SIZE(name_len)),
- .vb_access = ocfs2_journal_access,
- };
+ vb.vb_xv = (struct ocfs2_xattr_value_root *)
+ (val + OCFS2_XATTR_SIZE(name_len));

if (xi->value_len > OCFS2_XATTR_INLINE_SIZE) {
/*
@@ -1605,16 +1609,16 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
}
}

- ret = ocfs2_journal_access(handle, inode, xs->inode_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, xs->inode_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
}

if (!(flag & OCFS2_INLINE_XATTR_FL)) {
- ret = ocfs2_journal_access(handle, inode, xs->xattr_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = vb.vb_access(handle, inode, vb.vb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -1674,7 +1678,8 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
* This is the second step for value size > INLINE_SIZE.
*/
size_t offs = le16_to_cpu(xs->here->xe_name_offset);
- ret = ocfs2_xattr_set_value_outside(inode, xi, xs, ctxt, offs);
+ ret = ocfs2_xattr_set_value_outside(inode, xi, xs, ctxt,
+ &vb, offs);
if (ret < 0) {
int ret2;

@@ -1684,7 +1689,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
* the junk tree root we have already set in local.
*/
ret2 = ocfs2_xattr_cleanup(inode, ctxt->handle,
- xi, xs, offs);
+ xi, xs, &vb, offs);
if (ret2 < 0)
mlog_errno(ret2);
}
--
1.5.6

2008-12-25 18:11:03

by Mark Fasheh

Subject: [PATCH 16/35] ocfs2: Pass value buf to ocfs2_remove_value_outside().

From: Joel Becker <[email protected]>

ocfs2_remove_value_outside() needs to know the type of buffer it is
looking at. Pass in an ocfs2_xattr_value_buf.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 22 +++++++++++++---------
1 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index c08b5e8..d2760e6 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1699,7 +1699,7 @@ out:
}

static int ocfs2_remove_value_outside(struct inode*inode,
- struct buffer_head *bh,
+ struct ocfs2_xattr_value_buf *vb,
struct ocfs2_xattr_header *header)
{
int ret = 0, i;
@@ -1720,17 +1720,13 @@ static int ocfs2_remove_value_outside(struct inode*inode,
struct ocfs2_xattr_entry *entry = &header->xh_entries[i];

if (!ocfs2_xattr_is_local(entry)) {
- struct ocfs2_xattr_value_buf vb = {
- .vb_bh = bh,
- .vb_access = ocfs2_journal_access,
- };
void *val;

val = (void *)header +
le16_to_cpu(entry->xe_name_offset);
- vb.vb_xv = (struct ocfs2_xattr_value_root *)
+ vb->vb_xv = (struct ocfs2_xattr_value_root *)
(val + OCFS2_XATTR_SIZE(entry->xe_name_len));
- ret = ocfs2_xattr_value_truncate(inode, &vb, 0, &ctxt);
+ ret = ocfs2_xattr_value_truncate(inode, vb, 0, &ctxt);
if (ret < 0) {
mlog_errno(ret);
break;
@@ -1752,12 +1748,16 @@ static int ocfs2_xattr_ibody_remove(struct inode *inode,
struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
struct ocfs2_xattr_header *header;
int ret;
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = di_bh,
+ .vb_access = ocfs2_journal_access_di,
+ };

header = (struct ocfs2_xattr_header *)
((void *)di + inode->i_sb->s_blocksize -
le16_to_cpu(di->i_xattr_inline_size));

- ret = ocfs2_remove_value_outside(inode, di_bh, header);
+ ret = ocfs2_remove_value_outside(inode, &vb, header);

return ret;
}
@@ -1767,11 +1767,15 @@ static int ocfs2_xattr_block_remove(struct inode *inode,
{
struct ocfs2_xattr_block *xb;
int ret = 0;
+ struct ocfs2_xattr_value_buf vb = {
+ .vb_bh = blk_bh,
+ .vb_access = ocfs2_journal_access_xb,
+ };

xb = (struct ocfs2_xattr_block *)blk_bh->b_data;
if (!(le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED)) {
struct ocfs2_xattr_header *header = &(xb->xb_attrs.xb_header);
- ret = ocfs2_remove_value_outside(inode, blk_bh, header);
+ ret = ocfs2_remove_value_outside(inode, &vb, header);
} else
ret = ocfs2_delete_xattr_index_block(inode, blk_bh);

--
1.5.6

2008-12-25 18:11:28

by Mark Fasheh

Subject: [PATCH 17/35] ocfs2: Use proper journal_access function in xattr.c

From: Joel Becker <[email protected]>

Change the rest of the naked ocfs2_journal_access() calls in
fs/ocfs2/xattr.c to use the appropriate ocfs2_journal_access_*() call
for their metadata type.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 24 ++++++++++++------------
1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index d2760e6..17028aa 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1894,8 +1894,8 @@ int ocfs2_xattr_remove(struct inode *inode, struct buffer_head *di_bh)
mlog_errno(ret);
goto out;
}
- ret = ocfs2_journal_access(handle, inode, di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_di(handle, inode, di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -2103,8 +2103,8 @@ static int ocfs2_xattr_block_set(struct inode *inode,
int ret;

if (!xs->xattr_bh) {
- ret = ocfs2_journal_access(handle, inode, xs->inode_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ ret = ocfs2_journal_access_di(handle, inode, xs->inode_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (ret < 0) {
mlog_errno(ret);
goto end;
@@ -2121,8 +2121,8 @@ static int ocfs2_xattr_block_set(struct inode *inode,
new_bh = sb_getblk(inode->i_sb, first_blkno);
ocfs2_set_new_buffer_uptodate(inode, new_bh);

- ret = ocfs2_journal_access(handle, inode, new_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ ret = ocfs2_journal_access_xb(handle, inode, new_bh,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (ret < 0) {
mlog_errno(ret);
goto end;
@@ -3377,8 +3377,8 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
*/
down_write(&oi->ip_alloc_sem);

- ret = ocfs2_journal_access(handle, inode, xb_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_xb(handle, inode, xb_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4216,8 +4216,8 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,

ocfs2_init_xattr_tree_extent_tree(&et, inode, root_bh);

- ret = ocfs2_journal_access(handle, inode, root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_xb(handle, inode, root_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto leave;
@@ -4808,8 +4808,8 @@ static int ocfs2_rm_xattr_cluster(struct inode *inode,
goto out;
}

- ret = ocfs2_journal_access(handle, inode, root_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_journal_access_xb(handle, inode, root_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out_commit;
--
1.5.6

2008-12-25 18:11:43

by Mark Fasheh

Subject: [PATCH 18/35] ocfs2: Add directory block trailers.

The future ocfs2 features metaecc and indexed directories need to store a
little bit of data in each dirblock. For compatibility, we place this
in a trailer at the end of the dirblock. The trailer presents itself as
an empty dirent, so that if the features are turned off, its space can be
reused without requiring a tunefs scan.

This code adds the trailer and validates it when the block is read in.
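
To make the "empty dirent" trick concrete, here is a small userspace sketch.
It is only an approximation under stated assumptions: the record header is a
simplified, host-endian stand-in for the leading fields that ocfs2_dir_entry
and the trailer share, the 512-byte block size is arbitrary, and all function
names are hypothetical. It shows that a walker which knows nothing about
trailers still steps over one cleanly, because the trailer has inode 0 and a
rec_len that ends exactly at the block boundary:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   512u	/* arbitrary block size for the sketch */
#define TRAILER_SIZE 64u	/* the 0x40-byte trailer layout */

/* Simplified leading fields shared by a dirent and the trailer. */
struct rec_head {
	uint64_t inode;		/* 0 means "unused entry" */
	uint16_t rec_len;	/* distance in bytes to the next record */
	uint8_t  name_len;
	uint8_t  file_type;
};

static unsigned int trailer_off(unsigned int blocksize)
{
	return blocksize - TRAILER_SIZE;
}

static void put_rec(unsigned char *block, unsigned int off,
		    uint64_t inode, uint16_t rec_len, uint8_t name_len)
{
	struct rec_head h;

	assert(off + sizeof(h) <= BLOCK_SIZE);
	memset(&h, 0, sizeof(h));
	h.inode = inode;
	h.rec_len = rec_len;
	h.name_len = name_len;
	memcpy(block + off, &h, sizeof(h));
}

/* New block: one empty dirent covering the usable space, then the trailer. */
static void init_block(unsigned char *block)
{
	unsigned int toff = trailer_off(BLOCK_SIZE);

	memset(block, 0, BLOCK_SIZE);
	put_rec(block, 0, 0, (uint16_t)toff, 0);
	put_rec(block, toff, 0, TRAILER_SIZE, 0);	/* looks like an empty dirent */
	memcpy(block + toff + 0x10, "DIRTRL1", 8);	/* signature at 0x10 */
}

/* A trailer-unaware walker: counts live entries, checks rec_len chaining. */
static int walk_block(const unsigned char *block, unsigned int *live)
{
	unsigned int off = 0;

	*live = 0;
	while (off < BLOCK_SIZE) {
		struct rec_head h;

		memcpy(&h, block + off, sizeof(h));
		if (h.rec_len < sizeof(h) || off + h.rec_len > BLOCK_SIZE)
			return -1;	/* corrupt chain */
		if (h.inode != 0)
			(*live)++;
		off += h.rec_len;
	}
	return 0;
}
```

Code that inserts entries does need to know about the trailer, since an
"empty dirent" is otherwise a candidate for reuse; that is what the
ocfs2_skip_dir_trailer() check in the patch is for.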

[ Mark is the original author, but I reinserted this code before his
dir index work. -- Joel ]

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/dir.c | 197 +++++++++++++++++++++++++++++++++++++++++++++++----
fs/ocfs2/ocfs2.h | 3 +
fs/ocfs2/ocfs2_fs.h | 29 ++++++++
3 files changed, 215 insertions(+), 14 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index 45e4e03..1efd0ab 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -84,6 +84,63 @@ static int ocfs2_do_extend_dir(struct super_block *sb,
struct buffer_head **new_bh);

/*
+ * These are distinct checks because future versions of the file system will
+ * want to have a trailing dirent structure independent of indexing.
+ */
+static int ocfs2_dir_has_trailer(struct inode *dir)
+{
+ if (OCFS2_I(dir)->ip_dyn_features & OCFS2_INLINE_DATA_FL)
+ return 0;
+
+ return ocfs2_meta_ecc(OCFS2_SB(dir->i_sb));
+}
+
+static int ocfs2_supports_dir_trailer(struct ocfs2_super *osb)
+{
+ return ocfs2_meta_ecc(osb);
+}
+
+static inline unsigned int ocfs2_dir_trailer_blk_off(struct super_block *sb)
+{
+ return sb->s_blocksize - sizeof(struct ocfs2_dir_block_trailer);
+}
+
+#define ocfs2_trailer_from_bh(_bh, _sb) ((struct ocfs2_dir_block_trailer *) ((_bh)->b_data + ocfs2_dir_trailer_blk_off((_sb))))
+
+/*
+ * XXX: This is executed once on every dirent. We should consider optimizing
+ * it.
+ */
+static int ocfs2_skip_dir_trailer(struct inode *dir,
+ struct ocfs2_dir_entry *de,
+ unsigned long offset,
+ unsigned long blklen)
+{
+ unsigned long toff = blklen - sizeof(struct ocfs2_dir_block_trailer);
+
+ if (!ocfs2_dir_has_trailer(dir))
+ return 0;
+
+ if (offset != toff)
+ return 0;
+
+ return 1;
+}
+
+static void ocfs2_init_dir_trailer(struct inode *inode,
+ struct buffer_head *bh)
+{
+ struct ocfs2_dir_block_trailer *trailer;
+
+ trailer = ocfs2_trailer_from_bh(bh, inode->i_sb);
+ strcpy(trailer->db_signature, OCFS2_DIR_TRAILER_SIGNATURE);
+ trailer->db_compat_rec_len =
+ cpu_to_le16(sizeof(struct ocfs2_dir_block_trailer));
+ trailer->db_parent_dinode = cpu_to_le64(OCFS2_I(inode)->ip_blkno);
+ trailer->db_blkno = cpu_to_le64(bh->b_blocknr);
+}
+
+/*
* bh passed here can be an inode block or a dir data block, depending
* on the inode inline data flag.
*/
@@ -232,16 +289,60 @@ static int ocfs2_read_dir_block(struct inode *inode, u64 v_block,
{
int rc = 0;
struct buffer_head *tmp = *bh;
+ struct ocfs2_dir_block_trailer *trailer;

rc = ocfs2_read_virt_blocks(inode, v_block, 1, &tmp, flags,
ocfs2_validate_dir_block);
- if (rc)
+ if (rc) {
mlog_errno(rc);
+ goto out;
+ }
+
+ /*
+ * We check the trailer here rather than in
+ * ocfs2_validate_dir_block() because that function doesn't have
+ * the inode to test.
+ */
+ if (!(flags & OCFS2_BH_READAHEAD) &&
+ ocfs2_dir_has_trailer(inode)) {
+ trailer = ocfs2_trailer_from_bh(tmp, inode->i_sb);
+ if (!OCFS2_IS_VALID_DIR_TRAILER(trailer)) {
+ rc = -EINVAL;
+ ocfs2_error(inode->i_sb,
+ "Invalid dirblock #%llu: "
+ "signature = %.*s\n",
+ (unsigned long long)tmp->b_blocknr, 7,
+ trailer->db_signature);
+ goto out;
+ }
+ if (le64_to_cpu(trailer->db_blkno) != tmp->b_blocknr) {
+ rc = -EINVAL;
+ ocfs2_error(inode->i_sb,
+ "Directory block #%llu has an invalid "
+ "db_blkno of %llu",
+ (unsigned long long)tmp->b_blocknr,
+ (unsigned long long)le64_to_cpu(trailer->db_blkno));
+ goto out;
+ }
+ if (le64_to_cpu(trailer->db_parent_dinode) !=
+ OCFS2_I(inode)->ip_blkno) {
+ rc = -EINVAL;
+ ocfs2_error(inode->i_sb,
+ "Directory block #%llu on dinode "
+ "#%llu has an invalid parent_dinode "
+ "of %llu",
+ (unsigned long long)tmp->b_blocknr,
+ (unsigned long long)OCFS2_I(inode)->ip_blkno,
+ (unsigned long long)le64_to_cpu(trailer->db_parent_dinode));
+ goto out;
+ }
+ }

/* If ocfs2_read_virt_blocks() got us a new bh, pass it up. */
- if (!rc && !*bh)
+ if (!*bh)
*bh = tmp;

+out:
return rc ? -EIO : 0;
}

@@ -581,6 +682,16 @@ int __ocfs2_add_entry(handle_t *handle,
goto bail;
}

+ /* We're guaranteed that we should have space, so we
+ * can't possibly have hit the trailer...right? */
+ mlog_bug_on_msg(ocfs2_skip_dir_trailer(dir, de, offset, size),
+ "Hit dir trailer trying to insert %.*s "
+ "(namelen %d) into directory %llu. "
+ "offset is %lu, trailer offset is %d\n",
+ namelen, name, namelen,
+ (unsigned long long)parent_fe_bh->b_blocknr,
+ offset, ocfs2_dir_trailer_blk_off(dir->i_sb));
+
if (ocfs2_dirent_would_fit(de, rec_len)) {
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
retval = ocfs2_mark_inode_dirty(handle, dir, parent_fe_bh);
@@ -622,6 +733,7 @@ int __ocfs2_add_entry(handle_t *handle,
retval = 0;
goto bail;
}
+
offset += le16_to_cpu(de->rec_len);
de = (struct ocfs2_dir_entry *) ((char *) de + le16_to_cpu(de->rec_len));
}
@@ -1059,9 +1171,15 @@ int ocfs2_empty_dir(struct inode *inode)
return !priv.seen_other;
}

-static void ocfs2_fill_initial_dirents(struct inode *inode,
- struct inode *parent,
- char *start, unsigned int size)
+/*
+ * Fills "." and ".." dirents in a new directory block. Returns dirent for
+ * "..", which might be used during creation of a directory with a trailing
+ * header. It is otherwise safe to ignore the return code.
+ */
+static struct ocfs2_dir_entry *ocfs2_fill_initial_dirents(struct inode *inode,
+ struct inode *parent,
+ char *start,
+ unsigned int size)
{
struct ocfs2_dir_entry *de = (struct ocfs2_dir_entry *)start;

@@ -1078,6 +1196,8 @@ static void ocfs2_fill_initial_dirents(struct inode *inode,
de->name_len = 2;
strcpy(de->name, "..");
ocfs2_set_de_type(de, S_IFDIR);
+
+ return de;
}

/*
@@ -1130,10 +1250,15 @@ static int ocfs2_fill_new_dir_el(struct ocfs2_super *osb,
struct ocfs2_alloc_context *data_ac)
{
int status;
+ unsigned int size = osb->sb->s_blocksize;
struct buffer_head *new_bh = NULL;
+ struct ocfs2_dir_entry *de;

mlog_entry_void();

+ if (ocfs2_supports_dir_trailer(osb))
+ size = ocfs2_dir_trailer_blk_off(parent->i_sb);
+
status = ocfs2_do_extend_dir(osb->sb, handle, inode, fe_bh,
data_ac, NULL, &new_bh);
if (status < 0) {
@@ -1151,8 +1276,9 @@ static int ocfs2_fill_new_dir_el(struct ocfs2_super *osb,
}
memset(new_bh->b_data, 0, osb->sb->s_blocksize);

- ocfs2_fill_initial_dirents(inode, parent, new_bh->b_data,
- osb->sb->s_blocksize);
+ de = ocfs2_fill_initial_dirents(inode, parent, new_bh->b_data, size);
+ if (ocfs2_supports_dir_trailer(osb))
+ ocfs2_init_dir_trailer(inode, new_bh);

status = ocfs2_journal_dirty(handle, new_bh);
if (status < 0) {
@@ -1193,13 +1319,27 @@ int ocfs2_fill_new_dir(struct ocfs2_super *osb,
data_ac);
}

+/*
+ * Expand rec_len of the rightmost dirent in a directory block so that it
+ * contains the end of our valid space for dirents. We do this during
+ * expansion from an inline directory to one with extents. The first dir block
+ * in that case is taken from the inline data portion of the inode block.
+ *
+ * We add the dir trailer if this filesystem wants it.
+ */
static void ocfs2_expand_last_dirent(char *start, unsigned int old_size,
- unsigned int new_size)
+ struct super_block *sb)
{
struct ocfs2_dir_entry *de;
struct ocfs2_dir_entry *prev_de;
char *de_buf, *limit;
- unsigned int bytes = new_size - old_size;
+ unsigned int new_size = sb->s_blocksize;
+ unsigned int bytes;
+
+ if (ocfs2_supports_dir_trailer(OCFS2_SB(sb)))
+ new_size = ocfs2_dir_trailer_blk_off(sb);
+
+ bytes = new_size - old_size;

limit = start + old_size;
de_buf = start;
@@ -1316,8 +1456,9 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,
memcpy(dirdata_bh->b_data, di->id2.i_data.id_data, i_size_read(dir));
memset(dirdata_bh->b_data + i_size_read(dir), 0,
sb->s_blocksize - i_size_read(dir));
- ocfs2_expand_last_dirent(dirdata_bh->b_data, i_size_read(dir),
- sb->s_blocksize);
+ ocfs2_expand_last_dirent(dirdata_bh->b_data, i_size_read(dir), sb);
+ if (ocfs2_supports_dir_trailer(osb))
+ ocfs2_init_dir_trailer(dir, dirdata_bh);

ret = ocfs2_journal_dirty(handle, dirdata_bh);
if (ret) {
@@ -1604,9 +1745,15 @@ do_extend:
goto bail;
}
memset(new_bh->b_data, 0, sb->s_blocksize);
+
de = (struct ocfs2_dir_entry *) new_bh->b_data;
de->inode = 0;
- de->rec_len = cpu_to_le16(sb->s_blocksize);
+ if (ocfs2_dir_has_trailer(dir)) {
+ de->rec_len = cpu_to_le16(ocfs2_dir_trailer_blk_off(sb));
+ ocfs2_init_dir_trailer(dir, new_bh);
+ } else {
+ de->rec_len = cpu_to_le16(sb->s_blocksize);
+ }
status = ocfs2_journal_dirty(handle, new_bh);
if (status < 0) {
mlog_errno(status);
@@ -1648,11 +1795,21 @@ static int ocfs2_find_dir_space_id(struct inode *dir, struct buffer_head *di_bh,
unsigned int *blocks_wanted)
{
int ret;
+ struct super_block *sb = dir->i_sb;
struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
struct ocfs2_dir_entry *de, *last_de = NULL;
char *de_buf, *limit;
unsigned long offset = 0;
- unsigned int rec_len, new_rec_len;
+ unsigned int rec_len, new_rec_len, free_space = dir->i_sb->s_blocksize;
+
+ /*
+ * This calculates how many free bytes we'd have in block zero, should
+ * this function force expansion to an extent tree.
+ */
+ if (ocfs2_supports_dir_trailer(OCFS2_SB(sb)))
+ free_space = ocfs2_dir_trailer_blk_off(sb) - i_size_read(dir);
+ else
+ free_space = dir->i_sb->s_blocksize - i_size_read(dir);

de_buf = di->id2.i_data.id_data;
limit = de_buf + i_size_read(dir);
@@ -1669,6 +1826,11 @@ static int ocfs2_find_dir_space_id(struct inode *dir, struct buffer_head *di_bh,
ret = -EEXIST;
goto out;
}
+ /*
+ * No need to check for a trailing dirent record here as
+ * they're not used for inline dirs.
+ */
+
if (ocfs2_dirent_would_fit(de, rec_len)) {
/* Ok, we found a spot. Return this bh and let
* the caller actually fill it in. */
@@ -1689,7 +1851,7 @@ static int ocfs2_find_dir_space_id(struct inode *dir, struct buffer_head *di_bh,
* dirent can be found.
*/
*blocks_wanted = 1;
- new_rec_len = le16_to_cpu(last_de->rec_len) + (dir->i_sb->s_blocksize - i_size_read(dir));
+ new_rec_len = le16_to_cpu(last_de->rec_len) + free_space;
if (new_rec_len < (rec_len + OCFS2_DIR_REC_LEN(last_de->name_len)))
*blocks_wanted = 2;

@@ -1707,6 +1869,7 @@ static int ocfs2_find_dir_space_el(struct inode *dir, const char *name,
struct ocfs2_dir_entry *de;
struct super_block *sb = dir->i_sb;
int status;
+ int blocksize = dir->i_sb->s_blocksize;

status = ocfs2_read_dir_block(dir, 0, &bh, 0);
if (status) {
@@ -1748,6 +1911,11 @@ static int ocfs2_find_dir_space_el(struct inode *dir, const char *name,
status = -EEXIST;
goto bail;
}
+
+ if (ocfs2_skip_dir_trailer(dir, de, offset % blocksize,
+ blocksize))
+ goto next;
+
if (ocfs2_dirent_would_fit(de, rec_len)) {
/* Ok, we found a spot. Return this bh and let
* the caller actually fill it in. */
@@ -1756,6 +1924,7 @@ static int ocfs2_find_dir_space_el(struct inode *dir, const char *name,
status = 0;
goto bail;
}
+next:
offset += le16_to_cpu(de->rec_len);
de = (struct ocfs2_dir_entry *)((char *) de + le16_to_cpu(de->rec_len));
}
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index bad87d0..ad5c24a 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -470,6 +470,9 @@ static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb)
#define OCFS2_IS_VALID_XATTR_BLOCK(ptr) \
(!strcmp((ptr)->xb_signature, OCFS2_XATTR_BLOCK_SIGNATURE))

+#define OCFS2_IS_VALID_DIR_TRAILER(ptr) \
+ (!strcmp((ptr)->db_signature, OCFS2_DIR_TRAILER_SIGNATURE))
+
static inline unsigned long ino_from_blkno(struct super_block *sb,
u64 blkno)
{
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 290fa26..af0013b 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -65,6 +65,7 @@
#define OCFS2_EXTENT_BLOCK_SIGNATURE "EXBLK01"
#define OCFS2_GROUP_DESC_SIGNATURE "GROUP01"
#define OCFS2_XATTR_BLOCK_SIGNATURE "XATTR01"
+#define OCFS2_DIR_TRAILER_SIGNATURE "DIRTRL1"

/* Compatibility flags */
#define OCFS2_HAS_COMPAT_FEATURE(sb,mask) \
@@ -752,6 +753,34 @@ struct ocfs2_dir_entry {
} __attribute__ ((packed));

/*
+ * Per-block record for the unindexed directory btree. This is carefully
+ * crafted so that the rec_len and name_len records of an ocfs2_dir_entry are
+ * mirrored. That way, the directory manipulation code needs a minimal amount
+ * of update.
+ *
+ * NOTE: Keep this structure aligned to a multiple of 4 bytes.
+ */
+struct ocfs2_dir_block_trailer {
+/*00*/ __le64 db_compat_inode; /* Always zero. Was inode */
+
+ __le16 db_compat_rec_len; /* Backwards compatible with
+ * ocfs2_dir_entry. */
+ __u8 db_compat_name_len; /* Always zero. Was name_len */
+ __u8 db_reserved0;
+ __le16 db_reserved1;
+ __le16 db_free_rec_len; /* Size of largest empty hole
+ * in this block. (unused) */
+/*10*/ __u8 db_signature[8]; /* Signature for verification */
+ __le64 db_reserved2;
+ __le64 db_free_next; /* Next block in list (unused) */
+/*20*/ __le64 db_blkno; /* Offset on disk, in blocks */
+ __le64 db_parent_dinode; /* dinode which owns me, in
+ blocks */
+/*30*/ __le64 db_check; /* Error checking */
+/*40*/
+};
+
+/*
* On disk allocator group structure for OCFS2
*/
struct ocfs2_group_desc
--
1.5.6

2008-12-25 18:12:00

by Mark Fasheh

Subject: [PATCH 19/35] ocfs2: Checksum and ECC for directory blocks.

From: Joel Becker <[email protected]>

Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the
dirblocks.
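
The general compute/validate pattern can be sketched as below. This is a
userspace approximation under loud assumptions: it uses a plain bitwise
CRC-32 as a stand-in checksum, stores only that 32-bit value at the db_check
offset (0x30 inside the trailer), and leaves out the ECC part entirely; the
function names are hypothetical and are not the blockcheck.c API. What it
does show is why the check field has to be located from the block size, and
that the checksum is taken over the block with the field itself zeroed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAILER_SIZE 64u	/* 0x40-byte trailer at the end of the block */
#define CHECK_OFF    0x30u	/* offset of db_check inside the trailer */

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), a stand-in only. */
static uint32_t crc32_bitwise(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int k;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* The check field sits at an offset that depends on the block size. */
static unsigned char *check_field(unsigned char *block, size_t blocksize)
{
	assert(blocksize >= TRAILER_SIZE);
	return block + blocksize - TRAILER_SIZE + CHECK_OFF;
}

/* Write side: zero the field, checksum the whole block, store the result. */
static void block_check_compute(unsigned char *block, size_t blocksize)
{
	unsigned char *chk = check_field(block, blocksize);
	uint32_t crc;

	memset(chk, 0, sizeof(crc));
	crc = crc32_bitwise(block, blocksize);
	memcpy(chk, &crc, sizeof(crc));
}

/* Read side: recompute with the field zeroed, then restore and compare. */
static int block_check_validate(unsigned char *block, size_t blocksize)
{
	unsigned char *chk = check_field(block, blocksize);
	uint32_t stored, crc;

	memcpy(&stored, chk, sizeof(stored));
	memset(chk, 0, sizeof(stored));
	crc = crc32_bitwise(block, blocksize);
	memcpy(chk, &stored, sizeof(stored));
	return crc == stored ? 0 : -1;
}
```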

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/dir.c | 37 +++++++++++++++++++++++++++++++++++--
fs/ocfs2/dir.h | 2 ++
fs/ocfs2/journal.c | 31 +++++++++++++++++++++++++++++--
fs/ocfs2/ocfs2_fs.h | 2 +-
4 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index 1efd0ab..f2c4098 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -48,6 +48,7 @@
#include "ocfs2.h"

#include "alloc.h"
+#include "blockcheck.h"
#include "dir.h"
#include "dlmglue.h"
#include "extent_map.h"
@@ -107,6 +108,17 @@ static inline unsigned int ocfs2_dir_trailer_blk_off(struct super_block *sb)

#define ocfs2_trailer_from_bh(_bh, _sb) ((struct ocfs2_dir_block_trailer *) ((_bh)->b_data + ocfs2_dir_trailer_blk_off((_sb))))

+/* XXX ocfs2_block_dqtrailer() is similar but not quite - can we make
+ * them more consistent? */
+struct ocfs2_dir_block_trailer *ocfs2_dir_trailer_from_size(int blocksize,
+ void *data)
+{
+ char *p = data;
+
+ p += blocksize - sizeof(struct ocfs2_dir_block_trailer);
+ return (struct ocfs2_dir_block_trailer *)p;
+}
+
/*
* XXX: This is executed once on every dirent. We should consider optimizing
* it.
@@ -268,14 +280,35 @@ out:
static int ocfs2_validate_dir_block(struct super_block *sb,
struct buffer_head *bh)
{
+ int rc;
+ struct ocfs2_dir_block_trailer *trailer =
+ ocfs2_trailer_from_bh(bh, sb);
+
+
/*
- * Nothing yet. We don't validate dirents here, that's handled
+ * We don't validate dirents here, that's handled
* in-place when the code walks them.
*/
mlog(0, "Validating dirblock %llu\n",
(unsigned long long)bh->b_blocknr);

- return 0;
+ BUG_ON(!buffer_uptodate(bh));
+
+ /*
+ * If the ecc fails, we return the error but otherwise
+ * leave the filesystem running. We know any error is
+ * local to this block.
+ *
+ * Note that we are safe to call this even if the directory
+ * doesn't have a trailer. Filesystems without metaecc will do
+ * nothing, and filesystems with it will have one.
+ */
+ rc = ocfs2_validate_meta_ecc(sb, bh->b_data, &trailer->db_check);
+ if (rc)
+ mlog(ML_ERROR, "Checksum failed for dirblock %llu\n",
+ (unsigned long long)bh->b_blocknr);
+
+ return rc;
}

/*
diff --git a/fs/ocfs2/dir.h b/fs/ocfs2/dir.h
index ce48b90..c511e2e 100644
--- a/fs/ocfs2/dir.h
+++ b/fs/ocfs2/dir.h
@@ -83,4 +83,6 @@ int ocfs2_fill_new_dir(struct ocfs2_super *osb,
struct buffer_head *fe_bh,
struct ocfs2_alloc_context *data_ac);

+struct ocfs2_dir_block_trailer *ocfs2_dir_trailer_from_size(int blocksize,
+ void *data);
#endif /* OCFS2_DIR_H */
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 3b54dba..57d7d25 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -415,6 +415,26 @@ static void ocfs2_dq_commit_trigger(struct jbd2_buffer_trigger_type *triggers,
ocfs2_block_check_compute(data, size, &dqt->dq_check);
}

+/*
+ * Directory blocks also have their own trigger because the
+ * struct ocfs2_block_check offset depends on the blocksize.
+ */
+static void ocfs2_db_commit_trigger(struct jbd2_buffer_trigger_type *triggers,
+ struct buffer_head *bh,
+ void *data, size_t size)
+{
+ struct ocfs2_dir_block_trailer *trailer =
+ ocfs2_dir_trailer_from_size(size, data);
+
+ /*
+ * We aren't guaranteed to have the superblock here, so we
+ * must unconditionally compute the ecc data.
+ * __ocfs2_journal_access() will only set the triggers if
+ * metaecc is enabled.
+ */
+ ocfs2_block_check_compute(data, size, &trailer->db_check);
+}
+
static void ocfs2_abort_trigger(struct jbd2_buffer_trigger_type *triggers,
struct buffer_head *bh)
{
@@ -454,6 +474,13 @@ static struct ocfs2_triggers gd_triggers = {
.ot_offset = offsetof(struct ocfs2_group_desc, bg_check),
};

+static struct ocfs2_triggers db_triggers = {
+ .ot_triggers = {
+ .t_commit = ocfs2_db_commit_trigger,
+ .t_abort = ocfs2_abort_trigger,
+ },
+};
+
static struct ocfs2_triggers xb_triggers = {
.ot_triggers = {
.t_commit = ocfs2_commit_trigger,
@@ -555,8 +582,8 @@ int ocfs2_journal_access_gd(handle_t *handle, struct inode *inode,
int ocfs2_journal_access_db(handle_t *handle, struct inode *inode,
struct buffer_head *bh, int type)
{
- /* Right now, nothing for dirblocks */
- return __ocfs2_journal_access(handle, inode, bh, NULL, type);
+ return __ocfs2_journal_access(handle, inode, bh, &db_triggers,
+ type);
}

int ocfs2_journal_access_xb(handle_t *handle, struct inode *inode,
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index af0013b..698ef3d 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -776,7 +776,7 @@ struct ocfs2_dir_block_trailer {
/*20*/ __le64 db_blkno; /* Offset on disk, in blocks */
__le64 db_parent_dinode; /* dinode which owns me, in
blocks */
-/*30*/ __le64 db_check; /* Error checking */
+/*30*/ struct ocfs2_block_check db_check; /* Error checking */
/*40*/
};

--
1.5.6

2008-12-25 18:12:23

by Mark Fasheh

Subject: [PATCH 20/35] ocfs2: Validate superblock with checksum and ecc.

From: Joel Becker <[email protected]>

The superblock is read via a raw call. Validate it after we find it
from its signature.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/super.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bc43138..a79e67b 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -52,6 +52,7 @@
#include "ocfs1_fs_compat.h"

#include "alloc.h"
+#include "blockcheck.h"
#include "dlmglue.h"
#include "export.h"
#include "extent_map.h"
@@ -1982,6 +1983,15 @@ static int ocfs2_verify_volume(struct ocfs2_dinode *di,

if (memcmp(di->i_signature, OCFS2_SUPER_BLOCK_SIGNATURE,
strlen(OCFS2_SUPER_BLOCK_SIGNATURE)) == 0) {
+ /* We have to do a raw check of the feature here */
+ if (le32_to_cpu(di->id2.i_super.s_feature_incompat) &
+ OCFS2_FEATURE_INCOMPAT_META_ECC) {
+ status = ocfs2_block_check_validate(bh->b_data,
+ bh->b_size,
+ &di->i_check);
+ if (status)
+ goto out;
+ }
status = -EINVAL;
if ((1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits)) != blksz) {
mlog(ML_ERROR, "found superblock with incorrect block "
@@ -2023,6 +2033,7 @@ static int ocfs2_verify_volume(struct ocfs2_dinode *di,
}
}

+out:
mlog_exit(status);
return status;
}
--
1.5.6

2008-12-25 18:12:41

by Mark Fasheh

Subject: [PATCH 21/35] ocfs2: Enable metadata checksums.

From: Joel Becker <[email protected]>

Add OCFS2_FEATURE_INCOMPAT_META_ECC to the list of supported features.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/ocfs2_fs.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 698ef3d..c7ae45a 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -94,7 +94,8 @@
| OCFS2_FEATURE_INCOMPAT_INLINE_DATA \
| OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP \
| OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK \
- | OCFS2_FEATURE_INCOMPAT_XATTR)
+ | OCFS2_FEATURE_INCOMPAT_XATTR \
+ | OCFS2_FEATURE_INCOMPAT_META_ECC)
#define OCFS2_FEATURE_RO_COMPAT_SUPP (OCFS2_FEATURE_RO_COMPAT_UNWRITTEN \
| OCFS2_FEATURE_RO_COMPAT_USRQUOTA \
| OCFS2_FEATURE_RO_COMPAT_GRPQUOTA)
--
1.5.6

2008-12-25 18:12:59

by Mark Fasheh

Subject: [PATCH 22/35] ocfs2: Don't hand-code xor in ocfs2_hamming_encode().

From: Joel Becker <[email protected]>

When I wrote ocfs2_hamming_encode(), I was following documentation of
the algorithm and didn't have quite the (possibly still imperfect) grasp
of it I do now. As part of this, I literally hand-coded xor. I would
test a bit, and then add that bit via xor to the parity word.

I can, of course, just do a single xor of the parity word and the source
word (the code buffer bit offset). This cuts CPU usage by 53% on a
mostly populated buffer (an inode containing utmp.h inline).
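
The equivalence is easy to check in isolation. The sketch below is
standalone userspace C with hypothetical function names, not the
blockcheck.c code: it puts the old per-bit loop next to the single xor and
relies only on the fact that the code bit number b fits in p bits:

```c
#include <assert.h>
#include <stdint.h>

/* Old style: test each bit of b and fold it into parity one at a time. */
static uint32_t parity_add_by_bits(uint32_t parity, uint32_t b,
				   unsigned int p)
{
	unsigned int j;

	assert(p <= 32);
	for (j = 0; j < p; j++)
		if (b & (UINT32_C(1) << j))
			parity ^= UINT32_C(1) << j;
	return parity;
}

/* New style: xor-ing b in does the same thing for all bits at once. */
static uint32_t parity_add_xor(uint32_t parity, uint32_t b)
{
	return parity ^ b;
}
```

For example, code bit 13 (1101 in binary) flips parity bits 8, 4 and 1
either way; the loop just takes p iterations to get there.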

Joel

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/blockcheck.c | 67 ++++++++++++++----------------------------------
1 files changed, 20 insertions(+), 47 deletions(-)

diff --git a/fs/ocfs2/blockcheck.c b/fs/ocfs2/blockcheck.c
index 2ce6ae5..1d5083c 100644
--- a/fs/ocfs2/blockcheck.c
+++ b/fs/ocfs2/blockcheck.c
@@ -31,7 +31,6 @@
#include "blockcheck.h"


-
/*
* We use the following conventions:
*
@@ -39,26 +38,6 @@
* p = # parity bits
* c = # total code bits (d + p)
*/
-static int calc_parity_bits(unsigned int d)
-{
- unsigned int p;
-
- /*
- * Bits required for Single Error Correction is as follows:
- *
- * d + p + 1 <= 2^p
- *
- * We're restricting ourselves to 31 bits of parity, that should be
- * sufficient.
- */
- for (p = 1; p < 32; p++)
- {
- if ((d + p + 1) <= (1 << p))
- return p;
- }
-
- return 0;
-}

/*
* Calculate the bit offset in the hamming code buffer based on the bit's
@@ -109,10 +88,9 @@ static unsigned int calc_code_bit(unsigned int i)
*/
u32 ocfs2_hamming_encode(u32 parity, void *data, unsigned int d, unsigned int nr)
{
- unsigned int p = calc_parity_bits(nr + d);
- unsigned int i, j, b;
+ unsigned int i, b;

- BUG_ON(!p);
+ BUG_ON(!d);

/*
* b is the hamming code bit number. Hamming code specifies a
@@ -131,27 +109,23 @@ u32 ocfs2_hamming_encode(u32 parity, void *data, unsigned int d, unsigned int nr
*/
b = calc_code_bit(nr + i);

- for (j = 0; j < p; j++)
- {
- /*
- * Data bits in the resultant code are checked by
- * parity bits that are part of the bit number
- * representation. Huh?
- *
- * <wikipedia href="http://en.wikipedia.org/wiki/Hamming_code">
- * In other words, the parity bit at position 2^k
- * checks bits in positions having bit k set in
- * their binary representation. Conversely, for
- * instance, bit 13, i.e. 1101(2), is checked by
- * bits 1000(2) = 8, 0100(2)=4 and 0001(2) = 1.
- * </wikipedia>
- *
- * Note that 'k' is the _code_ bit number. 'b' in
- * our loop.
- */
- if (b & (1 << j))
- parity ^= (1 << j);
- }
+ /*
+ * Data bits in the resultant code are checked by
+ * parity bits that are part of the bit number
+ * representation. Huh?
+ *
+ * <wikipedia href="http://en.wikipedia.org/wiki/Hamming_code">
+ * In other words, the parity bit at position 2^k
+ * checks bits in positions having bit k set in
+ * their binary representation. Conversely, for
+ * instance, bit 13, i.e. 1101(2), is checked by
+ * bits 1000(2) = 8, 0100(2)=4 and 0001(2) = 1.
+ * </wikipedia>
+ *
+ * Note that 'k' is the _code_ bit number. 'b' in
+ * our loop.
+ */
+ parity ^= b;
}

/* While the data buffer was treated as little endian, the
@@ -174,10 +148,9 @@ u32 ocfs2_hamming_encode_block(void *data, unsigned int blocksize)
void ocfs2_hamming_fix(void *data, unsigned int d, unsigned int nr,
unsigned int fix)
{
- unsigned int p = calc_parity_bits(nr + d);
unsigned int i, b;

- BUG_ON(!p);
+ BUG_ON(!d);

/*
* If the bit to fix has an hweight of 1, it's a parity bit. One
--
1.5.6

2008-12-25 18:13:29

by Mark Fasheh

Subject: [PATCH 23/35] ocfs2: Another hamming code optimization.

From: Joel Becker <[email protected]>

In the calc_code_bit() function, we must find all powers of two beneath
the code bit number, *after* it's shifted by those powers of two. This
requires a loop to see where it ends up.

We can optimize it by starting at its most significant bit. This shaves
32% off the time, for a total of 67.6% shaved off of the original, naive
implementation.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
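A rough userspace sketch of the shortcut, assuming a plain shift loop as a
stand-in for the table-driven find_highest_bit_set() in the patch. The
names highest_bit(), code_bit_naive(), code_bit_msb() and
count_code_bit_mismatches() are hypothetical and exist only for this
illustration.

```c
/* Position of the highest set bit of v (v must be non-zero). */
unsigned int highest_bit(unsigned int v)
{
	unsigned int n = 0;

	while ((v >>= 1) != 0)
		n++;
	return n;
}

/* Original loop: walk every power of two from 2^0 upwards. */
unsigned int code_bit_naive(unsigned int i)
{
	unsigned int b = i + 1, p;

	for (p = 0; (1U << p) < (b + 1); p++)
		b++;
	return b;
}

/*
 * Shortcut: every power of two below b's highest bit is known to be a
 * parity bit, so bump b by that many at once and resume the loop there.
 */
unsigned int code_bit_msb(unsigned int i)
{
	unsigned int b = i + 1, p;

	p = highest_bit(b);
	b += p;
	for (p = 1U << p; p < (b + 1); p <<= 1)
		b++;
	return b;
}

/* Count data bit offsets below n where the two versions disagree. */
unsigned int count_code_bit_mismatches(unsigned int n)
{
	unsigned int i, bad = 0;

	for (i = 0; i < n; i++)
		if (code_bit_naive(i) != code_bit_msb(i))
			bad++;
	return bad;
}
```

Data bit 0 lands on code bit 3 and data bit 1 on code bit 5 in both
versions, matching the example in the calc_code_bit() comment.
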
fs/ocfs2/blockcheck.c | 40 +++++++++++++++++++++++++++++++++++++++-
1 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/blockcheck.c b/fs/ocfs2/blockcheck.c
index 1d5083c..f102ec9 100644
--- a/fs/ocfs2/blockcheck.c
+++ b/fs/ocfs2/blockcheck.c
@@ -39,6 +39,35 @@
* c = # total code bits (d + p)
*/

+
+/*
+ * Find the log base 2 of 32-bit v.
+ *
+ * Algorithm found on http://graphics.stanford.edu/~seander/bithacks.html,
+ * by Sean Eron Anderson. Code on the page is in the public domain unless
+ * otherwise noted.
+ *
+ * This particular algorithm is credited to Eric Cole.
+ */
+static int find_highest_bit_set(unsigned int v)
+{
+
+ static const int MultiplyDeBruijnBitPosition[32] =
+ {
+ 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
+ 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
+ };
+
+ v |= v >> 1; /* first round down to power of 2 */
+ v |= v >> 2;
+ v |= v >> 4;
+ v |= v >> 8;
+ v |= v >> 16;
+ v = (v >> 1) + 1;
+
+ return MultiplyDeBruijnBitPosition[(u32)(v * 0x077CB531UL) >> 27];
+}
+
/*
* Calculate the bit offset in the hamming code buffer based on the bit's
* offset in the data buffer. Since the hamming code reserves all
@@ -64,12 +93,21 @@ static unsigned int calc_code_bit(unsigned int i)
b = i + 1;

/*
+ * As a cheat, we know that all bits below b's highest bit must be
+ * parity bits, so we can start there.
+ */
+ p = find_highest_bit_set(b);
+ b += p;
+
+ /*
* For every power of two below our bit number, bump our bit.
*
* We compare with (b + 1) becuase we have to compare with what b
* would be _if_ it were bumped up by the parity bit. Capice?
+ *
+ * We start p at 2^p because of the cheat above.
*/
- for (p = 0; (1 << p) < (b + 1); p++)
+ for (p = (1 << p); p < (b + 1); p <<= 1)
b++;

return b;
--
1.5.6

2008-12-25 18:13:47

by Mark Fasheh

Subject: [PATCH 24/35] ocfs2: One more hamming code optimization.

From: Joel Becker <[email protected]>

The previous optimization used a fast find-highest-bit-set operation to
give us a good starting point in calc_code_bit(). This version lets the
caller cache the previous code buffer bit offset. Thus, the next call
always starts where the last one left off.

This reduces the calculation another 39%, for a total 80% reduction from
the original, naive implementation. At least, on my machine. This also
brings the parity calculation to within an order of magnitude of the
crc32 calculation.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
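A small userspace sketch of the caching idea, assuming the caller walks the
data bits in increasing order (the cache is only meaningful then). The
function mirrors the shape of the new calc_code_bit(); the names
code_bit_cached() and count_cached_mismatches() are hypothetical.

```c
#include <stddef.h>

/*
 * Map a data bit offset to its code bit number. If p_cache is non-NULL it
 * holds the number of parity bits already skipped by the previous call.
 */
unsigned int code_bit_cached(unsigned int i, unsigned int *p_cache)
{
	unsigned int b = i + 1, p = 0;

	if (p_cache)
		p = *p_cache;
	b += p;

	for (; (1U << p) < (b + 1); p++)
		b++;

	if (p_cache)
		*p_cache = p;
	return b;
}

/*
 * Walk offsets 0..n-1 in order with one shared cache and count how often
 * the cached result differs from a fresh, uncached calculation.
 */
unsigned int count_cached_mismatches(unsigned int n)
{
	unsigned int i, p = 0, bad = 0;

	for (i = 0; i < n; i++)
		if (code_bit_cached(i, &p) != code_bit_cached(i, NULL))
			bad++;
	return bad;
}
```

The loop condition only ever moves p forward, which is why a stale cache
from a smaller offset is a safe starting point and one from a larger offset
is not.
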
fs/ocfs2/blockcheck.c | 61 +++++++++++++++---------------------------------
1 files changed, 19 insertions(+), 42 deletions(-)

diff --git a/fs/ocfs2/blockcheck.c b/fs/ocfs2/blockcheck.c
index f102ec9..2a947c4 100644
--- a/fs/ocfs2/blockcheck.c
+++ b/fs/ocfs2/blockcheck.c
@@ -41,34 +41,6 @@


/*
- * Find the log base 2 of 32-bit v.
- *
- * Algorithm found on http://graphics.stanford.edu/~seander/bithacks.html,
- * by Sean Eron Anderson. Code on the page is in the public domain unless
- * otherwise noted.
- *
- * This particular algorithm is credited to Eric Cole.
- */
-static int find_highest_bit_set(unsigned int v)
-{
-
- static const int MultiplyDeBruijnBitPosition[32] =
- {
- 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
- 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
- };
-
- v |= v >> 1; /* first round down to power of 2 */
- v |= v >> 2;
- v |= v >> 4;
- v |= v >> 8;
- v |= v >> 16;
- v = (v >> 1) + 1;
-
- return MultiplyDeBruijnBitPosition[(u32)(v * 0x077CB531UL) >> 27];
-}
-
-/*
* Calculate the bit offset in the hamming code buffer based on the bit's
* offset in the data buffer. Since the hamming code reserves all
* power-of-two bits for parity, the data bit number and the code bit
@@ -81,10 +53,14 @@ static int find_highest_bit_set(unsigned int v)
* so it's a parity bit. 2 is a power of two (2^1), so it's a parity bit.
* 3 is not a power of two. So bit 1 of the data buffer ends up as bit 3
* in the code buffer.
+ *
+ * The caller can pass in *p if it wants to keep track of the most recent
+ * number of parity bits added. This allows the function to start the
+ * calculation at the last place.
*/
-static unsigned int calc_code_bit(unsigned int i)
+static unsigned int calc_code_bit(unsigned int i, unsigned int *p_cache)
{
- unsigned int b, p;
+ unsigned int b, p = 0;

/*
* Data bits are 0-based, but we're talking code bits, which
@@ -92,24 +68,25 @@ static unsigned int calc_code_bit(unsigned int i)
*/
b = i + 1;

- /*
- * As a cheat, we know that all bits below b's highest bit must be
- * parity bits, so we can start there.
- */
- p = find_highest_bit_set(b);
+ /* Use the cache if it is there */
+ if (p_cache)
+ p = *p_cache;
b += p;

/*
* For every power of two below our bit number, bump our bit.
*
- * We compare with (b + 1) becuase we have to compare with what b
+ * We compare with (b + 1) because we have to compare with what b
* would be _if_ it were bumped up by the parity bit. Capice?
*
- * We start p at 2^p because of the cheat above.
+ * p is set above.
*/
- for (p = (1 << p); p < (b + 1); p <<= 1)
+ for (; (1 << p) < (b + 1); p++)
b++;

+ if (p_cache)
+ *p_cache = p;
+
return b;
}

@@ -126,7 +103,7 @@ static unsigned int calc_code_bit(unsigned int i)
*/
u32 ocfs2_hamming_encode(u32 parity, void *data, unsigned int d, unsigned int nr)
{
- unsigned int i, b;
+ unsigned int i, b, p = 0;

BUG_ON(!d);

@@ -145,7 +122,7 @@ u32 ocfs2_hamming_encode(u32 parity, void *data, unsigned int d, unsigned int nr
* i is the offset in this hunk, nr + i is the total bit
* offset.
*/
- b = calc_code_bit(nr + i);
+ b = calc_code_bit(nr + i, &p);

/*
* Data bits in the resultant code are checked by
@@ -201,7 +178,7 @@ void ocfs2_hamming_fix(void *data, unsigned int d, unsigned int nr,
* nr + d is the bit right past the data hunk we're looking at.
* If fix after that, nothing to do
*/
- if (fix >= calc_code_bit(nr + d))
+ if (fix >= calc_code_bit(nr + d, NULL))
return;

/*
@@ -209,7 +186,7 @@ void ocfs2_hamming_fix(void *data, unsigned int d, unsigned int nr,
* start b at the offset in the code buffer. See hamming_encode()
* for a more detailed description of 'b'.
*/
- b = calc_code_bit(nr);
+ b = calc_code_bit(nr, NULL);
/* If the fix is before this hunk, nothing to do */
if (fix < b)
return;
--
1.5.6

2008-12-25 18:14:06

by Mark Fasheh

Subject: [PATCH 25/35] ocfs2/dlm: Fix a race between migrate request and exit domain

From: Sunil Mushran <[email protected]>

This patch addresses a race between a migrate request message and an exit
domain message. Instead of blocking exit domains for the duration of the
migrate, we ignore failure to deliver that message. This is because a node
exiting the domain should not have any active locks and thus has no role to
play in the migration.

Signed-off-by: Sunil Mushran <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
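A simplified userspace model of the policy described above, with locking
left out. Bitmasks stand in for the node maps, and the set of errors
treated as "host down" is an assumption for the sketch, not the real
dlm_is_host_down() list. All names here are hypothetical.

```c
#include <errno.h>

#define MAX_NODES 8

/* Assumed stand-in for dlm_is_host_down(): errors meaning the peer is gone. */
int is_host_down(int err)
{
	return err == -ENOTCONN || err == -ETIMEDOUT;
}

/*
 * Send to every node still set in *node_map. Nodes that have already left
 * domain_map are skipped, and nodes whose send fails with a host-down
 * error are dropped from *node_map. Any other send error is returned to
 * the caller (the patch treats that case as a BUG()).
 */
int send_to_live_nodes(unsigned int *node_map, unsigned int domain_map,
		       int (*send)(int node))
{
	int node, ret;

	for (node = 0; node < MAX_NODES; node++) {
		if (!(*node_map & (1U << node)))
			continue;

		/* Raced with exit domain: nothing to tell this node. */
		if (!(domain_map & (1U << node))) {
			*node_map &= ~(1U << node);
			continue;
		}

		ret = send(node);
		if (ret < 0) {
			if (!is_host_down(ret))
				return ret;
			*node_map &= ~(1U << node);
		}
	}
	return 0;
}

/* Example sender: node 2 goes away mid-send, everyone else acknowledges. */
int demo_send(int node)
{
	return node == 2 ? -ENOTCONN : 0;
}
```

With nodes 0-3 in the map and node 3 already out of the domain, the call
succeeds and leaves only nodes 0 and 1 in the map.
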
fs/ocfs2/dlm/dlmmaster.c | 23 +++++++++++++++++++----
1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 44f87ca..92fd1d7 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2949,7 +2949,7 @@ static int dlm_do_migrate_request(struct dlm_ctxt *dlm,
struct dlm_node_iter *iter)
{
struct dlm_migrate_request migrate;
- int ret, status = 0;
+ int ret, skip, status = 0;
int nodenum;

memset(&migrate, 0, sizeof(migrate));
@@ -2966,12 +2966,27 @@ static int dlm_do_migrate_request(struct dlm_ctxt *dlm,
nodenum == new_master)
continue;

+ /* We could race exit domain. If exited, skip. */
+ spin_lock(&dlm->spinlock);
+ skip = (!test_bit(nodenum, dlm->domain_map));
+ spin_unlock(&dlm->spinlock);
+ if (skip) {
+ clear_bit(nodenum, iter->node_map);
+ continue;
+ }
+
ret = o2net_send_message(DLM_MIGRATE_REQUEST_MSG, dlm->key,
&migrate, sizeof(migrate), nodenum,
&status);
- if (ret < 0)
- mlog_errno(ret);
- else if (status < 0) {
+ if (ret < 0) {
+ mlog(0, "migrate_request returned %d!\n", ret);
+ if (!dlm_is_host_down(ret)) {
+ mlog(ML_ERROR, "unhandled error=%d!\n", ret);
+ BUG();
+ }
+ clear_bit(nodenum, iter->node_map);
+ ret = 0;
+ } else if (status < 0) {
mlog(0, "migrate request (node %u) returned %d!\n",
nodenum, status);
ret = status;
--
1.5.6

2008-12-25 18:14:28

by Mark Fasheh

Subject: [PATCH 26/35] ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler()

From: Sunil Mushran <[email protected]>

This patch cleans up the errors printed in dlm_proxy_ast_handler(). The errors
now include the node number that sent the (b)ast. It also reduces the number of
endian swaps of the cookie.

Signed-off-by: Sunil Mushran <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/dlm/dlmast.c | 52 +++++++++++++++++++++++++-----------------------
1 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmast.c b/fs/ocfs2/dlm/dlmast.c
index 644bee5..d07ddbe 100644
--- a/fs/ocfs2/dlm/dlmast.c
+++ b/fs/ocfs2/dlm/dlmast.c
@@ -275,6 +275,7 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
struct list_head *iter, *head=NULL;
u64 cookie;
u32 flags;
+ u8 node;

if (!dlm_grab(dlm)) {
dlm_error(DLM_REJECTED);
@@ -286,18 +287,21 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,

name = past->name;
locklen = past->namelen;
- cookie = be64_to_cpu(past->cookie);
+ cookie = past->cookie;
flags = be32_to_cpu(past->flags);
+ node = past->node_idx;

if (locklen > DLM_LOCKID_NAME_MAX) {
ret = DLM_IVBUFLEN;
- mlog(ML_ERROR, "Invalid name length in proxy ast handler!\n");
+ mlog(ML_ERROR, "Invalid name length (%d) in proxy ast "
+ "handler!\n", locklen);
goto leave;
}

if ((flags & (LKM_PUT_LVB|LKM_GET_LVB)) ==
(LKM_PUT_LVB|LKM_GET_LVB)) {
- mlog(ML_ERROR, "both PUT and GET lvb specified\n");
+ mlog(ML_ERROR, "Both PUT and GET lvb specified, (0x%x)\n",
+ flags);
ret = DLM_BADARGS;
goto leave;
}
@@ -310,22 +314,21 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
if (past->type != DLM_AST &&
past->type != DLM_BAST) {
mlog(ML_ERROR, "Unknown ast type! %d, cookie=%u:%llu"
- "name=%.*s\n", past->type,
- dlm_get_lock_cookie_node(cookie),
- dlm_get_lock_cookie_seq(cookie),
- locklen, name);
+ "name=%.*s, node=%u\n", past->type,
+ dlm_get_lock_cookie_node(be64_to_cpu(cookie)),
+ dlm_get_lock_cookie_seq(be64_to_cpu(cookie)),
+ locklen, name, node);
ret = DLM_IVLOCKID;
goto leave;
}

res = dlm_lookup_lockres(dlm, name, locklen);
if (!res) {
- mlog(0, "got %sast for unknown lockres! "
- "cookie=%u:%llu, name=%.*s, namelen=%u\n",
- past->type == DLM_AST ? "" : "b",
- dlm_get_lock_cookie_node(cookie),
- dlm_get_lock_cookie_seq(cookie),
- locklen, name, locklen);
+ mlog(0, "Got %sast for unknown lockres! cookie=%u:%llu, "
+ "name=%.*s, node=%u\n", (past->type == DLM_AST ? "" : "b"),
+ dlm_get_lock_cookie_node(be64_to_cpu(cookie)),
+ dlm_get_lock_cookie_seq(be64_to_cpu(cookie)),
+ locklen, name, node);
ret = DLM_IVLOCKID;
goto leave;
}
@@ -337,12 +340,12 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,

spin_lock(&res->spinlock);
if (res->state & DLM_LOCK_RES_RECOVERING) {
- mlog(0, "responding with DLM_RECOVERING!\n");
+ mlog(0, "Responding with DLM_RECOVERING!\n");
ret = DLM_RECOVERING;
goto unlock_out;
}
if (res->state & DLM_LOCK_RES_MIGRATING) {
- mlog(0, "responding with DLM_MIGRATING!\n");
+ mlog(0, "Responding with DLM_MIGRATING!\n");
ret = DLM_MIGRATING;
goto unlock_out;
}
@@ -351,7 +354,7 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
lock = NULL;
list_for_each(iter, head) {
lock = list_entry (iter, struct dlm_lock, list);
- if (be64_to_cpu(lock->ml.cookie) == cookie)
+ if (lock->ml.cookie == cookie)
goto do_ast;
}

@@ -363,15 +366,15 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,

list_for_each(iter, head) {
lock = list_entry (iter, struct dlm_lock, list);
- if (be64_to_cpu(lock->ml.cookie) == cookie)
+ if (lock->ml.cookie == cookie)
goto do_ast;
}

- mlog(0, "got %sast for unknown lock! cookie=%u:%llu, "
- "name=%.*s, namelen=%u\n", past->type == DLM_AST ? "" : "b",
- dlm_get_lock_cookie_node(cookie),
- dlm_get_lock_cookie_seq(cookie),
- locklen, name, locklen);
+ mlog(0, "Got %sast for unknown lock! cookie=%u:%llu, name=%.*s, "
+ "node=%u\n", past->type == DLM_AST ? "" : "b",
+ dlm_get_lock_cookie_node(be64_to_cpu(cookie)),
+ dlm_get_lock_cookie_seq(be64_to_cpu(cookie)),
+ locklen, name, node);

ret = DLM_NORMAL;
unlock_out:
@@ -383,8 +386,8 @@ do_ast:
if (past->type == DLM_AST) {
/* do not alter lock refcount. switching lists. */
list_move_tail(&lock->list, &res->granted);
- mlog(0, "ast: adding to granted list... type=%d, "
- "convert_type=%d\n", lock->ml.type, lock->ml.convert_type);
+ mlog(0, "ast: Adding to granted list... type=%d, "
+ "convert_type=%d\n", lock->ml.type, lock->ml.convert_type);
if (lock->ml.convert_type != LKM_IVMODE) {
lock->ml.type = lock->ml.convert_type;
lock->ml.convert_type = LKM_IVMODE;
@@ -408,7 +411,6 @@ do_ast:
dlm_do_local_bast(dlm, res, lock, past->blocked_type);

leave:
-
if (res)
dlm_lockres_put(res);

--
1.5.6

2008-12-25 18:14:45

by Mark Fasheh

Subject: [PATCH 27/35] ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating

From: Sunil Mushran <[email protected]>

During lockres purge, o2dlm sends a drop reference message to the lockres
master. This patch delays the message if the lockres is being migrated.

Fixes oss bugzilla#1012
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012

Signed-off-by: Sunil Mushran <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/dlm/dlmthread.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 4060bb3..d129520 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -181,7 +181,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,

spin_lock(&res->spinlock);
/* This ensures that clear refmap is sent after the set */
- __dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_SETREF_INPROG);
+ __dlm_wait_on_lockres_flags(res, (DLM_LOCK_RES_SETREF_INPROG |
+ DLM_LOCK_RES_MIGRATING));
spin_unlock(&res->spinlock);

/* clear our bit from the master's refmap, ignore errors */
--
1.5.6

2008-12-25 18:15:07

by Mark Fasheh

Subject: [PATCH 28/35] ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list

From: Sunil Mushran <[email protected]>

This patch adds a new lock, dlm->track_lock, to protect adding/removing
lockres' to/from the dlm->tracking_list. We were previously using dlm->spinlock
for the same, but that proved inadequate as we could be freeing a lockres from
a context that did not hold that lock. As the new lock only protects this list,
we can explicitly take it when removing the lockres from the tracking list.

This bug was exposed when testing multiple processes concurrently calling
flock() on the same file.

Signed-off-by: Sunil Mushran <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
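A userspace sketch of the locking pattern, assuming a pthread mutex in
place of the kernel spinlock and a hand-rolled circular list in place of
list_head. The struct and function names are hypothetical. The point is
that the lock covers only list membership, so the release path can take it
itself no matter what the caller holds.

```c
#include <pthread.h>
#include <stddef.h>

struct res {
	struct res *prev, *next;	/* tracking list linkage */
};

struct ctxt {
	pthread_mutex_t track_lock;	/* protects the tracking list only */
	struct res tracking;		/* circular list head */
};

void ctxt_init(struct ctxt *c)
{
	pthread_mutex_init(&c->track_lock, NULL);
	c->tracking.prev = c->tracking.next = &c->tracking;
}

/* Append r to the tail of the tracking list. */
void res_track(struct ctxt *c, struct res *r)
{
	pthread_mutex_lock(&c->track_lock);
	r->prev = c->tracking.prev;
	r->next = &c->tracking;
	c->tracking.prev->next = r;
	c->tracking.prev = r;
	pthread_mutex_unlock(&c->track_lock);
}

/* Called from the release path; takes the list lock on its own. */
void res_untrack(struct ctxt *c, struct res *r)
{
	pthread_mutex_lock(&c->track_lock);
	r->prev->next = r->next;
	r->next->prev = r->prev;
	r->prev = r->next = r;
	pthread_mutex_unlock(&c->track_lock);
}

/* Number of entries currently on the tracking list. */
unsigned int tracked_count(struct ctxt *c)
{
	struct res *r;
	unsigned int n = 0;

	pthread_mutex_lock(&c->track_lock);
	for (r = c->tracking.next; r != &c->tracking; r = r->next)
		n++;
	pthread_mutex_unlock(&c->track_lock);
	return n;
}
```
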
fs/ocfs2/dlm/dlmcommon.h | 3 ++
fs/ocfs2/dlm/dlmdebug.c | 53 ++++++++++++++++++++-------------------------
fs/ocfs2/dlm/dlmdomain.c | 1 +
fs/ocfs2/dlm/dlmmaster.c | 10 ++++++++
4 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index d5a86fb..bb53714 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -140,6 +140,7 @@ struct dlm_ctxt
unsigned int purge_count;
spinlock_t spinlock;
spinlock_t ast_lock;
+ spinlock_t track_lock;
char *name;
u8 node_num;
u32 key;
@@ -316,6 +317,8 @@ struct dlm_lock_resource
* put on a list for the dlm thread to run. */
unsigned long last_used;

+ struct dlm_ctxt *dlm;
+
unsigned migration_pending:1;
atomic_t asts_reserved;
spinlock_t spinlock;
diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 1b81dcb..b32f60a 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -630,43 +630,38 @@ static void *lockres_seq_start(struct seq_file *m, loff_t *pos)
{
struct debug_lockres *dl = m->private;
struct dlm_ctxt *dlm = dl->dl_ctxt;
+ struct dlm_lock_resource *oldres = dl->dl_res;
struct dlm_lock_resource *res = NULL;
+ struct list_head *track_list;

- spin_lock(&dlm->spinlock);
+ spin_lock(&dlm->track_lock);
+ if (oldres)
+ track_list = &oldres->tracking;
+ else
+ track_list = &dlm->tracking_list;

- if (dl->dl_res) {
- list_for_each_entry(res, &dl->dl_res->tracking, tracking) {
- if (dl->dl_res) {
- dlm_lockres_put(dl->dl_res);
- dl->dl_res = NULL;
- }
- if (&res->tracking == &dlm->tracking_list) {
- mlog(0, "End of list found, %p\n", res);
- dl = NULL;
- break;
- }
+ list_for_each_entry(res, track_list, tracking) {
+ if (&res->tracking == &dlm->tracking_list)
+ res = NULL;
+ else
dlm_lockres_get(res);
- dl->dl_res = res;
- break;
- }
- } else {
- if (!list_empty(&dlm->tracking_list)) {
- list_for_each_entry(res, &dlm->tracking_list, tracking)
- break;
- dlm_lockres_get(res);
- dl->dl_res = res;
- } else
- dl = NULL;
+ break;
}
+ spin_unlock(&dlm->track_lock);

- if (dl) {
- spin_lock(&dl->dl_res->spinlock);
- dump_lockres(dl->dl_res, dl->dl_buf, dl->dl_len - 1);
- spin_unlock(&dl->dl_res->spinlock);
- }
+ if (oldres)
+ dlm_lockres_put(oldres);

- spin_unlock(&dlm->spinlock);
+ dl->dl_res = res;
+
+ if (res) {
+ spin_lock(&res->spinlock);
+ dump_lockres(res, dl->dl_buf, dl->dl_len - 1);
+ spin_unlock(&res->spinlock);
+ } else
+ dl = NULL;

+ /* passed to seq_show */
return dl;
}

diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 63f8125..d8d578f 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -1550,6 +1550,7 @@ static struct dlm_ctxt *dlm_alloc_ctxt(const char *domain,
spin_lock_init(&dlm->spinlock);
spin_lock_init(&dlm->master_lock);
spin_lock_init(&dlm->ast_lock);
+ spin_lock_init(&dlm->track_lock);
INIT_LIST_HEAD(&dlm->list);
INIT_LIST_HEAD(&dlm->dirty_list);
INIT_LIST_HEAD(&dlm->reco.resources);
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 92fd1d7..cbf3abe 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -505,8 +505,10 @@ void dlm_change_lockres_owner(struct dlm_ctxt *dlm,
static void dlm_lockres_release(struct kref *kref)
{
struct dlm_lock_resource *res;
+ struct dlm_ctxt *dlm;

res = container_of(kref, struct dlm_lock_resource, refs);
+ dlm = res->dlm;

/* This should not happen -- all lockres' have a name
* associated with them at init time. */
@@ -515,6 +517,7 @@ static void dlm_lockres_release(struct kref *kref)
mlog(0, "destroying lockres %.*s\n", res->lockname.len,
res->lockname.name);

+ spin_lock(&dlm->track_lock);
if (!list_empty(&res->tracking))
list_del_init(&res->tracking);
else {
@@ -522,6 +525,9 @@ static void dlm_lockres_release(struct kref *kref)
res->lockname.len, res->lockname.name);
dlm_print_one_lock_resource(res);
}
+ spin_unlock(&dlm->track_lock);
+
+ dlm_put(dlm);

if (!hlist_unhashed(&res->hash_node) ||
!list_empty(&res->granted) ||
@@ -595,6 +601,10 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
res->migration_pending = 0;
res->inflight_locks = 0;

+ /* put in dlm_lockres_release */
+ dlm_grab(dlm);
+ res->dlm = dlm;
+
kref_init(&res->refs);

/* just for consistency */
--
1.5.6

2008-12-25 18:15:33

by Mark Fasheh

Subject: [PATCH 29/35] ocfs2/dlm: Fix race during lockres mastery

From: Sunil Mushran <[email protected]>

dlm_get_lock_resource() is supposed to return a lock resource with a proper
master. If multiple concurrent threads attempt to look up the lockres for the
same lockid while the lock mastery is underway, one or more threads are likely
to return a lockres without a proper master.

This patch makes the threads wait in dlm_get_lock_resource() while the mastery
is underway, ensuring all threads return the lockres with a proper master.

This issue is known to be limited to users using the flock() syscall. For all
other fs operations, the ocfs2 dlmglue layer serializes the dlm op for each
lockid.

Users encountering this bug will see flock() return EINVAL and dmesg have the
following error:
ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource <LOCKID>: bad api args

Reported-by: Coly Li <[email protected]>
Signed-off-by: Sunil Mushran <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/dlm/dlmmaster.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index cbf3abe..54e182a 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -732,14 +732,21 @@ lookup:
if (tmpres) {
int dropping_ref = 0;

+ spin_unlock(&dlm->spinlock);
+
spin_lock(&tmpres->spinlock);
+ /* We wait for the other thread that is mastering the resource */
+ if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+ __dlm_wait_on_lockres(tmpres);
+ BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+ }
+
if (tmpres->owner == dlm->node_num) {
BUG_ON(tmpres->state & DLM_LOCK_RES_DROPPING_REF);
dlm_lockres_grab_inflight_ref(dlm, tmpres);
} else if (tmpres->state & DLM_LOCK_RES_DROPPING_REF)
dropping_ref = 1;
spin_unlock(&tmpres->spinlock);
- spin_unlock(&dlm->spinlock);

/* wait until done messaging the master, drop our ref to allow
* the lockres to be purged, start over. */
--
1.5.6

2008-12-25 18:15:52

by Mark Fasheh

Subject: [PATCH 30/35] ocfs2/xattr: Remove extend_trans call and add its credits from the beginning

From: Tao Ma <[email protected]>

When setting a new xattr value, we know the number of credits needed from
the very beginning, unlike the extension of a bucket, where we can't figure
it out in advance. So remove the ocfs2_extend_trans call from that function
and calculate the credits before starting the transaction. This also relieves
the acl operations from worrying about the side effects of ocfs2_extend_trans.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 23 ++++++++++-------------
1 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 17028aa..93a1ab4 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1169,7 +1169,7 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
const void *value,
int value_len)
{
- int ret = 0, i, cp_len, credits;
+ int ret = 0, i, cp_len;
u16 blocksize = inode->i_sb->s_blocksize;
u32 p_cluster, num_clusters;
u32 cpos = 0, bpc = ocfs2_clusters_to_blocks(inode->i_sb, 1);
@@ -1179,18 +1179,6 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,

BUG_ON(clusters > le32_to_cpu(xv->xr_clusters));

- /*
- * In __ocfs2_xattr_set_value_outside has already been dirtied,
- * so we don't need to worry about whether ocfs2_extend_trans
- * will create a new transactio for us or not.
- */
- credits = clusters * bpc;
- ret = ocfs2_extend_trans(handle, credits);
- if (ret) {
- mlog_errno(ret);
- goto out;
- }
-
while (cpos < clusters) {
ret = ocfs2_xattr_get_clusters(inode, cpos, &p_cluster,
&num_clusters, &xv->xr_list);
@@ -2233,6 +2221,15 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
xi->value_len);
u64 value_size;

+ /*
+ * Calculate the clusters we need to write.
+ * No matter whether we replace an old one or add a new one,
+ * we need this for writing.
+ */
+ if (xi->value_len > OCFS2_XATTR_INLINE_SIZE)
+ credits += new_clusters *
+ ocfs2_clusters_to_blocks(inode->i_sb, 1);
+
if (xis->not_found && xbs->not_found) {
credits += ocfs2_blocks_per_xattr_bucket(inode->i_sb);

--
1.5.6

2008-12-25 18:16:13

by Mark Fasheh

Subject: [PATCH 31/35] ocfs2/xattr: Always updating ctime during xattr set.

From: Tao Ma <[email protected]>

In xattr set, we should always update ctime if the operation succeeds.
The old code mistakenly did this in ocfs2_xattr_set_entry, which is only
called when we set an xattr in the inode or the xattr block. A side
benefit is that this resolves bug 1052: in that scenario,
ocfs2_calc_xattr_set_need only calculates the credits for the xattr set
itself, while ocfs2_xattr_set_entry also updates the inode, which is not
part of the xattr set process.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 20 ++++++++++++++++----
1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 93a1ab4..3e2e92d 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -1651,10 +1651,6 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
oi->ip_dyn_features |= flag;
di->i_dyn_features = cpu_to_le16(oi->ip_dyn_features);
spin_unlock(&oi->ip_lock);
- /* Update inode ctime */
- inode->i_ctime = CURRENT_TIME;
- di->i_ctime = cpu_to_le64(inode->i_ctime.tv_sec);
- di->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);

ret = ocfs2_journal_dirty(handle, xs->inode_bh);
if (ret < 0)
@@ -2574,6 +2570,20 @@ static int __ocfs2_xattr_set_handle(struct inode *inode,
}
}

+ if (!ret) {
+ /* Update inode ctime. */
+ ret = ocfs2_journal_access(ctxt->handle, inode, xis->inode_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ inode->i_ctime = CURRENT_TIME;
+ di->i_ctime = cpu_to_le64(inode->i_ctime.tv_sec);
+ di->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
+ ocfs2_journal_dirty(ctxt->handle, xis->inode_bh);
+ }
out:
return ret;
}
@@ -2750,6 +2760,8 @@ int ocfs2_xattr_set(struct inode *inode,
goto cleanup;
}

+ /* we need to update inode's ctime field, so add credit for it. */
+ credits += OCFS2_INODE_UPDATE_CREDITS;
ctxt.handle = ocfs2_start_trans(osb, credits);
if (IS_ERR(ctxt.handle)) {
ret = PTR_ERR(ctxt.handle);
--
1.5.6

2008-12-25 18:16:32

by Mark Fasheh

Subject: [PATCH 32/35] ocfs2/xattr: fix credits calculation during index create

From: Tao Ma <[email protected]>

When creating an xattr index block, the old calculation forgot
to add credits for the meta change of the alloc file. So add
more credits and more comments to explain it.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3e2e92d..73fb9f7 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2359,13 +2359,21 @@ meta_guess:
} else
xb = (struct ocfs2_xattr_block *)xbs->xattr_bh->b_data;

+ /*
+ * If there is already an xattr tree, good, we can calculate
+ * like other b-trees. Otherwise we may have the chance of
+ * create a tree, the credit calculation is borrowed from
+ * ocfs2_calc_extend_credits with root_el = NULL. And the
+ * new tree will be cluster based, so no meta is needed.
+ */
if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
struct ocfs2_extent_list *el =
&xb->xb_attrs.xb_root.xt_list;
meta_add += ocfs2_extend_meta_needed(el);
credits += ocfs2_calc_extend_credits(inode->i_sb,
el, 1);
- }
+ } else
+ credits += OCFS2_SUBALLOC_ALLOC + 1;

/*
* This cluster will be used either for new bucket or for
--
1.5.6

2008-12-25 18:16:47

by Mark Fasheh

Subject: [PATCH 33/35] ocfs2: calculate and reserve credits for xattr value in mknod

From: Tiger Yang <[email protected]>

Previously, we extended the credits for an xattr's large value in
set_value_outside. This can give rise to a credits issue when we set one
security entry and two acl entries during mknod. As we have removed
extend_trans from set_value_outside, we must calculate and reserve the
credits for the xattr's large value in mknod.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
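A back-of-the-envelope sketch of the reservation arithmetic, assuming
power-of-two cluster and block sizes passed in as plain numbers. The
helpers are hypothetical stand-ins for ocfs2_clusters_for_bytes() and
ocfs2_clusters_to_blocks(), and INLINE_SIZE is an assumed stand-in for
OCFS2_XATTR_INLINE_SIZE.

```c
#define INLINE_SIZE 80	/* assumed value, for illustration only */

struct reserve {
	unsigned int credits;	/* journal credits, one per block */
	unsigned int clusters;	/* clusters to allocate */
};

/* Round a byte count up to whole clusters. */
unsigned int clusters_for_bytes(unsigned int cluster_size, unsigned long bytes)
{
	return (unsigned int)((bytes + cluster_size - 1) / cluster_size);
}

/* Number of blocks covered by the given number of clusters. */
unsigned int clusters_to_blocks(unsigned int cluster_size,
				unsigned int block_size,
				unsigned int clusters)
{
	return clusters * (cluster_size / block_size);
}

/*
 * Reserve credits and clusters for a value stored outside the entry.
 * 'copies' is 2 for a directory's acls (DEFAULT and ACCESS), else 1.
 */
void reserve_large_value(struct reserve *r, unsigned int cluster_size,
			 unsigned int block_size, unsigned long value_len,
			 unsigned int copies)
{
	unsigned int new_clusters;

	if (value_len <= INLINE_SIZE)
		return;		/* small values stay inline */

	new_clusters = copies * clusters_for_bytes(cluster_size, value_len);
	r->credits += clusters_to_blocks(cluster_size, block_size,
					 new_clusters);
	r->clusters += new_clusters;
}
```

For example, a 100-byte acl on a directory with 4096-byte clusters and
512-byte blocks reserves 2 clusters and 16 credits, since every block of
both values may be written in the same transaction.
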
fs/ocfs2/xattr.c | 40 ++++++++++++++++++++++++++--------------
1 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 73fb9f7..e5be470 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -490,9 +490,14 @@ int ocfs2_calc_security_init(struct inode *dir,
}

/* reserve clusters for xattr value which will be set in B tree*/
- if (si->value_len > OCFS2_XATTR_INLINE_SIZE)
- *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb,
- si->value_len);
+ if (si->value_len > OCFS2_XATTR_INLINE_SIZE) {
+ int new_clusters = ocfs2_clusters_for_bytes(dir->i_sb,
+ si->value_len);
+
+ *xattr_credits += ocfs2_clusters_to_blocks(dir->i_sb,
+ new_clusters);
+ *want_clusters += new_clusters;
+ }
return ret;
}

@@ -506,9 +511,7 @@ int ocfs2_calc_xattr_init(struct inode *dir,
{
int ret = 0;
struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
- int s_size = 0;
- int a_size = 0;
- int acl_len = 0;
+ int s_size = 0, a_size = 0, acl_len = 0, new_clusters;

if (si->enable)
s_size = ocfs2_xattr_entry_real_size(strlen(si->name),
@@ -556,16 +559,25 @@ int ocfs2_calc_xattr_init(struct inode *dir,
*xattr_credits += ocfs2_blocks_per_xattr_bucket(dir->i_sb);
}

- /* reserve clusters for xattr value which will be set in B tree*/
- if (si->enable && si->value_len > OCFS2_XATTR_INLINE_SIZE)
- *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb,
- si->value_len);
+ /*
+ * reserve credits and clusters for xattrs which has large value
+ * and have to be set outside
+ */
+ if (si->enable && si->value_len > OCFS2_XATTR_INLINE_SIZE) {
+ new_clusters = ocfs2_clusters_for_bytes(dir->i_sb,
+ si->value_len);
+ *xattr_credits += ocfs2_clusters_to_blocks(dir->i_sb,
+ new_clusters);
+ *want_clusters += new_clusters;
+ }
if (osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL &&
acl_len > OCFS2_XATTR_INLINE_SIZE) {
- *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb, acl_len);
- if (S_ISDIR(mode))
- *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb,
- acl_len);
+ /* for directory, it has DEFAULT and ACCESS two types of acls */
+ new_clusters = (S_ISDIR(mode) ? 2 : 1) *
+ ocfs2_clusters_for_bytes(dir->i_sb, acl_len);
+ *xattr_credits += ocfs2_clusters_to_blocks(dir->i_sb,
+ new_clusters);
+ *want_clusters += new_clusters;
}

return ret;
--
1.5.6

2008-12-25 18:17:09

by Mark Fasheh

Subject: [PATCH 34/35] ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle

From: Tiger Yang <[email protected]>

In extreme situations, we may need an xattr bucket for setting
the security entry and acl entries during mknod. This only
happens when the block size is too small.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index e5be470..095b0bb 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2611,9 +2611,7 @@ out:
/*
* This function only called duing creating inode
* for init security/acl xattrs of the new inode.
- * The xattrs could be put into ibody or extent block,
- * xattr bucket would not be use in this case.
- * transanction credits also be reserved in here.
+ * All transaction credits have been reserved in mknod.
*/
int ocfs2_xattr_set_handle(handle_t *handle,
struct inode *inode,
@@ -2653,6 +2651,19 @@ int ocfs2_xattr_set_handle(handle_t *handle,
if (!ocfs2_supports_xattr(OCFS2_SB(inode->i_sb)))
return -EOPNOTSUPP;

+ /*
+ * In extreme situation, may need xattr bucket when
+ * block size is too small. And we have already reserved
+ * the credits for bucket in mknod.
+ */
+ if (inode->i_sb->s_blocksize == OCFS2_MIN_BLOCKSIZE) {
+ xbs.bucket = ocfs2_xattr_bucket_new(inode);
+ if (!xbs.bucket) {
+ mlog_errno(-ENOMEM);
+ return -ENOMEM;
+ }
+ }
+
xis.inode_bh = xbs.inode_bh = di_bh;
di = (struct ocfs2_dinode *)di_bh->b_data;

@@ -2672,6 +2683,7 @@ int ocfs2_xattr_set_handle(handle_t *handle,
cleanup:
up_write(&OCFS2_I(inode)->ip_xattr_sem);
brelse(xbs.xattr_bh);
+ ocfs2_xattr_bucket_free(xbs.bucket);

return ret;
}
--
1.5.6

2008-12-25 18:17:27

by Mark Fasheh

Subject: [PATCH 35/35] ocfs2: Add xattr support checking in init_security

From: Tiger Yang <[email protected]>

We must check whether the ocfs2 volume supports xattrs in init_security.
If xattrs are not supported and security is enabled, mknod would fail.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 095b0bb..e1d638a 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -5324,6 +5324,9 @@ int ocfs2_init_security_get(struct inode *inode,
struct inode *dir,
struct ocfs2_security_xattr_info *si)
{
+ /* check whether ocfs2 support feature xattr */
+ if (!ocfs2_supports_xattr(OCFS2_SB(dir->i_sb)))
+ return -EOPNOTSUPP;
return security_inode_init_security(inode, dir, &si->name, &si->value,
&si->value_len);
}
--
1.5.6

2008-12-31 18:16:39

by Mark Fasheh

Subject: Re: [Ocfs2-devel] [PATCH 01/35] jbd2: Add buffer triggers

Hey Ted,

I just wanted to double check that the latest version of this patch
is ok with you before I send it off to Linus. IMHO, it should be fine - the
previous version got your Acked-by, and this just represents the final,
bug-fixed product. Still, it's good to make sure we're all on the same page :)

Thanks,
--Mark

On Thu, Dec 25, 2008 at 10:04:16AM -0800, Mark Fasheh wrote:
> From: Joel Becker <[email protected]>
>
> Filesystems often need to do a compute-intensive operation on some
> metadata. If this operation is repeated many times, it can be very
> expensive. It would be much nicer if the operation could be performed
> once before a buffer goes to disk.
>
> This adds triggers to jbd2 buffer heads. Just before writing a metadata
> buffer to the journal, jbd2 will optionally call a commit trigger associated
> with the buffer. If the journal is aborted, an abort trigger will be
> called on any dirty buffers as they are dropped from pending
> transactions.
>
> ocfs2 will use this feature.
>
> Initially I tried to come up with a more generic trigger that could be
> used for non-buffer-related events like transaction completion. It
> doesn't tie nicely, because the information a buffer trigger needs
> (specific to a journal_head) isn't the same as what a transaction
> trigger needs (specific to a transaction_t or perhaps journal_t). So I
> implemented a buffer set, with the understanding that
> journal/transaction wide triggers should be implemented separately.
>
> There is only one trigger set allowed per buffer. I can't think of any
> reason to attach more than one set. Contrast this with a journal or
> transaction in which multiple places may want to watch the entire
> transaction separately.
>
> The trigger sets are considered static allocation from the jbd2
> perspective. ocfs2 will just have one trigger set per block type,
> setting the same set on every bh of the same type.
>
> Signed-off-by: Joel Becker <[email protected]>
> Cc: "Theodore Ts'o" <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Mark Fasheh <[email protected]>
> ---
> fs/jbd2/commit.c | 9 ++++++++
> fs/jbd2/journal.c | 19 +++++++++++++++++
> fs/jbd2/transaction.c | 47 ++++++++++++++++++++++++++++++++++++++++++
> include/linux/jbd2.h | 31 +++++++++++++++++++++++++++
> include/linux/journal-head.h | 8 +++++++
> 5 files changed, 114 insertions(+), 0 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index ebc667b..c8a1bac 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -509,6 +509,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> if (is_journal_aborted(journal)) {
> clear_buffer_jbddirty(jh2bh(jh));
> JBUFFER_TRACE(jh, "journal is aborting: refile");
> + jbd2_buffer_abort_trigger(jh,
> + jh->b_frozen_data ?
> + jh->b_frozen_triggers :
> + jh->b_triggers);
> jbd2_journal_refile_buffer(journal, jh);
> /* If that was the last one, we need to clean up
> * any descriptor buffers which may have been
> @@ -844,6 +848,9 @@ restart_loop:
> * data.
> *
> * Otherwise, we can just throw away the frozen data now.
> + *
> + * We also know that the frozen data has already fired
> + * its triggers if they exist, so we can clear that too.
> */
> if (jh->b_committed_data) {
> jbd2_free(jh->b_committed_data, bh->b_size);
> @@ -851,10 +858,12 @@ restart_loop:
> if (jh->b_frozen_data) {
> jh->b_committed_data = jh->b_frozen_data;
> jh->b_frozen_data = NULL;
> + jh->b_frozen_triggers = NULL;
> }
> } else if (jh->b_frozen_data) {
> jbd2_free(jh->b_frozen_data, bh->b_size);
> jh->b_frozen_data = NULL;
> + jh->b_frozen_triggers = NULL;
> }
>
> spin_lock(&journal->j_list_lock);
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index e70d657..f6bff9d 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -50,6 +50,7 @@ EXPORT_SYMBOL(jbd2_journal_unlock_updates);
> EXPORT_SYMBOL(jbd2_journal_get_write_access);
> EXPORT_SYMBOL(jbd2_journal_get_create_access);
> EXPORT_SYMBOL(jbd2_journal_get_undo_access);
> +EXPORT_SYMBOL(jbd2_journal_set_triggers);
> EXPORT_SYMBOL(jbd2_journal_dirty_metadata);
> EXPORT_SYMBOL(jbd2_journal_release_buffer);
> EXPORT_SYMBOL(jbd2_journal_forget);
> @@ -290,6 +291,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
> struct page *new_page;
> unsigned int new_offset;
> struct buffer_head *bh_in = jh2bh(jh_in);
> + struct jbd2_buffer_trigger_type *triggers;
>
> /*
> * The buffer really shouldn't be locked: only the current committing
> @@ -314,13 +316,23 @@ repeat:
> done_copy_out = 1;
> new_page = virt_to_page(jh_in->b_frozen_data);
> new_offset = offset_in_page(jh_in->b_frozen_data);
> + triggers = jh_in->b_frozen_triggers;
> } else {
> new_page = jh2bh(jh_in)->b_page;
> new_offset = offset_in_page(jh2bh(jh_in)->b_data);
> + triggers = jh_in->b_triggers;
> }
>
> mapped_data = kmap_atomic(new_page, KM_USER0);
> /*
> + * Fire any commit trigger. Do this before checking for escaping,
> + * as the trigger may modify the magic offset. If a copy-out
> + * happens afterwards, it will have the correct data in the buffer.
> + */
> + jbd2_buffer_commit_trigger(jh_in, mapped_data + new_offset,
> + triggers);
> +
> + /*
> * Check for escaping
> */
> if (*((__be32 *)(mapped_data + new_offset)) ==
> @@ -352,6 +364,13 @@ repeat:
> new_page = virt_to_page(tmp);
> new_offset = offset_in_page(tmp);
> done_copy_out = 1;
> +
> + /*
> + * This isn't strictly necessary, as we're using frozen
> + * data for the escaping, but it keeps consistency with
> + * b_frozen_data usage.
> + */
> + jh_in->b_frozen_triggers = jh_in->b_triggers;
> }
>
> /*
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 39b7805..4f925a4 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -741,6 +741,12 @@ done:
> source = kmap_atomic(page, KM_USER0);
> memcpy(jh->b_frozen_data, source+offset, jh2bh(jh)->b_size);
> kunmap_atomic(source, KM_USER0);
> +
> + /*
> + * Now that the frozen data is saved off, we need to store
> + * any matching triggers.
> + */
> + jh->b_frozen_triggers = jh->b_triggers;
> }
> jbd_unlock_bh_state(bh);
>
> @@ -944,6 +950,47 @@ out:
> }
>
> /**
> + * void jbd2_journal_set_triggers() - Add triggers for commit writeout
> + * @bh: buffer to trigger on
> + * @type: struct jbd2_buffer_trigger_type containing the trigger(s).
> + *
> + * Set any triggers on this journal_head. This is always safe, because
> + * triggers for a committing buffer will be saved off, and triggers for
> + * a running transaction will match the buffer in that transaction.
> + *
> + * Call with NULL to clear the triggers.
> + */
> +void jbd2_journal_set_triggers(struct buffer_head *bh,
> + struct jbd2_buffer_trigger_type *type)
> +{
> + struct journal_head *jh = bh2jh(bh);
> +
> + jh->b_triggers = type;
> +}
> +
> +void jbd2_buffer_commit_trigger(struct journal_head *jh, void *mapped_data,
> + struct jbd2_buffer_trigger_type *triggers)
> +{
> + struct buffer_head *bh = jh2bh(jh);
> +
> + if (!triggers || !triggers->t_commit)
> + return;
> +
> + triggers->t_commit(triggers, bh, mapped_data, bh->b_size);
> +}
> +
> +void jbd2_buffer_abort_trigger(struct journal_head *jh,
> + struct jbd2_buffer_trigger_type *triggers)
> +{
> + if (!triggers || !triggers->t_abort)
> + return;
> +
> + triggers->t_abort(triggers, jh2bh(jh));
> +}
> +
> +
> +
> +/**
> * int jbd2_journal_dirty_metadata() - mark a buffer as containing dirty metadata
> * @handle: transaction to add buffer to.
> * @bh: buffer to mark
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index f366457..3445647 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1008,6 +1008,35 @@ int __jbd2_journal_clean_checkpoint_list(journal_t *journal);
> int __jbd2_journal_remove_checkpoint(struct journal_head *);
> void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
>
> +
> +/*
> + * Triggers
> + */
> +
> +struct jbd2_buffer_trigger_type {
> + /*
> + * Fired just before a buffer is written to the journal.
> + * mapped_data is a mapped buffer that is the frozen data for
> + * commit.
> + */
> + void (*t_commit)(struct jbd2_buffer_trigger_type *type,
> + struct buffer_head *bh, void *mapped_data,
> + size_t size);
> +
> + /*
> + * Fired during journal abort for dirty buffers that will not be
> + * committed.
> + */
> + void (*t_abort)(struct jbd2_buffer_trigger_type *type,
> + struct buffer_head *bh);
> +};
> +
> +extern void jbd2_buffer_commit_trigger(struct journal_head *jh,
> + void *mapped_data,
> + struct jbd2_buffer_trigger_type *triggers);
> +extern void jbd2_buffer_abort_trigger(struct journal_head *jh,
> + struct jbd2_buffer_trigger_type *triggers);
> +
> /* Buffer IO */
> extern int
> jbd2_journal_write_metadata_buffer(transaction_t *transaction,
> @@ -1046,6 +1075,8 @@ extern int jbd2_journal_extend (handle_t *, int nblocks);
> extern int jbd2_journal_get_write_access(handle_t *, struct buffer_head *);
> extern int jbd2_journal_get_create_access (handle_t *, struct buffer_head *);
> extern int jbd2_journal_get_undo_access(handle_t *, struct buffer_head *);
> +void jbd2_journal_set_triggers(struct buffer_head *,
> + struct jbd2_buffer_trigger_type *type);
> extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
> extern void jbd2_journal_release_buffer (handle_t *, struct buffer_head *);
> extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
> diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
> index bb70ebb..525aac3 100644
> --- a/include/linux/journal-head.h
> +++ b/include/linux/journal-head.h
> @@ -12,6 +12,8 @@
>
> typedef unsigned int tid_t; /* Unique transaction ID */
> typedef struct transaction_s transaction_t; /* Compound transaction type */
> +
> +
> struct buffer_head;
>
> struct journal_head {
> @@ -87,6 +89,12 @@ struct journal_head {
> * [j_list_lock]
> */
> struct journal_head *b_cpnext, *b_cpprev;
> +
> + /* Trigger type */
> + struct jbd2_buffer_trigger_type *b_triggers;
> +
> + /* Trigger type for the committing transaction's frozen data */
> + struct jbd2_buffer_trigger_type *b_frozen_triggers;
> };
>
> #endif /* JOURNAL_HEAD_H_INCLUDED */
> --
> 1.5.6
>
>
--
Mark Fasheh
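
The trigger dispatch described in the quoted patch can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the jbd2 implementation: all `toy_` names are hypothetical, the buffer is a fixed 16-byte array, and the "checksum" is a simple byte sum standing in for whatever a filesystem would compute. It mirrors three points from the patch: a missing trigger set or missing callback is silently skipped, the write-out path picks the frozen trigger set when frozen data exists, and one static trigger set can be shared by every buffer of the same type.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buffer;

/* Hypothetical analogue of struct jbd2_buffer_trigger_type. */
struct toy_trigger_type {
	/* Fired just before the data is written out. */
	void (*t_commit)(struct toy_trigger_type *type, struct toy_buffer *buf,
			 void *data, size_t size);
	/* Fired when a dirty buffer is dropped without being written. */
	void (*t_abort)(struct toy_trigger_type *type, struct toy_buffer *buf);
};

/* Hypothetical analogue of a buffer plus its journal_head fields. */
struct toy_buffer {
	unsigned char data[16];
	unsigned char *frozen_data;               /* NULL unless frozen */
	struct toy_trigger_type *triggers;        /* current trigger set */
	struct toy_trigger_type *frozen_triggers; /* set saved with frozen data */
	int aborted;
};

/* Dispatch helpers: do nothing if no trigger set or callback is attached. */
static void toy_commit_trigger(struct toy_buffer *buf, void *data,
			       struct toy_trigger_type *triggers)
{
	if (!triggers || !triggers->t_commit)
		return;
	triggers->t_commit(triggers, buf, data, sizeof(buf->data));
}

static void toy_abort_trigger(struct toy_buffer *buf,
			      struct toy_trigger_type *triggers)
{
	if (!triggers || !triggers->t_abort)
		return;
	triggers->t_abort(triggers, buf);
}

/* Example commit callback: store the sum of bytes 1..size-1 in byte 0. */
static void toy_sum_commit(struct toy_trigger_type *type,
			   struct toy_buffer *buf, void *data, size_t size)
{
	unsigned char *p = data;
	unsigned char sum = 0;
	size_t i;

	(void)type;
	(void)buf;
	for (i = 1; i < size; i++)
		sum += p[i];
	p[0] = sum;
}

/* Example abort callback: just record that the abort was seen. */
static void toy_mark_abort(struct toy_trigger_type *type,
			   struct toy_buffer *buf)
{
	(void)type;
	buf->aborted = 1;
}

/* One static trigger set, shared by every buffer of this "type". */
static struct toy_trigger_type toy_sum_triggers = {
	.t_commit = toy_sum_commit,
	.t_abort  = toy_mark_abort,
};

/* Write-out path: use the frozen copy and its saved triggers if present. */
static unsigned char *toy_write_out(struct toy_buffer *buf)
{
	unsigned char *data = buf->frozen_data ? buf->frozen_data : buf->data;
	struct toy_trigger_type *triggers =
		buf->frozen_data ? buf->frozen_triggers : buf->triggers;

	toy_commit_trigger(buf, data, triggers);
	return data;
}
```

Because the expensive step runs once in the commit callback rather than on every modification, the caller can dirty the buffer as often as it likes and only pay for the computation at write-out.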