2008-12-19 21:54:32

by Mark Fasheh

[permalink] [raw]
Subject: [git patches] Ocfs2 patches for merge window, batch 1/3

Hi,

As of right now, the upstream-linus branch in ocfs2.git has 136 patches for
the next merge window. Luckily, a large chunk of those are either small or
easily read. Also, many of them are generic quota changes (which already
have appropriate review and signoffs) which I carried in ocfs2.git because
Ocfs2 quotas depended on them.

Regardless, I plan to post the ocfs2 merge window patches in 3 batches to
make review easier. The first set of patches follows this mail. The final
two will be sent early next week and will include the generic quota patches.

Aside from some xattr cleanups and fixes, these patches include supoprt for
POSIX ACLs and security attributes. These are implemented as special
extended attributes in a similar manner to other file systems.

2.6.28 added Ocfs2 support for JBD2, and made it the default. A patch is in
here which drops support for JBD completely from Ocfs2. This should make
coding up new features easier as we only have one journaling layer to
target. Users will see no difference as JBD2 is already backwards
compatible with JBD formatted journals.

The other set of cleanups in this series is to our meta data I/O paths. Joel
did a lot of work to seperate those paths into block specific calls (with an
appropriate generic helper). This not only reduced arguments at each call
site, but it allowed us to do more efficient checking for corrupted meta
data. In the 3rd round, these internal hooks will be used to enable meta
data checksums on Ocfs2.
--Mark

Please pull from 'upstream-round1' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-round1

to receive the following updates:

Documentation/filesystems/ocfs2.txt | 3 +-
fs/Kconfig | 11 +-
fs/ocfs2/Makefile | 4 +
fs/ocfs2/acl.c | 479 ++++++++
fs/ocfs2/acl.h | 58 +
fs/ocfs2/alloc.c | 381 +++++--
fs/ocfs2/alloc.h | 21 +
fs/ocfs2/aops.c | 35 +-
fs/ocfs2/buffer_head_io.c | 33 +-
fs/ocfs2/buffer_head_io.h | 27 +-
fs/ocfs2/dir.c | 123 +-
fs/ocfs2/dlmglue.c | 12 +-
fs/ocfs2/extent_map.c | 96 ++-
fs/ocfs2/extent_map.h | 24 +
fs/ocfs2/file.c | 115 +--
fs/ocfs2/inode.c | 128 ++-
fs/ocfs2/inode.h | 16 +-
fs/ocfs2/journal.c | 34 +-
fs/ocfs2/journal.h | 11 +-
fs/ocfs2/localalloc.c | 8 +-
fs/ocfs2/namei.c | 238 +++--
fs/ocfs2/ocfs2.h | 24 +-
fs/ocfs2/ocfs2_jbd_compat.h | 82 --
fs/ocfs2/resize.c | 60 +-
fs/ocfs2/slot_map.c | 4 +-
fs/ocfs2/suballoc.c | 278 +++--
fs/ocfs2/suballoc.h | 18 +-
fs/ocfs2/super.c | 33 +
fs/ocfs2/symlink.c | 2 +-
fs/ocfs2/xattr.c | 2257 +++++++++++++++++++++--------------
fs/ocfs2/xattr.h | 31 +
31 files changed, 3006 insertions(+), 1640 deletions(-)
create mode 100644 fs/ocfs2/acl.c
create mode 100644 fs/ocfs2/acl.h
delete mode 100644 fs/ocfs2/ocfs2_jbd_compat.h

Joel Becker (24):
ocfs2: Field prefixes for the xattr_bucket structure
ocfs2: Convenient access to an xattr bucket's block number.
ocfs2: Convenient access to xattr bucket data blocks.
ocfs2: Convenient access to an xattr bucket's header.
ocfs2: Provide a wrapper to brelse() xattr bucket buffers.
ocfs2: Improve ocfs2_read_xattr_bucket().
ocfs2: Wrap journal_access/journal_dirty for xattr buckets.
ocfs2: Copy xattr buckets with a dedicated function.
ocfs2: Take ocfs2_xattr_bucket structures off of the stack.
ocfs2: Use buckets in ocfs2_xattr_bucket_find().
ocfs2: Use buckets in ocfs2_xattr_create_index_block().
ocfs2: Use buckets in ocfs2_defrag_xattr_bucket().
ocfs2: Use buckets in ocfs2_xattr_set_entry_in_bucket().
ocfs2: Wrap inode block reads in a dedicated function.
ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.
ocfs2: Consolidate validation of group descriptors.
ocfs2: Wrap group descriptor reads in a dedicated function.
ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks.
ocfs2: Wrap extent block reads in a dedicated function.
ocfs2: Wrap dirblock reads in a dedicated function.
ocfs2: Wrap xattr block reads in a dedicated function
ocfs2: Validate metadata only when it's read from disk.
ocfs2: Wrap virtual block reads in ocfs2_read_virt_blocks()
ocfs2: Convert ocfs2_read_dir_block() to ocfs2_read_virt_blocks()

Mark Fasheh (2):
ocfs2: turn __ocfs2_remove_inode_range() into ocfs2_remove_btree_range()
ocfs2: Remove JBD compatibility layer

Tao Ma (9):
ocfs2/xattr: Remove additional bucket allocation in bucket defragment.
ocfs2/xattr: Only set buffer update if it doesn't exist in cache.
ocfs2/xattr: Only extend xattr bucket in need.
ocfs2: Add clusters free in dealloc_ctxt.
ocfs2/xattr: Move clusters free into dealloc.
ocfs2/xattr: Reserve meta/data at the beginning of ocfs2_xattr_set.
ocfs2/xattr: Merge xattr set transaction.
ocfs2/xattr: Fix a bug in xattr allocation estimation
ocfs2/xattr: Restore not_found in xis

Tiger Yang (10):
ocfs2: move new inode allocation out of the transaction
ocfs2: add ocfs2_xattr_set_handle
ocfs2: add security xattr API
ocfs2: add ocfs2_init_security in during file create
ocfs2: add ocfs2_xattr_get_nolock
ocfs2: add POSIX ACL API
ocfs2: add ocfs2_check_acl
ocfs2: add ocfs2_acl_chmod
ocfs2: add ocfs2_init_acl in mknod
ocfs2: add mount option and Kconfig option for acl


2008-12-19 21:54:48

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 01/45] ocfs2: Field prefixes for the xattr_bucket structure

From: Joel Becker <[email protected]>

The ocfs2_xattr_bucket structure keeps track of the buffers for one
xattr bucket. Let's prefix the fields for easier code navigation.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 100 +++++++++++++++++++++++++++---------------------------
1 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 74d7367..9c0ee42 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -61,8 +61,8 @@ struct ocfs2_xattr_def_value_root {
};

struct ocfs2_xattr_bucket {
- struct buffer_head *bhs[OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET];
- struct ocfs2_xattr_header *xh;
+ struct buffer_head *bu_bhs[OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET];
+ struct ocfs2_xattr_header *bu_xh;
};

#define OCFS2_XATTR_ROOT_SIZE (sizeof(struct ocfs2_xattr_def_value_root))
@@ -795,11 +795,11 @@ static int ocfs2_xattr_block_get(struct inode *inode,

if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
ret = ocfs2_xattr_bucket_get_name_value(inode,
- xs->bucket.xh,
+ xs->bucket.bu_xh,
i,
&block_off,
&name_offset);
- xs->base = xs->bucket.bhs[block_off]->b_data;
+ xs->base = xs->bucket.bu_bhs[block_off]->b_data;
}
if (ocfs2_xattr_is_local(xs->here)) {
memcpy(buffer, (void *)xs->base +
@@ -818,7 +818,7 @@ static int ocfs2_xattr_block_get(struct inode *inode,
ret = size;
cleanup:
for (i = 0; i < OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET; i++)
- brelse(xs->bucket.bhs[i]);
+ brelse(xs->bucket.bu_bhs[i]);
memset(&xs->bucket, 0, sizeof(xs->bucket));

brelse(xs->xattr_bh);
@@ -2032,7 +2032,7 @@ cleanup:
brelse(di_bh);
brelse(xbs.xattr_bh);
for (i = 0; i < blk_per_bucket; i++)
- brelse(xbs.bucket.bhs[i]);
+ brelse(xbs.bucket.bu_bhs[i]);

return ret;
}
@@ -2276,13 +2276,13 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
lower_bh = bh;
bh = NULL;
}
- xs->bucket.bhs[0] = lower_bh;
- xs->bucket.xh = (struct ocfs2_xattr_header *)
- xs->bucket.bhs[0]->b_data;
+ xs->bucket.bu_bhs[0] = lower_bh;
+ xs->bucket.bu_xh = (struct ocfs2_xattr_header *)
+ xs->bucket.bu_bhs[0]->b_data;
lower_bh = NULL;

- xs->header = xs->bucket.xh;
- xs->base = xs->bucket.bhs[0]->b_data;
+ xs->header = xs->bucket.bu_xh;
+ xs->base = xs->bucket.bu_bhs[0]->b_data;
xs->end = xs->base + inode->i_sb->s_blocksize;

if (found) {
@@ -2290,8 +2290,8 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
* If we have found the xattr enty, read all the blocks in
* this bucket.
*/
- ret = ocfs2_read_blocks(inode, xs->bucket.bhs[0]->b_blocknr + 1,
- blk_per_bucket - 1, &xs->bucket.bhs[1],
+ ret = ocfs2_read_blocks(inode, xs->bucket.bu_bhs[0]->b_blocknr + 1,
+ blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
0);
if (ret) {
mlog_errno(ret);
@@ -2300,7 +2300,7 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,

xs->here = &xs->header->xh_entries[index];
mlog(0, "find xattr %s in bucket %llu, entry = %u\n", name,
- (unsigned long long)xs->bucket.bhs[0]->b_blocknr, index);
+ (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr, index);
} else
ret = -ENODATA;

@@ -2370,23 +2370,23 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,

for (i = 0; i < num_buckets; i++, blkno += blk_per_bucket) {
ret = ocfs2_read_blocks(inode, blkno, blk_per_bucket,
- bucket.bhs, 0);
+ bucket.bu_bhs, 0);
if (ret) {
mlog_errno(ret);
goto out;
}

- bucket.xh = (struct ocfs2_xattr_header *)bucket.bhs[0]->b_data;
+ bucket.bu_xh = (struct ocfs2_xattr_header *)bucket.bu_bhs[0]->b_data;
/*
* The real bucket num in this series of blocks is stored
* in the 1st bucket.
*/
if (i == 0)
- num_buckets = le16_to_cpu(bucket.xh->xh_num_buckets);
+ num_buckets = le16_to_cpu(bucket.bu_xh->xh_num_buckets);

mlog(0, "iterating xattr bucket %llu, first hash %u\n",
(unsigned long long)blkno,
- le32_to_cpu(bucket.xh->xh_entries[0].xe_name_hash));
+ le32_to_cpu(bucket.bu_xh->xh_entries[0].xe_name_hash));
if (func) {
ret = func(inode, &bucket, para);
if (ret) {
@@ -2396,13 +2396,13 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,
}

for (j = 0; j < blk_per_bucket; j++)
- brelse(bucket.bhs[j]);
+ brelse(bucket.bu_bhs[j]);
memset(&bucket, 0, sizeof(bucket));
}

out:
for (j = 0; j < blk_per_bucket; j++)
- brelse(bucket.bhs[j]);
+ brelse(bucket.bu_bhs[j]);

return ret;
}
@@ -2441,21 +2441,21 @@ static int ocfs2_list_xattr_bucket(struct inode *inode,
int i, block_off, new_offset;
const char *prefix, *name;

- for (i = 0 ; i < le16_to_cpu(bucket->xh->xh_count); i++) {
- struct ocfs2_xattr_entry *entry = &bucket->xh->xh_entries[i];
+ for (i = 0 ; i < le16_to_cpu(bucket->bu_xh->xh_count); i++) {
+ struct ocfs2_xattr_entry *entry = &bucket->bu_xh->xh_entries[i];
type = ocfs2_xattr_get_type(entry);
prefix = ocfs2_xattr_prefix(type);

if (prefix) {
ret = ocfs2_xattr_bucket_get_name_value(inode,
- bucket->xh,
+ bucket->bu_xh,
i,
&block_off,
&new_offset);
if (ret)
break;

- name = (const char *)bucket->bhs[block_off]->b_data +
+ name = (const char *)bucket->bu_bhs[block_off]->b_data +
new_offset;
ret = ocfs2_xattr_list_entry(xl->buffer,
xl->buffer_size,
@@ -2626,10 +2626,10 @@ static int ocfs2_xattr_update_xattr_search(struct inode *inode,
int i, blocksize = inode->i_sb->s_blocksize;
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

- xs->bucket.bhs[0] = new_bh;
+ xs->bucket.bu_bhs[0] = new_bh;
get_bh(new_bh);
- xs->bucket.xh = (struct ocfs2_xattr_header *)xs->bucket.bhs[0]->b_data;
- xs->header = xs->bucket.xh;
+ xs->bucket.bu_xh = (struct ocfs2_xattr_header *)xs->bucket.bu_bhs[0]->b_data;
+ xs->header = xs->bucket.bu_xh;

xs->base = new_bh->b_data;
xs->end = xs->base + inode->i_sb->s_blocksize;
@@ -2637,8 +2637,8 @@ static int ocfs2_xattr_update_xattr_search(struct inode *inode,
if (!xs->not_found) {
if (OCFS2_XATTR_BUCKET_SIZE != blocksize) {
ret = ocfs2_read_blocks(inode,
- xs->bucket.bhs[0]->b_blocknr + 1,
- blk_per_bucket - 1, &xs->bucket.bhs[1],
+ xs->bucket.bu_bhs[0]->b_blocknr + 1,
+ blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
0);
if (ret) {
mlog_errno(ret);
@@ -2835,7 +2835,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
size_t end, offset, len, value_len;
struct ocfs2_xattr_header *xh;
char *entries, *buf, *bucket_buf = NULL;
- u64 blkno = bucket->bhs[0]->b_blocknr;
+ u64 blkno = bucket->bu_bhs[0]->b_blocknr;
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
u16 xh_free_start;
size_t blocksize = inode->i_sb->s_blocksize;
@@ -3929,7 +3929,7 @@ static inline char *ocfs2_xattr_bucket_get_val(struct inode *inode,
int block_off = offs >> inode->i_sb->s_blocksize_bits;

offs = offs % inode->i_sb->s_blocksize;
- return bucket->bhs[block_off]->b_data + offs;
+ return bucket->bu_bhs[block_off]->b_data + offs;
}

/*
@@ -4124,12 +4124,12 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,

mlog(0, "Set xattr entry len = %lu index = %d in bucket %llu\n",
(unsigned long)xi->value_len, xi->name_index,
- (unsigned long long)xs->bucket.bhs[0]->b_blocknr);
+ (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr);

- if (!xs->bucket.bhs[1]) {
+ if (!xs->bucket.bu_bhs[1]) {
ret = ocfs2_read_blocks(inode,
- xs->bucket.bhs[0]->b_blocknr + 1,
- blk_per_bucket - 1, &xs->bucket.bhs[1],
+ xs->bucket.bu_bhs[0]->b_blocknr + 1,
+ blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
0);
if (ret) {
mlog_errno(ret);
@@ -4146,7 +4146,7 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
}

for (i = 0; i < blk_per_bucket; i++) {
- ret = ocfs2_journal_access(handle, inode, xs->bucket.bhs[i],
+ ret = ocfs2_journal_access(handle, inode, xs->bucket.bu_bhs[i],
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
@@ -4158,7 +4158,7 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,

/*Only dirty the blocks we have touched in set xattr. */
ret = ocfs2_xattr_bucket_handle_journal(inode, handle, xs,
- xs->bucket.bhs, blk_per_bucket);
+ xs->bucket.bu_bhs, blk_per_bucket);
if (ret)
mlog_errno(ret);
out:
@@ -4272,10 +4272,10 @@ static int ocfs2_xattr_bucket_value_truncate_xs(struct inode *inode,
struct ocfs2_xattr_entry *xe = xs->here;
struct ocfs2_xattr_header *xh = (struct ocfs2_xattr_header *)xs->base;

- BUG_ON(!xs->bucket.bhs[0] || !xe || ocfs2_xattr_is_local(xe));
+ BUG_ON(!xs->bucket.bu_bhs[0] || !xe || ocfs2_xattr_is_local(xe));

offset = xe - xh->xh_entries;
- ret = ocfs2_xattr_bucket_value_truncate(inode, xs->bucket.bhs[0],
+ ret = ocfs2_xattr_bucket_value_truncate(inode, xs->bucket.bu_bhs[0],
offset, len);
if (ret)
mlog_errno(ret);
@@ -4395,7 +4395,7 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
struct ocfs2_xattr_search *xs)
{
handle_t *handle = NULL;
- struct ocfs2_xattr_header *xh = xs->bucket.xh;
+ struct ocfs2_xattr_header *xh = xs->bucket.bu_xh;
struct ocfs2_xattr_entry *last = &xh->xh_entries[
le16_to_cpu(xh->xh_count) - 1];
int ret = 0;
@@ -4407,7 +4407,7 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
return;
}

- ret = ocfs2_journal_access(handle, inode, xs->bucket.bhs[0],
+ ret = ocfs2_journal_access(handle, inode, xs->bucket.bu_bhs[0],
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
@@ -4420,7 +4420,7 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
memset(last, 0, sizeof(struct ocfs2_xattr_entry));
le16_add_cpu(&xh->xh_count, -1);

- ret = ocfs2_journal_dirty(handle, xs->bucket.bhs[0]);
+ ret = ocfs2_journal_dirty(handle, xs->bucket.bu_bhs[0]);
if (ret < 0)
mlog_errno(ret);
out_commit:
@@ -4530,7 +4530,7 @@ static int ocfs2_check_xattr_bucket_collision(struct inode *inode,
struct ocfs2_xattr_bucket *bucket,
const char *name)
{
- struct ocfs2_xattr_header *xh = bucket->xh;
+ struct ocfs2_xattr_header *xh = bucket->bu_xh;
u32 name_hash = ocfs2_xattr_name_hash(inode, name, strlen(name));

if (name_hash != le32_to_cpu(xh->xh_entries[0].xe_name_hash))
@@ -4540,7 +4540,7 @@ static int ocfs2_check_xattr_bucket_collision(struct inode *inode,
xh->xh_entries[0].xe_name_hash) {
mlog(ML_ERROR, "Too much hash collision in xattr bucket %llu, "
"hash = %u\n",
- (unsigned long long)bucket->bhs[0]->b_blocknr,
+ (unsigned long long)bucket->bu_bhs[0]->b_blocknr,
le32_to_cpu(xh->xh_entries[0].xe_name_hash));
return -ENOSPC;
}
@@ -4574,7 +4574,7 @@ try_again:

mlog_bug_on_msg(header_size > blocksize, "bucket %llu has header size "
"of %u which exceed block size\n",
- (unsigned long long)xs->bucket.bhs[0]->b_blocknr,
+ (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr,
header_size);

if (xi->value && xi->value_len > OCFS2_XATTR_INLINE_SIZE)
@@ -4614,7 +4614,7 @@ try_again:
mlog(0, "xs->not_found = %d, in xattr bucket %llu: free = %d, "
"need = %d, max_free = %d, xh_free_start = %u, xh_name_value_len ="
" %u\n", xs->not_found,
- (unsigned long long)xs->bucket.bhs[0]->b_blocknr,
+ (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr,
free, need, max_free, le16_to_cpu(xh->xh_free_start),
le16_to_cpu(xh->xh_name_value_len));

@@ -4667,14 +4667,14 @@ try_again:

ret = ocfs2_add_new_xattr_bucket(inode,
xs->xattr_bh,
- xs->bucket.bhs[0]);
+ xs->bucket.bu_bhs[0]);
if (ret) {
mlog_errno(ret);
goto out;
}

for (i = 0; i < blk_per_bucket; i++)
- brelse(xs->bucket.bhs[i]);
+ brelse(xs->bucket.bu_bhs[i]);

memset(&xs->bucket, 0, sizeof(xs->bucket));

@@ -4700,7 +4700,7 @@ static int ocfs2_delete_xattr_in_bucket(struct inode *inode,
void *para)
{
int ret = 0;
- struct ocfs2_xattr_header *xh = bucket->xh;
+ struct ocfs2_xattr_header *xh = bucket->bu_xh;
u16 i;
struct ocfs2_xattr_entry *xe;

@@ -4710,7 +4710,7 @@ static int ocfs2_delete_xattr_in_bucket(struct inode *inode,
continue;

ret = ocfs2_xattr_bucket_value_truncate(inode,
- bucket->bhs[0],
+ bucket->bu_bhs[0],
i, 0);
if (ret) {
mlog_errno(ret);
--
1.5.6

2008-12-19 21:55:14

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 02/45] ocfs2: Convenient access to an xattr bucket's block number.

From: Joel Becker <[email protected]>

The xattr code often wants to know the block number of an xattr bucket.
This is usually found by dereferencing the first bh hanging off of the
ocfs2_xattr_bucket structure. Rather than do this all the time, let's
provide a nice little macro. The idea is ripped from the ocfs2_path
code.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 20 +++++++++++---------
1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 9c0ee42..3cf8e80 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -154,6 +154,8 @@ static inline u16 ocfs2_xattr_max_xe_in_bucket(struct super_block *sb)
return len / sizeof(struct ocfs2_xattr_entry);
}

+#define bucket_blkno(_b) ((_b)->bu_bhs[0]->b_blocknr)
+
static inline const char *ocfs2_xattr_prefix(int name_index)
{
struct xattr_handler *handler = NULL;
@@ -2290,7 +2292,7 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
* If we have found the xattr enty, read all the blocks in
* this bucket.
*/
- ret = ocfs2_read_blocks(inode, xs->bucket.bu_bhs[0]->b_blocknr + 1,
+ ret = ocfs2_read_blocks(inode, bucket_blkno(&xs->bucket) + 1,
blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
0);
if (ret) {
@@ -2300,7 +2302,7 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,

xs->here = &xs->header->xh_entries[index];
mlog(0, "find xattr %s in bucket %llu, entry = %u\n", name,
- (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr, index);
+ (unsigned long long)bucket_blkno(&xs->bucket), index);
} else
ret = -ENODATA;

@@ -2637,7 +2639,7 @@ static int ocfs2_xattr_update_xattr_search(struct inode *inode,
if (!xs->not_found) {
if (OCFS2_XATTR_BUCKET_SIZE != blocksize) {
ret = ocfs2_read_blocks(inode,
- xs->bucket.bu_bhs[0]->b_blocknr + 1,
+ bucket_blkno(&xs->bucket) + 1,
blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
0);
if (ret) {
@@ -2835,7 +2837,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
size_t end, offset, len, value_len;
struct ocfs2_xattr_header *xh;
char *entries, *buf, *bucket_buf = NULL;
- u64 blkno = bucket->bu_bhs[0]->b_blocknr;
+ u64 blkno = bucket_blkno(bucket);
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
u16 xh_free_start;
size_t blocksize = inode->i_sb->s_blocksize;
@@ -4124,11 +4126,11 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,

mlog(0, "Set xattr entry len = %lu index = %d in bucket %llu\n",
(unsigned long)xi->value_len, xi->name_index,
- (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr);
+ (unsigned long long)bucket_blkno(&xs->bucket));

if (!xs->bucket.bu_bhs[1]) {
ret = ocfs2_read_blocks(inode,
- xs->bucket.bu_bhs[0]->b_blocknr + 1,
+ bucket_blkno(&xs->bucket) + 1,
blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
0);
if (ret) {
@@ -4540,7 +4542,7 @@ static int ocfs2_check_xattr_bucket_collision(struct inode *inode,
xh->xh_entries[0].xe_name_hash) {
mlog(ML_ERROR, "Too much hash collision in xattr bucket %llu, "
"hash = %u\n",
- (unsigned long long)bucket->bu_bhs[0]->b_blocknr,
+ (unsigned long long)bucket_blkno(bucket),
le32_to_cpu(xh->xh_entries[0].xe_name_hash));
return -ENOSPC;
}
@@ -4574,7 +4576,7 @@ try_again:

mlog_bug_on_msg(header_size > blocksize, "bucket %llu has header size "
"of %u which exceed block size\n",
- (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr,
+ (unsigned long long)bucket_blkno(&xs->bucket),
header_size);

if (xi->value && xi->value_len > OCFS2_XATTR_INLINE_SIZE)
@@ -4614,7 +4616,7 @@ try_again:
mlog(0, "xs->not_found = %d, in xattr bucket %llu: free = %d, "
"need = %d, max_free = %d, xh_free_start = %u, xh_name_value_len ="
" %u\n", xs->not_found,
- (unsigned long long)xs->bucket.bu_bhs[0]->b_blocknr,
+ (unsigned long long)bucket_blkno(&xs->bucket),
free, need, max_free, le16_to_cpu(xh->xh_free_start),
le16_to_cpu(xh->xh_name_value_len));

--
1.5.6

2008-12-19 21:55:44

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 03/45] ocfs2: Convenient access to xattr bucket data blocks.

From: Joel Becker <[email protected]>

The xattr code often wants to access the data pointer for blocks in an
xattr bucket. This is usually found by dereferencing the bh array
hanging off of the ocfs2_xattr_bucket structure. Rather than do this
all the time, let's provide a nice little macro. The idea is ripped
from the ocfs2_path code.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3cf8e80..8594df3 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -155,6 +155,7 @@ static inline u16 ocfs2_xattr_max_xe_in_bucket(struct super_block *sb)
}

#define bucket_blkno(_b) ((_b)->bu_bhs[0]->b_blocknr)
+#define bucket_block(_b, _n) ((_b)->bu_bhs[(_n)]->b_data)

static inline const char *ocfs2_xattr_prefix(int name_index)
{
@@ -801,7 +802,7 @@ static int ocfs2_xattr_block_get(struct inode *inode,
i,
&block_off,
&name_offset);
- xs->base = xs->bucket.bu_bhs[block_off]->b_data;
+ xs->base = bucket_block(&xs->bucket, block_off);
}
if (ocfs2_xattr_is_local(xs->here)) {
memcpy(buffer, (void *)xs->base +
@@ -2280,11 +2281,11 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
}
xs->bucket.bu_bhs[0] = lower_bh;
xs->bucket.bu_xh = (struct ocfs2_xattr_header *)
- xs->bucket.bu_bhs[0]->b_data;
+ bucket_block(&xs->bucket, 0);
lower_bh = NULL;

xs->header = xs->bucket.bu_xh;
- xs->base = xs->bucket.bu_bhs[0]->b_data;
+ xs->base = bucket_block(&xs->bucket, 0);
xs->end = xs->base + inode->i_sb->s_blocksize;

if (found) {
@@ -2378,7 +2379,7 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,
goto out;
}

- bucket.bu_xh = (struct ocfs2_xattr_header *)bucket.bu_bhs[0]->b_data;
+ bucket.bu_xh = (struct ocfs2_xattr_header *)bucket_block(&bucket, 0);
/*
* The real bucket num in this series of blocks is stored
* in the 1st bucket.
@@ -2457,7 +2458,7 @@ static int ocfs2_list_xattr_bucket(struct inode *inode,
if (ret)
break;

- name = (const char *)bucket->bu_bhs[block_off]->b_data +
+ name = (const char *)bucket_block(bucket, block_off) +
new_offset;
ret = ocfs2_xattr_list_entry(xl->buffer,
xl->buffer_size,
@@ -2630,7 +2631,7 @@ static int ocfs2_xattr_update_xattr_search(struct inode *inode,

xs->bucket.bu_bhs[0] = new_bh;
get_bh(new_bh);
- xs->bucket.bu_xh = (struct ocfs2_xattr_header *)xs->bucket.bu_bhs[0]->b_data;
+ xs->bucket.bu_xh = (struct ocfs2_xattr_header *)bucket_block(&xs->bucket, 0);
xs->header = xs->bucket.bu_xh;

xs->base = new_bh->b_data;
@@ -3931,7 +3932,7 @@ static inline char *ocfs2_xattr_bucket_get_val(struct inode *inode,
int block_off = offs >> inode->i_sb->s_blocksize_bits;

offs = offs % inode->i_sb->s_blocksize;
- return bucket->bu_bhs[block_off]->b_data + offs;
+ return bucket_block(bucket, block_off) + offs;
}

/*
--
1.5.6

2008-12-19 21:55:57

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 04/45] ocfs2: Convenient access to an xattr bucket's header.

From: Joel Becker <[email protected]>

The xattr code often wants to access the ocfs2_xattr_header at the start
of an bucket. Rather than walk the pointer chains, let's just create
another nice macro. As a side benefit, we can get rid of the mostly
spurious ->bu_xh element on the bucket structure. The idea is ripped
from the ocfs2_path code.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 28 ++++++++++++----------------
1 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 8594df3..1b77302 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -62,7 +62,6 @@ struct ocfs2_xattr_def_value_root {

struct ocfs2_xattr_bucket {
struct buffer_head *bu_bhs[OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET];
- struct ocfs2_xattr_header *bu_xh;
};

#define OCFS2_XATTR_ROOT_SIZE (sizeof(struct ocfs2_xattr_def_value_root))
@@ -156,6 +155,7 @@ static inline u16 ocfs2_xattr_max_xe_in_bucket(struct super_block *sb)

#define bucket_blkno(_b) ((_b)->bu_bhs[0]->b_blocknr)
#define bucket_block(_b, _n) ((_b)->bu_bhs[(_n)]->b_data)
+#define bucket_xh(_b) ((struct ocfs2_xattr_header *)bucket_block((_b), 0))

static inline const char *ocfs2_xattr_prefix(int name_index)
{
@@ -798,7 +798,7 @@ static int ocfs2_xattr_block_get(struct inode *inode,

if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
ret = ocfs2_xattr_bucket_get_name_value(inode,
- xs->bucket.bu_xh,
+ bucket_xh(&xs->bucket),
i,
&block_off,
&name_offset);
@@ -2280,11 +2280,9 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
bh = NULL;
}
xs->bucket.bu_bhs[0] = lower_bh;
- xs->bucket.bu_xh = (struct ocfs2_xattr_header *)
- bucket_block(&xs->bucket, 0);
lower_bh = NULL;

- xs->header = xs->bucket.bu_xh;
+ xs->header = bucket_xh(&xs->bucket);
xs->base = bucket_block(&xs->bucket, 0);
xs->end = xs->base + inode->i_sb->s_blocksize;

@@ -2379,17 +2377,16 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,
goto out;
}

- bucket.bu_xh = (struct ocfs2_xattr_header *)bucket_block(&bucket, 0);
/*
* The real bucket num in this series of blocks is stored
* in the 1st bucket.
*/
if (i == 0)
- num_buckets = le16_to_cpu(bucket.bu_xh->xh_num_buckets);
+ num_buckets = le16_to_cpu(bucket_xh(&bucket)->xh_num_buckets);

mlog(0, "iterating xattr bucket %llu, first hash %u\n",
(unsigned long long)blkno,
- le32_to_cpu(bucket.bu_xh->xh_entries[0].xe_name_hash));
+ le32_to_cpu(bucket_xh(&bucket)->xh_entries[0].xe_name_hash));
if (func) {
ret = func(inode, &bucket, para);
if (ret) {
@@ -2444,14 +2441,14 @@ static int ocfs2_list_xattr_bucket(struct inode *inode,
int i, block_off, new_offset;
const char *prefix, *name;

- for (i = 0 ; i < le16_to_cpu(bucket->bu_xh->xh_count); i++) {
- struct ocfs2_xattr_entry *entry = &bucket->bu_xh->xh_entries[i];
+ for (i = 0 ; i < le16_to_cpu(bucket_xh(bucket)->xh_count); i++) {
+ struct ocfs2_xattr_entry *entry = &bucket_xh(bucket)->xh_entries[i];
type = ocfs2_xattr_get_type(entry);
prefix = ocfs2_xattr_prefix(type);

if (prefix) {
ret = ocfs2_xattr_bucket_get_name_value(inode,
- bucket->bu_xh,
+ bucket_xh(bucket),
i,
&block_off,
&new_offset);
@@ -2631,8 +2628,7 @@ static int ocfs2_xattr_update_xattr_search(struct inode *inode,

xs->bucket.bu_bhs[0] = new_bh;
get_bh(new_bh);
- xs->bucket.bu_xh = (struct ocfs2_xattr_header *)bucket_block(&xs->bucket, 0);
- xs->header = xs->bucket.bu_xh;
+ xs->header = bucket_xh(&xs->bucket);

xs->base = new_bh->b_data;
xs->end = xs->base + inode->i_sb->s_blocksize;
@@ -4398,7 +4394,7 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
struct ocfs2_xattr_search *xs)
{
handle_t *handle = NULL;
- struct ocfs2_xattr_header *xh = xs->bucket.bu_xh;
+ struct ocfs2_xattr_header *xh = bucket_xh(&xs->bucket);
struct ocfs2_xattr_entry *last = &xh->xh_entries[
le16_to_cpu(xh->xh_count) - 1];
int ret = 0;
@@ -4533,7 +4529,7 @@ static int ocfs2_check_xattr_bucket_collision(struct inode *inode,
struct ocfs2_xattr_bucket *bucket,
const char *name)
{
- struct ocfs2_xattr_header *xh = bucket->bu_xh;
+ struct ocfs2_xattr_header *xh = bucket_xh(bucket);
u32 name_hash = ocfs2_xattr_name_hash(inode, name, strlen(name));

if (name_hash != le32_to_cpu(xh->xh_entries[0].xe_name_hash))
@@ -4703,7 +4699,7 @@ static int ocfs2_delete_xattr_in_bucket(struct inode *inode,
void *para)
{
int ret = 0;
- struct ocfs2_xattr_header *xh = bucket->bu_xh;
+ struct ocfs2_xattr_header *xh = bucket_xh(bucket);
u16 i;
struct ocfs2_xattr_entry *xe;

--
1.5.6

2008-12-19 21:56:27

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 05/45] ocfs2: Provide a wrapper to brelse() xattr bucket buffers.

From: Joel Becker <[email protected]>

A common theme is walking all the buffer heads on an ocfs2_xattr_bucket
and releasing them. Let's wrap that.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 33 ++++++++++++++++++---------------
1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 1b77302..3478ad1 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -157,6 +157,17 @@ static inline u16 ocfs2_xattr_max_xe_in_bucket(struct super_block *sb)
#define bucket_block(_b, _n) ((_b)->bu_bhs[(_n)]->b_data)
#define bucket_xh(_b) ((struct ocfs2_xattr_header *)bucket_block((_b), 0))

+static void ocfs2_xattr_bucket_relse(struct inode *inode,
+ struct ocfs2_xattr_bucket *bucket)
+{
+ int i, blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+
+ for (i = 0; i < blks; i++) {
+ brelse(bucket->bu_bhs[i]);
+ bucket->bu_bhs[i] = NULL;
+ }
+}
+
static inline const char *ocfs2_xattr_prefix(int name_index)
{
struct xattr_handler *handler = NULL;
@@ -820,8 +831,7 @@ static int ocfs2_xattr_block_get(struct inode *inode,
}
ret = size;
cleanup:
- for (i = 0; i < OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET; i++)
- brelse(xs->bucket.bu_bhs[i]);
+ ocfs2_xattr_bucket_relse(inode, &xs->bucket);
memset(&xs->bucket, 0, sizeof(xs->bucket));

brelse(xs->xattr_bh);
@@ -1932,7 +1942,6 @@ int ocfs2_xattr_set(struct inode *inode,
struct buffer_head *di_bh = NULL;
struct ocfs2_dinode *di;
int ret;
- u16 i, blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

struct ocfs2_xattr_info xi = {
.name_index = name_index,
@@ -2034,8 +2043,7 @@ cleanup:
ocfs2_inode_unlock(inode, 1);
brelse(di_bh);
brelse(xbs.xattr_bh);
- for (i = 0; i < blk_per_bucket; i++)
- brelse(xbs.bucket.bu_bhs[i]);
+ ocfs2_xattr_bucket_relse(inode, &xbs.bucket);

return ret;
}
@@ -2358,7 +2366,7 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,
xattr_bucket_func *func,
void *para)
{
- int i, j, ret = 0;
+ int i, ret = 0;
int blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
u32 bpc = ocfs2_xattr_buckets_per_cluster(OCFS2_SB(inode->i_sb));
u32 num_buckets = clusters * bpc;
@@ -2395,14 +2403,12 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,
}
}

- for (j = 0; j < blk_per_bucket; j++)
- brelse(bucket.bu_bhs[j]);
+ ocfs2_xattr_bucket_relse(inode, &bucket);
memset(&bucket, 0, sizeof(bucket));
}

out:
- for (j = 0; j < blk_per_bucket; j++)
- brelse(bucket.bu_bhs[j]);
+ ocfs2_xattr_bucket_relse(inode, &bucket);

return ret;
}
@@ -4554,11 +4560,10 @@ static int ocfs2_xattr_set_entry_index_block(struct inode *inode,
struct ocfs2_xattr_header *xh;
struct ocfs2_xattr_entry *xe;
u16 count, header_size, xh_free_start;
- int i, free, max_free, need, old;
+ int free, max_free, need, old;
size_t value_size = 0, name_len = strlen(xi->name);
size_t blocksize = inode->i_sb->s_blocksize;
int ret, allocation = 0;
- u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

mlog_entry("Set xattr %s in xattr index block\n", xi->name);

@@ -4672,9 +4677,7 @@ try_again:
goto out;
}

- for (i = 0; i < blk_per_bucket; i++)
- brelse(xs->bucket.bu_bhs[i]);
-
+ ocfs2_xattr_bucket_relse(inode, &xs->bucket);
memset(&xs->bucket, 0, sizeof(xs->bucket));

ret = ocfs2_xattr_index_block_find(inode, xs->xattr_bh,
--
1.5.6

2008-12-19 21:56:43

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 06/45] ocfs2: Improve ocfs2_read_xattr_bucket().

From: Joel Becker <[email protected]>

The ocfs2_read_xattr_bucket() function would read an xattr bucket into a
list of buffer heads. However, we have a nice ocfs2_xattr_bucket
structure. Let's have it fill that out instead.

In addition, ocfs2_read_xattr_bucket() would initialize buffer heads for
a bucket that's never been on disk before. That's confusing. Let's
call that functionality ocfs2_init_xattr_bucket().

The functions ocfs2_cp_xattr_bucket() and ocfs2_half_xattr_bucket() are
updated to use the ocfs2_xattr_bucket structure rather than raw bh
lists. That way they can use the new read/init calls. In addition,
they drop the wasted read of an existing target bucket.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 165 ++++++++++++++++++++++++++----------------------------
1 files changed, 79 insertions(+), 86 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3478ad1..fa13fa4 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -168,6 +168,48 @@ static void ocfs2_xattr_bucket_relse(struct inode *inode,
}
}

+/*
+ * A bucket that has never been written to disk doesn't need to be
+ * read. We just need the buffer_heads. Don't call this for
+ * buckets that are already on disk. ocfs2_read_xattr_bucket() initializes
+ * them fully.
+ */
+static int ocfs2_init_xattr_bucket(struct inode *inode,
+ struct ocfs2_xattr_bucket *bucket,
+ u64 xb_blkno)
+{
+ int i, rc = 0;
+ int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+
+ for (i = 0; i < blks; i++) {
+ bucket->bu_bhs[i] = sb_getblk(inode->i_sb, xb_blkno + i);
+ if (!bucket->bu_bhs[i]) {
+ rc = -EIO;
+ mlog_errno(rc);
+ break;
+ }
+
+ ocfs2_set_new_buffer_uptodate(inode, bucket->bu_bhs[i]);
+ }
+
+ if (rc)
+ ocfs2_xattr_bucket_relse(inode, bucket);
+ return rc;
+}
+
+/* Read the xattr bucket at xb_blkno */
+static int ocfs2_read_xattr_bucket(struct inode *inode,
+ struct ocfs2_xattr_bucket *bucket,
+ u64 xb_blkno)
+{
+ int rc, blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+
+ rc = ocfs2_read_blocks(inode, xb_blkno, blks, bucket->bu_bhs, 0);
+ if (rc)
+ ocfs2_xattr_bucket_relse(inode, bucket);
+ return rc;
+}
+
static inline const char *ocfs2_xattr_prefix(int name_index)
{
struct xattr_handler *handler = NULL;
@@ -3097,31 +3139,6 @@ out:
return ret;
}

-static int ocfs2_read_xattr_bucket(struct inode *inode,
- u64 blkno,
- struct buffer_head **bhs,
- int new)
-{
- int ret = 0;
- u16 i, blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
-
- if (!new)
- return ocfs2_read_blocks(inode, blkno,
- blk_per_bucket, bhs, 0);
-
- for (i = 0; i < blk_per_bucket; i++) {
- bhs[i] = sb_getblk(inode->i_sb, blkno + i);
- if (bhs[i] == NULL) {
- ret = -EIO;
- mlog_errno(ret);
- break;
- }
- ocfs2_set_new_buffer_uptodate(inode, bhs[i]);
- }
-
- return ret;
-}
-
/*
* Find the suitable pos when we divide a bucket into 2.
* We have to make sure the xattrs with the same hash value exist
@@ -3184,7 +3201,7 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
int ret, i;
int count, start, len, name_value_len = 0, xe_len, name_offset = 0;
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
- struct buffer_head **s_bhs, **t_bhs = NULL;
+ struct ocfs2_xattr_bucket s_bucket, t_bucket;
struct ocfs2_xattr_header *xh;
struct ocfs2_xattr_entry *xe;
int blocksize = inode->i_sb->s_blocksize;
@@ -3192,37 +3209,34 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
mlog(0, "move some of xattrs from bucket %llu to %llu\n",
(unsigned long long)blk, (unsigned long long)new_blk);

- s_bhs = kcalloc(blk_per_bucket, sizeof(struct buffer_head *), GFP_NOFS);
- if (!s_bhs)
- return -ENOMEM;
+ memset(&s_bucket, 0, sizeof(struct ocfs2_xattr_bucket));
+ memset(&t_bucket, 0, sizeof(struct ocfs2_xattr_bucket));

- ret = ocfs2_read_xattr_bucket(inode, blk, s_bhs, 0);
+ ret = ocfs2_read_xattr_bucket(inode, &s_bucket, blk);
if (ret) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_journal_access(handle, inode, s_bhs[0],
+ ret = ocfs2_journal_access(handle, inode, s_bucket.bu_bhs[0],
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
}

- t_bhs = kcalloc(blk_per_bucket, sizeof(struct buffer_head *), GFP_NOFS);
- if (!t_bhs) {
- ret = -ENOMEM;
- goto out;
- }
-
- ret = ocfs2_read_xattr_bucket(inode, new_blk, t_bhs, new_bucket_head);
+ /*
+ * Even if !new_bucket_head, we're overwriting t_bucket. Thus,
+ * there's no need to read it.
+ */
+ ret = ocfs2_init_xattr_bucket(inode, &t_bucket, new_blk);
if (ret) {
mlog_errno(ret);
goto out;
}

for (i = 0; i < blk_per_bucket; i++) {
- ret = ocfs2_journal_access(handle, inode, t_bhs[i],
+ ret = ocfs2_journal_access(handle, inode, t_bucket.bu_bhs[i],
new_bucket_head ?
OCFS2_JOURNAL_ACCESS_CREATE :
OCFS2_JOURNAL_ACCESS_WRITE);
@@ -3232,7 +3246,7 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
}
}

- xh = (struct ocfs2_xattr_header *)s_bhs[0]->b_data;
+ xh = bucket_xh(&s_bucket);
count = le16_to_cpu(xh->xh_count);
start = ocfs2_xattr_find_divide_pos(xh);

@@ -3245,9 +3259,9 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
* that of the last entry in the previous bucket.
*/
for (i = 0; i < blk_per_bucket; i++)
- memset(t_bhs[i]->b_data, 0, blocksize);
+ memset(bucket_block(&t_bucket, i), 0, blocksize);

- xh = (struct ocfs2_xattr_header *)t_bhs[0]->b_data;
+ xh = bucket_xh(&t_bucket);
xh->xh_free_start = cpu_to_le16(blocksize);
xh->xh_entries[0].xe_name_hash = xe->xe_name_hash;
le32_add_cpu(&xh->xh_entries[0].xe_name_hash, 1);
@@ -3257,10 +3271,11 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,

/* copy the whole bucket to the new first. */
for (i = 0; i < blk_per_bucket; i++)
- memcpy(t_bhs[i]->b_data, s_bhs[i]->b_data, blocksize);
+ memcpy(bucket_block(&t_bucket, i), bucket_block(&s_bucket, i),
+ blocksize);

/* update the new bucket. */
- xh = (struct ocfs2_xattr_header *)t_bhs[0]->b_data;
+ xh = bucket_xh(&t_bucket);

/*
* Calculate the total name/value len and xh_free_start for
@@ -3325,7 +3340,7 @@ set_num_buckets:
xh->xh_num_buckets = 0;

for (i = 0; i < blk_per_bucket; i++) {
- ocfs2_journal_dirty(handle, t_bhs[i]);
+ ocfs2_journal_dirty(handle, t_bucket.bu_bhs[i]);
if (ret)
mlog_errno(ret);
}
@@ -3342,29 +3357,20 @@ set_num_buckets:
if (start == count)
goto out;

- xh = (struct ocfs2_xattr_header *)s_bhs[0]->b_data;
+ xh = bucket_xh(&s_bucket);
memset(&xh->xh_entries[start], 0,
sizeof(struct ocfs2_xattr_entry) * (count - start));
xh->xh_count = cpu_to_le16(start);
xh->xh_free_start = cpu_to_le16(name_offset);
xh->xh_name_value_len = cpu_to_le16(name_value_len);

- ocfs2_journal_dirty(handle, s_bhs[0]);
+ ocfs2_journal_dirty(handle, s_bucket.bu_bhs[0]);
if (ret)
mlog_errno(ret);

out:
- if (s_bhs) {
- for (i = 0; i < blk_per_bucket; i++)
- brelse(s_bhs[i]);
- }
- kfree(s_bhs);
-
- if (t_bhs) {
- for (i = 0; i < blk_per_bucket; i++)
- brelse(t_bhs[i]);
- }
- kfree(t_bhs);
+ ocfs2_xattr_bucket_relse(inode, &s_bucket);
+ ocfs2_xattr_bucket_relse(inode, &t_bucket);

return ret;
}
@@ -3384,7 +3390,7 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
int ret, i;
int blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
int blocksize = inode->i_sb->s_blocksize;
- struct buffer_head **s_bhs, **t_bhs = NULL;
+ struct ocfs2_xattr_bucket s_bucket, t_bucket;

BUG_ON(s_blkno == t_blkno);

@@ -3392,28 +3398,23 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
(unsigned long long)s_blkno, (unsigned long long)t_blkno,
t_is_new);

- s_bhs = kzalloc(sizeof(struct buffer_head *) * blk_per_bucket,
- GFP_NOFS);
- if (!s_bhs)
- return -ENOMEM;
+ memset(&s_bucket, 0, sizeof(struct ocfs2_xattr_bucket));
+ memset(&t_bucket, 0, sizeof(struct ocfs2_xattr_bucket));

- ret = ocfs2_read_xattr_bucket(inode, s_blkno, s_bhs, 0);
+ ret = ocfs2_read_xattr_bucket(inode, &s_bucket, s_blkno);
if (ret)
goto out;

- t_bhs = kzalloc(sizeof(struct buffer_head *) * blk_per_bucket,
- GFP_NOFS);
- if (!t_bhs) {
- ret = -ENOMEM;
- goto out;
- }
-
- ret = ocfs2_read_xattr_bucket(inode, t_blkno, t_bhs, t_is_new);
+ /*
+ * Even if !t_is_new, we're overwriting t_bucket. Thus,
+ * there's no need to read it.
+ */
+ ret = ocfs2_init_xattr_bucket(inode, &t_bucket, t_blkno);
if (ret)
goto out;

for (i = 0; i < blk_per_bucket; i++) {
- ret = ocfs2_journal_access(handle, inode, t_bhs[i],
+ ret = ocfs2_journal_access(handle, inode, t_bucket.bu_bhs[i],
t_is_new ?
OCFS2_JOURNAL_ACCESS_CREATE :
OCFS2_JOURNAL_ACCESS_WRITE);
@@ -3422,22 +3423,14 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
}

for (i = 0; i < blk_per_bucket; i++) {
- memcpy(t_bhs[i]->b_data, s_bhs[i]->b_data, blocksize);
- ocfs2_journal_dirty(handle, t_bhs[i]);
+ memcpy(bucket_block(&t_bucket, i), bucket_block(&s_bucket, i),
+ blocksize);
+ ocfs2_journal_dirty(handle, t_bucket.bu_bhs[i]);
}

out:
- if (s_bhs) {
- for (i = 0; i < blk_per_bucket; i++)
- brelse(s_bhs[i]);
- }
- kfree(s_bhs);
-
- if (t_bhs) {
- for (i = 0; i < blk_per_bucket; i++)
- brelse(t_bhs[i]);
- }
- kfree(t_bhs);
+ ocfs2_xattr_bucket_relse(inode, &s_bucket);
+ ocfs2_xattr_bucket_relse(inode, &t_bucket);

return ret;
}
--
1.5.6

2008-12-19 21:56:57

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 07/45] ocfs2: Wrap journal_access/journal_dirty for xattr buckets.

From: Joel Becker <[email protected]>

A common action is to call ocfs2_journal_access() and
ocfs2_journal_dirty() on the buffer heads of an xattr bucket. Let's
create nice wrappers.

While we're there, let's drop the places that try to be smart by writing
only the first and last blocks of a bucket. A bucket is contiguous, so
writing the whole thing is actually more efficient.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 140 +++++++++++++++++++++++++-----------------------------
1 files changed, 64 insertions(+), 76 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index fa13fa4..99aefe4 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -210,6 +210,37 @@ static int ocfs2_read_xattr_bucket(struct inode *inode,
return rc;
}

+static int ocfs2_xattr_bucket_journal_access(handle_t *handle,
+ struct inode *inode,
+ struct ocfs2_xattr_bucket *bucket,
+ int type)
+{
+ int i, rc = 0;
+ int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+
+ for (i = 0; i < blks; i++) {
+ rc = ocfs2_journal_access(handle, inode,
+ bucket->bu_bhs[i], type);
+ if (rc) {
+ mlog_errno(rc);
+ break;
+ }
+ }
+
+ return rc;
+}
+
+static void ocfs2_xattr_bucket_journal_dirty(handle_t *handle,
+ struct inode *inode,
+ struct ocfs2_xattr_bucket *bucket)
+{
+ int i, blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+
+ for (i = 0; i < blks; i++)
+ ocfs2_journal_dirty(handle, bucket->bu_bhs[i]);
+}
+
+
static inline const char *ocfs2_xattr_prefix(int name_index)
{
struct xattr_handler *handler = NULL;
@@ -3218,8 +3249,8 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
goto out;
}

- ret = ocfs2_journal_access(handle, inode, s_bucket.bu_bhs[0],
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_xattr_bucket_journal_access(handle, inode, &s_bucket,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out;
@@ -3235,15 +3266,13 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
goto out;
}

- for (i = 0; i < blk_per_bucket; i++) {
- ret = ocfs2_journal_access(handle, inode, t_bucket.bu_bhs[i],
- new_bucket_head ?
- OCFS2_JOURNAL_ACCESS_CREATE :
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret) {
- mlog_errno(ret);
- goto out;
- }
+ ret = ocfs2_xattr_bucket_journal_access(handle, inode, &t_bucket,
+ new_bucket_head ?
+ OCFS2_JOURNAL_ACCESS_CREATE :
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
}

xh = bucket_xh(&s_bucket);
@@ -3339,11 +3368,7 @@ set_num_buckets:
else
xh->xh_num_buckets = 0;

- for (i = 0; i < blk_per_bucket; i++) {
- ocfs2_journal_dirty(handle, t_bucket.bu_bhs[i]);
- if (ret)
- mlog_errno(ret);
- }
+ ocfs2_xattr_bucket_journal_dirty(handle, inode, &t_bucket);

/* store the first_hash of the new bucket. */
if (first_hash)
@@ -3364,9 +3389,7 @@ set_num_buckets:
xh->xh_free_start = cpu_to_le16(name_offset);
xh->xh_name_value_len = cpu_to_le16(name_value_len);

- ocfs2_journal_dirty(handle, s_bucket.bu_bhs[0]);
- if (ret)
- mlog_errno(ret);
+ ocfs2_xattr_bucket_journal_dirty(handle, inode, &s_bucket);

out:
ocfs2_xattr_bucket_relse(inode, &s_bucket);
@@ -3413,20 +3436,18 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
if (ret)
goto out;

- for (i = 0; i < blk_per_bucket; i++) {
- ret = ocfs2_journal_access(handle, inode, t_bucket.bu_bhs[i],
- t_is_new ?
- OCFS2_JOURNAL_ACCESS_CREATE :
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret)
- goto out;
- }
+ ret = ocfs2_xattr_bucket_journal_access(handle, inode, &t_bucket,
+ t_is_new ?
+ OCFS2_JOURNAL_ACCESS_CREATE :
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret)
+ goto out;

for (i = 0; i < blk_per_bucket; i++) {
memcpy(bucket_block(&t_bucket, i), bucket_block(&s_bucket, i),
blocksize);
- ocfs2_journal_dirty(handle, t_bucket.bu_bhs[i]);
}
+ ocfs2_xattr_bucket_journal_dirty(handle, inode, &t_bucket);

out:
ocfs2_xattr_bucket_relse(inode, &s_bucket);
@@ -3799,9 +3820,9 @@ static int ocfs2_extend_xattr_bucket(struct inode *inode,

/*
* We will touch all the buckets after the start_bh(include it).
- * Add one more bucket and modify the first_bh.
+ * Then we add one more bucket.
*/
- credits = end_blk - start_blk + 2 * blk_per_bucket + 1;
+ credits = end_blk - start_blk + 3 * blk_per_bucket + 1;
handle = ocfs2_start_trans(osb, credits);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -4077,33 +4098,6 @@ set_new_name_value:
return;
}

-static int ocfs2_xattr_bucket_handle_journal(struct inode *inode,
- handle_t *handle,
- struct ocfs2_xattr_search *xs,
- struct buffer_head **bhs,
- u16 bh_num)
-{
- int ret = 0, off, block_off;
- struct ocfs2_xattr_entry *xe = xs->here;
-
- /*
- * First calculate all the blocks we should journal_access
- * and journal_dirty. The first block should always be touched.
- */
- ret = ocfs2_journal_dirty(handle, bhs[0]);
- if (ret)
- mlog_errno(ret);
-
- /* calc the data. */
- off = le16_to_cpu(xe->xe_name_offset);
- block_off = off >> inode->i_sb->s_blocksize_bits;
- ret = ocfs2_journal_dirty(handle, bhs[block_off]);
- if (ret)
- mlog_errno(ret);
-
- return ret;
-}
-
/*
* Set the xattr entry in the specified bucket.
* The bucket is indicated by xs->bucket and it should have the enough
@@ -4115,7 +4109,7 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
u32 name_hash,
int local)
{
- int i, ret;
+ int ret;
handle_t *handle = NULL;
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
@@ -4143,22 +4137,16 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
goto out;
}

- for (i = 0; i < blk_per_bucket; i++) {
- ret = ocfs2_journal_access(handle, inode, xs->bucket.bu_bhs[i],
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret < 0) {
- mlog_errno(ret);
- goto out;
- }
+ ret = ocfs2_xattr_bucket_journal_access(handle, inode, &xs->bucket,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret < 0) {
+ mlog_errno(ret);
+ goto out;
}

ocfs2_xattr_set_entry_normal(inode, xi, xs, name_hash, local);
+ ocfs2_xattr_bucket_journal_dirty(handle, inode, &xs->bucket);

- /*Only dirty the blocks we have touched in set xattr. */
- ret = ocfs2_xattr_bucket_handle_journal(inode, handle, xs,
- xs->bucket.bu_bhs, blk_per_bucket);
- if (ret)
- mlog_errno(ret);
out:
ocfs2_commit_trans(osb, handle);

@@ -4398,15 +4386,16 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
le16_to_cpu(xh->xh_count) - 1];
int ret = 0;

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)), 1);
+ handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)),
+ ocfs2_blocks_per_xattr_bucket(inode->i_sb));
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
mlog_errno(ret);
return;
}

- ret = ocfs2_journal_access(handle, inode, xs->bucket.bu_bhs[0],
- OCFS2_JOURNAL_ACCESS_WRITE);
+ ret = ocfs2_xattr_bucket_journal_access(handle, inode, &xs->bucket,
+ OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -4418,9 +4407,8 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
memset(last, 0, sizeof(struct ocfs2_xattr_entry));
le16_add_cpu(&xh->xh_count, -1);

- ret = ocfs2_journal_dirty(handle, xs->bucket.bu_bhs[0]);
- if (ret < 0)
- mlog_errno(ret);
+ ocfs2_xattr_bucket_journal_dirty(handle, inode, &xs->bucket);
+
out_commit:
ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
}
--
1.5.6

2008-12-19 21:57:37

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 09/45] ocfs2: Take ocfs2_xattr_bucket structures off of the stack.

From: Joel Becker <[email protected]>

The ocfs2_xattr_bucket structure is a nice abstraction, but it is a bit
large to have on the stack. Just like ocfs2_path, let's allocate it
with a ocfs2_xattr_bucket_new() function.

We can now store the inode on the bucket, cleaning up all the other
bucket functions. While we're here, we catch another place or two that
wasn't using ocfs2_read_xattr_bucket().

Updates:
- No longer allocating xis.bucket, as it will never be used.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 281 ++++++++++++++++++++++++++++++++----------------------
1 files changed, 166 insertions(+), 115 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 71d9e7b..766494e 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -61,7 +61,14 @@ struct ocfs2_xattr_def_value_root {
};

struct ocfs2_xattr_bucket {
+ /* The inode these xattrs are associated with */
+ struct inode *bu_inode;
+
+ /* The actual buffers that make up the bucket */
struct buffer_head *bu_bhs[OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET];
+
+ /* How many blocks make up one bucket for this filesystem */
+ int bu_blocks;
};

#define OCFS2_XATTR_ROOT_SIZE (sizeof(struct ocfs2_xattr_def_value_root))
@@ -97,7 +104,7 @@ struct ocfs2_xattr_search {
*/
struct buffer_head *xattr_bh;
struct ocfs2_xattr_header *header;
- struct ocfs2_xattr_bucket bucket;
+ struct ocfs2_xattr_bucket *bucket;
void *base;
void *end;
struct ocfs2_xattr_entry *here;
@@ -157,69 +164,91 @@ static inline u16 ocfs2_xattr_max_xe_in_bucket(struct super_block *sb)
#define bucket_block(_b, _n) ((_b)->bu_bhs[(_n)]->b_data)
#define bucket_xh(_b) ((struct ocfs2_xattr_header *)bucket_block((_b), 0))

-static void ocfs2_xattr_bucket_relse(struct inode *inode,
- struct ocfs2_xattr_bucket *bucket)
+static struct ocfs2_xattr_bucket *ocfs2_xattr_bucket_new(struct inode *inode)
{
- int i, blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+ struct ocfs2_xattr_bucket *bucket;
+ int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

- for (i = 0; i < blks; i++) {
+ BUG_ON(blks > OCFS2_XATTR_MAX_BLOCKS_PER_BUCKET);
+
+ bucket = kzalloc(sizeof(struct ocfs2_xattr_bucket), GFP_NOFS);
+ if (bucket) {
+ bucket->bu_inode = inode;
+ bucket->bu_blocks = blks;
+ }
+
+ return bucket;
+}
+
+static void ocfs2_xattr_bucket_relse(struct ocfs2_xattr_bucket *bucket)
+{
+ int i;
+
+ for (i = 0; i < bucket->bu_blocks; i++) {
brelse(bucket->bu_bhs[i]);
bucket->bu_bhs[i] = NULL;
}
}

+static void ocfs2_xattr_bucket_free(struct ocfs2_xattr_bucket *bucket)
+{
+ if (bucket) {
+ ocfs2_xattr_bucket_relse(bucket);
+ bucket->bu_inode = NULL;
+ kfree(bucket);
+ }
+}
+
/*
* A bucket that has never been written to disk doesn't need to be
* read. We just need the buffer_heads. Don't call this for
* buckets that are already on disk. ocfs2_read_xattr_bucket() initializes
* them fully.
*/
-static int ocfs2_init_xattr_bucket(struct inode *inode,
- struct ocfs2_xattr_bucket *bucket,
+static int ocfs2_init_xattr_bucket(struct ocfs2_xattr_bucket *bucket,
u64 xb_blkno)
{
int i, rc = 0;
- int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

- for (i = 0; i < blks; i++) {
- bucket->bu_bhs[i] = sb_getblk(inode->i_sb, xb_blkno + i);
+ for (i = 0; i < bucket->bu_blocks; i++) {
+ bucket->bu_bhs[i] = sb_getblk(bucket->bu_inode->i_sb,
+ xb_blkno + i);
if (!bucket->bu_bhs[i]) {
rc = -EIO;
mlog_errno(rc);
break;
}

- ocfs2_set_new_buffer_uptodate(inode, bucket->bu_bhs[i]);
+ ocfs2_set_new_buffer_uptodate(bucket->bu_inode,
+ bucket->bu_bhs[i]);
}

if (rc)
- ocfs2_xattr_bucket_relse(inode, bucket);
+ ocfs2_xattr_bucket_relse(bucket);
return rc;
}

/* Read the xattr bucket at xb_blkno */
-static int ocfs2_read_xattr_bucket(struct inode *inode,
- struct ocfs2_xattr_bucket *bucket,
+static int ocfs2_read_xattr_bucket(struct ocfs2_xattr_bucket *bucket,
u64 xb_blkno)
{
- int rc, blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+ int rc;

- rc = ocfs2_read_blocks(inode, xb_blkno, blks, bucket->bu_bhs, 0);
+ rc = ocfs2_read_blocks(bucket->bu_inode, xb_blkno,
+ bucket->bu_blocks, bucket->bu_bhs, 0);
if (rc)
- ocfs2_xattr_bucket_relse(inode, bucket);
+ ocfs2_xattr_bucket_relse(bucket);
return rc;
}

static int ocfs2_xattr_bucket_journal_access(handle_t *handle,
- struct inode *inode,
struct ocfs2_xattr_bucket *bucket,
int type)
{
int i, rc = 0;
- int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

- for (i = 0; i < blks; i++) {
- rc = ocfs2_journal_access(handle, inode,
+ for (i = 0; i < bucket->bu_blocks; i++) {
+ rc = ocfs2_journal_access(handle, bucket->bu_inode,
bucket->bu_bhs[i], type);
if (rc) {
mlog_errno(rc);
@@ -231,24 +260,24 @@ static int ocfs2_xattr_bucket_journal_access(handle_t *handle,
}

static void ocfs2_xattr_bucket_journal_dirty(handle_t *handle,
- struct inode *inode,
struct ocfs2_xattr_bucket *bucket)
{
- int i, blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+ int i;

- for (i = 0; i < blks; i++)
+ for (i = 0; i < bucket->bu_blocks; i++)
ocfs2_journal_dirty(handle, bucket->bu_bhs[i]);
}

-static void ocfs2_xattr_bucket_copy_data(struct inode *inode,
- struct ocfs2_xattr_bucket *dest,
+static void ocfs2_xattr_bucket_copy_data(struct ocfs2_xattr_bucket *dest,
struct ocfs2_xattr_bucket *src)
{
int i;
- int blocksize = inode->i_sb->s_blocksize;
- int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+ int blocksize = src->bu_inode->i_sb->s_blocksize;
+
+ BUG_ON(dest->bu_blocks != src->bu_blocks);
+ BUG_ON(dest->bu_inode != src->bu_inode);

- for (i = 0; i < blks; i++) {
+ for (i = 0; i < src->bu_blocks; i++) {
memcpy(bucket_block(dest, i), bucket_block(src, i),
blocksize);
}
@@ -869,7 +898,12 @@ static int ocfs2_xattr_block_get(struct inode *inode,
size_t size;
int ret = -ENODATA, name_offset, name_len, block_off, i;

- memset(&xs->bucket, 0, sizeof(xs->bucket));
+ xs->bucket = ocfs2_xattr_bucket_new(inode);
+ if (!xs->bucket) {
+ ret = -ENOMEM;
+ mlog_errno(ret);
+ goto cleanup;
+ }

ret = ocfs2_xattr_block_find(inode, name_index, name, xs);
if (ret) {
@@ -895,11 +929,11 @@ static int ocfs2_xattr_block_get(struct inode *inode,

if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
ret = ocfs2_xattr_bucket_get_name_value(inode,
- bucket_xh(&xs->bucket),
+ bucket_xh(xs->bucket),
i,
&block_off,
&name_offset);
- xs->base = bucket_block(&xs->bucket, block_off);
+ xs->base = bucket_block(xs->bucket, block_off);
}
if (ocfs2_xattr_is_local(xs->here)) {
memcpy(buffer, (void *)xs->base +
@@ -917,8 +951,7 @@ static int ocfs2_xattr_block_get(struct inode *inode,
}
ret = size;
cleanup:
- ocfs2_xattr_bucket_relse(inode, &xs->bucket);
- memset(&xs->bucket, 0, sizeof(xs->bucket));
+ ocfs2_xattr_bucket_free(xs->bucket);

brelse(xs->xattr_bh);
xs->xattr_bh = NULL;
@@ -2047,10 +2080,20 @@ int ocfs2_xattr_set(struct inode *inode,
if (!ocfs2_supports_xattr(OCFS2_SB(inode->i_sb)))
return -EOPNOTSUPP;

+ /*
+ * Only xbs will be used on indexed trees. xis doesn't need a
+ * bucket.
+ */
+ xbs.bucket = ocfs2_xattr_bucket_new(inode);
+ if (!xbs.bucket) {
+ mlog_errno(-ENOMEM);
+ return -ENOMEM;
+ }
+
ret = ocfs2_inode_lock(inode, &di_bh, 1);
if (ret < 0) {
mlog_errno(ret);
- return ret;
+ goto cleanup_nolock;
}
xis.inode_bh = xbs.inode_bh = di_bh;
di = (struct ocfs2_dinode *)di_bh->b_data;
@@ -2127,9 +2170,10 @@ int ocfs2_xattr_set(struct inode *inode,
cleanup:
up_write(&OCFS2_I(inode)->ip_xattr_sem);
ocfs2_inode_unlock(inode, 1);
+cleanup_nolock:
brelse(di_bh);
brelse(xbs.xattr_bh);
- ocfs2_xattr_bucket_relse(inode, &xbs.bucket);
+ ocfs2_xattr_bucket_free(xbs.bucket);

return ret;
}
@@ -2373,11 +2417,11 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
lower_bh = bh;
bh = NULL;
}
- xs->bucket.bu_bhs[0] = lower_bh;
+ xs->bucket->bu_bhs[0] = lower_bh;
lower_bh = NULL;

- xs->header = bucket_xh(&xs->bucket);
- xs->base = bucket_block(&xs->bucket, 0);
+ xs->header = bucket_xh(xs->bucket);
+ xs->base = bucket_block(xs->bucket, 0);
xs->end = xs->base + inode->i_sb->s_blocksize;

if (found) {
@@ -2385,8 +2429,8 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
* If we have found the xattr enty, read all the blocks in
* this bucket.
*/
- ret = ocfs2_read_blocks(inode, bucket_blkno(&xs->bucket) + 1,
- blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
+ ret = ocfs2_read_blocks(inode, bucket_blkno(xs->bucket) + 1,
+ blk_per_bucket - 1, &xs->bucket->bu_bhs[1],
0);
if (ret) {
mlog_errno(ret);
@@ -2395,7 +2439,7 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,

xs->here = &xs->header->xh_entries[index];
mlog(0, "find xattr %s in bucket %llu, entry = %u\n", name,
- (unsigned long long)bucket_blkno(&xs->bucket), index);
+ (unsigned long long)bucket_blkno(xs->bucket), index);
} else
ret = -ENODATA;

@@ -2453,22 +2497,24 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,
void *para)
{
int i, ret = 0;
- int blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
u32 bpc = ocfs2_xattr_buckets_per_cluster(OCFS2_SB(inode->i_sb));
u32 num_buckets = clusters * bpc;
- struct ocfs2_xattr_bucket bucket;
+ struct ocfs2_xattr_bucket *bucket;

- memset(&bucket, 0, sizeof(bucket));
+ bucket = ocfs2_xattr_bucket_new(inode);
+ if (!bucket) {
+ mlog_errno(-ENOMEM);
+ return -ENOMEM;
+ }

mlog(0, "iterating xattr buckets in %u clusters starting from %llu\n",
clusters, (unsigned long long)blkno);

- for (i = 0; i < num_buckets; i++, blkno += blk_per_bucket) {
- ret = ocfs2_read_blocks(inode, blkno, blk_per_bucket,
- bucket.bu_bhs, 0);
+ for (i = 0; i < num_buckets; i++, blkno += bucket->bu_blocks) {
+ ret = ocfs2_read_xattr_bucket(bucket, blkno);
if (ret) {
mlog_errno(ret);
- goto out;
+ break;
}

/*
@@ -2476,26 +2522,24 @@ static int ocfs2_iterate_xattr_buckets(struct inode *inode,
* in the 1st bucket.
*/
if (i == 0)
- num_buckets = le16_to_cpu(bucket_xh(&bucket)->xh_num_buckets);
+ num_buckets = le16_to_cpu(bucket_xh(bucket)->xh_num_buckets);

mlog(0, "iterating xattr bucket %llu, first hash %u\n",
(unsigned long long)blkno,
- le32_to_cpu(bucket_xh(&bucket)->xh_entries[0].xe_name_hash));
+ le32_to_cpu(bucket_xh(bucket)->xh_entries[0].xe_name_hash));
if (func) {
- ret = func(inode, &bucket, para);
- if (ret) {
+ ret = func(inode, bucket, para);
+ if (ret)
mlog_errno(ret);
- break;
- }
+ /* Fall through to bucket_relse() */
}

- ocfs2_xattr_bucket_relse(inode, &bucket);
- memset(&bucket, 0, sizeof(bucket));
+ ocfs2_xattr_bucket_relse(bucket);
+ if (ret)
+ break;
}

-out:
- ocfs2_xattr_bucket_relse(inode, &bucket);
-
+ ocfs2_xattr_bucket_free(bucket);
return ret;
}

@@ -2718,9 +2762,9 @@ static int ocfs2_xattr_update_xattr_search(struct inode *inode,
int i, blocksize = inode->i_sb->s_blocksize;
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

- xs->bucket.bu_bhs[0] = new_bh;
+ xs->bucket->bu_bhs[0] = new_bh;
get_bh(new_bh);
- xs->header = bucket_xh(&xs->bucket);
+ xs->header = bucket_xh(xs->bucket);

xs->base = new_bh->b_data;
xs->end = xs->base + inode->i_sb->s_blocksize;
@@ -2728,8 +2772,8 @@ static int ocfs2_xattr_update_xattr_search(struct inode *inode,
if (!xs->not_found) {
if (OCFS2_XATTR_BUCKET_SIZE != blocksize) {
ret = ocfs2_read_blocks(inode,
- bucket_blkno(&xs->bucket) + 1,
- blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
+ bucket_blkno(xs->bucket) + 1,
+ blk_per_bucket - 1, &xs->bucket->bu_bhs[1],
0);
if (ret) {
mlog_errno(ret);
@@ -3244,8 +3288,7 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
{
int ret, i;
int count, start, len, name_value_len = 0, xe_len, name_offset = 0;
- u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
- struct ocfs2_xattr_bucket s_bucket, t_bucket;
+ struct ocfs2_xattr_bucket *s_bucket = NULL, *t_bucket = NULL;
struct ocfs2_xattr_header *xh;
struct ocfs2_xattr_entry *xe;
int blocksize = inode->i_sb->s_blocksize;
@@ -3253,16 +3296,21 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
mlog(0, "move some of xattrs from bucket %llu to %llu\n",
(unsigned long long)blk, (unsigned long long)new_blk);

- memset(&s_bucket, 0, sizeof(struct ocfs2_xattr_bucket));
- memset(&t_bucket, 0, sizeof(struct ocfs2_xattr_bucket));
+ s_bucket = ocfs2_xattr_bucket_new(inode);
+ t_bucket = ocfs2_xattr_bucket_new(inode);
+ if (!s_bucket || !t_bucket) {
+ ret = -ENOMEM;
+ mlog_errno(ret);
+ goto out;
+ }

- ret = ocfs2_read_xattr_bucket(inode, &s_bucket, blk);
+ ret = ocfs2_read_xattr_bucket(s_bucket, blk);
if (ret) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_xattr_bucket_journal_access(handle, inode, &s_bucket,
+ ret = ocfs2_xattr_bucket_journal_access(handle, s_bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
@@ -3273,13 +3321,13 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
* Even if !new_bucket_head, we're overwriting t_bucket. Thus,
* there's no need to read it.
*/
- ret = ocfs2_init_xattr_bucket(inode, &t_bucket, new_blk);
+ ret = ocfs2_init_xattr_bucket(t_bucket, new_blk);
if (ret) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_xattr_bucket_journal_access(handle, inode, &t_bucket,
+ ret = ocfs2_xattr_bucket_journal_access(handle, t_bucket,
new_bucket_head ?
OCFS2_JOURNAL_ACCESS_CREATE :
OCFS2_JOURNAL_ACCESS_WRITE);
@@ -3288,7 +3336,7 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
goto out;
}

- xh = bucket_xh(&s_bucket);
+ xh = bucket_xh(s_bucket);
count = le16_to_cpu(xh->xh_count);
start = ocfs2_xattr_find_divide_pos(xh);

@@ -3300,10 +3348,10 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
* The hash value is set as one larger than
* that of the last entry in the previous bucket.
*/
- for (i = 0; i < blk_per_bucket; i++)
- memset(bucket_block(&t_bucket, i), 0, blocksize);
+ for (i = 0; i < t_bucket->bu_blocks; i++)
+ memset(bucket_block(t_bucket, i), 0, blocksize);

- xh = bucket_xh(&t_bucket);
+ xh = bucket_xh(t_bucket);
xh->xh_free_start = cpu_to_le16(blocksize);
xh->xh_entries[0].xe_name_hash = xe->xe_name_hash;
le32_add_cpu(&xh->xh_entries[0].xe_name_hash, 1);
@@ -3312,10 +3360,10 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
}

/* copy the whole bucket to the new first. */
- ocfs2_xattr_bucket_copy_data(inode, &t_bucket, &s_bucket);
+ ocfs2_xattr_bucket_copy_data(t_bucket, s_bucket);

/* update the new bucket. */
- xh = bucket_xh(&t_bucket);
+ xh = bucket_xh(t_bucket);

/*
* Calculate the total name/value len and xh_free_start for
@@ -3379,7 +3427,7 @@ set_num_buckets:
else
xh->xh_num_buckets = 0;

- ocfs2_xattr_bucket_journal_dirty(handle, inode, &t_bucket);
+ ocfs2_xattr_bucket_journal_dirty(handle, t_bucket);

/* store the first_hash of the new bucket. */
if (first_hash)
@@ -3393,18 +3441,18 @@ set_num_buckets:
if (start == count)
goto out;

- xh = bucket_xh(&s_bucket);
+ xh = bucket_xh(s_bucket);
memset(&xh->xh_entries[start], 0,
sizeof(struct ocfs2_xattr_entry) * (count - start));
xh->xh_count = cpu_to_le16(start);
xh->xh_free_start = cpu_to_le16(name_offset);
xh->xh_name_value_len = cpu_to_le16(name_value_len);

- ocfs2_xattr_bucket_journal_dirty(handle, inode, &s_bucket);
+ ocfs2_xattr_bucket_journal_dirty(handle, s_bucket);

out:
- ocfs2_xattr_bucket_relse(inode, &s_bucket);
- ocfs2_xattr_bucket_relse(inode, &t_bucket);
+ ocfs2_xattr_bucket_free(s_bucket);
+ ocfs2_xattr_bucket_free(t_bucket);

return ret;
}
@@ -3422,7 +3470,7 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
int t_is_new)
{
int ret;
- struct ocfs2_xattr_bucket s_bucket, t_bucket;
+ struct ocfs2_xattr_bucket *s_bucket = NULL, *t_bucket = NULL;

BUG_ON(s_blkno == t_blkno);

@@ -3430,10 +3478,15 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
(unsigned long long)s_blkno, (unsigned long long)t_blkno,
t_is_new);

- memset(&s_bucket, 0, sizeof(struct ocfs2_xattr_bucket));
- memset(&t_bucket, 0, sizeof(struct ocfs2_xattr_bucket));
-
- ret = ocfs2_read_xattr_bucket(inode, &s_bucket, s_blkno);
+ s_bucket = ocfs2_xattr_bucket_new(inode);
+ t_bucket = ocfs2_xattr_bucket_new(inode);
+ if (!s_bucket || !t_bucket) {
+ ret = -ENOMEM;
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_read_xattr_bucket(s_bucket, s_blkno);
if (ret)
goto out;

@@ -3441,23 +3494,23 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
* Even if !t_is_new, we're overwriting t_bucket. Thus,
* there's no need to read it.
*/
- ret = ocfs2_init_xattr_bucket(inode, &t_bucket, t_blkno);
+ ret = ocfs2_init_xattr_bucket(t_bucket, t_blkno);
if (ret)
goto out;

- ret = ocfs2_xattr_bucket_journal_access(handle, inode, &t_bucket,
+ ret = ocfs2_xattr_bucket_journal_access(handle, t_bucket,
t_is_new ?
OCFS2_JOURNAL_ACCESS_CREATE :
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret)
goto out;

- ocfs2_xattr_bucket_copy_data(inode, &t_bucket, &s_bucket);
- ocfs2_xattr_bucket_journal_dirty(handle, inode, &t_bucket);
+ ocfs2_xattr_bucket_copy_data(t_bucket, s_bucket);
+ ocfs2_xattr_bucket_journal_dirty(handle, t_bucket);

out:
- ocfs2_xattr_bucket_relse(inode, &s_bucket);
- ocfs2_xattr_bucket_relse(inode, &t_bucket);
+ ocfs2_xattr_bucket_free(t_bucket);
+ ocfs2_xattr_bucket_free(s_bucket);

return ret;
}
@@ -4009,7 +4062,7 @@ static void ocfs2_xattr_set_entry_normal(struct inode *inode,
xe->xe_value_size = 0;

val = ocfs2_xattr_bucket_get_val(inode,
- &xs->bucket, offs);
+ xs->bucket, offs);
memset(val + OCFS2_XATTR_SIZE(name_len), 0,
size - OCFS2_XATTR_SIZE(name_len));
if (OCFS2_XATTR_SIZE(xi->value_len) > 0)
@@ -4087,8 +4140,7 @@ set_new_name_value:
xh->xh_free_start = cpu_to_le16(offs);
}

- val = ocfs2_xattr_bucket_get_val(inode,
- &xs->bucket, offs - size);
+ val = ocfs2_xattr_bucket_get_val(inode, xs->bucket, offs - size);
xe->xe_name_offset = cpu_to_le16(offs - size);

memset(val, 0, size);
@@ -4122,12 +4174,12 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,

mlog(0, "Set xattr entry len = %lu index = %d in bucket %llu\n",
(unsigned long)xi->value_len, xi->name_index,
- (unsigned long long)bucket_blkno(&xs->bucket));
+ (unsigned long long)bucket_blkno(xs->bucket));

- if (!xs->bucket.bu_bhs[1]) {
+ if (!xs->bucket->bu_bhs[1]) {
ret = ocfs2_read_blocks(inode,
- bucket_blkno(&xs->bucket) + 1,
- blk_per_bucket - 1, &xs->bucket.bu_bhs[1],
+ bucket_blkno(xs->bucket) + 1,
+ blk_per_bucket - 1, &xs->bucket->bu_bhs[1],
0);
if (ret) {
mlog_errno(ret);
@@ -4143,7 +4195,7 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
goto out;
}

- ret = ocfs2_xattr_bucket_journal_access(handle, inode, &xs->bucket,
+ ret = ocfs2_xattr_bucket_journal_access(handle, xs->bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
@@ -4151,7 +4203,7 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
}

ocfs2_xattr_set_entry_normal(inode, xi, xs, name_hash, local);
- ocfs2_xattr_bucket_journal_dirty(handle, inode, &xs->bucket);
+ ocfs2_xattr_bucket_journal_dirty(handle, xs->bucket);

out:
ocfs2_commit_trans(osb, handle);
@@ -4264,10 +4316,10 @@ static int ocfs2_xattr_bucket_value_truncate_xs(struct inode *inode,
struct ocfs2_xattr_entry *xe = xs->here;
struct ocfs2_xattr_header *xh = (struct ocfs2_xattr_header *)xs->base;

- BUG_ON(!xs->bucket.bu_bhs[0] || !xe || ocfs2_xattr_is_local(xe));
+ BUG_ON(!xs->bucket->bu_bhs[0] || !xe || ocfs2_xattr_is_local(xe));

offset = xe - xh->xh_entries;
- ret = ocfs2_xattr_bucket_value_truncate(inode, xs->bucket.bu_bhs[0],
+ ret = ocfs2_xattr_bucket_value_truncate(inode, xs->bucket->bu_bhs[0],
offset, len);
if (ret)
mlog_errno(ret);
@@ -4387,7 +4439,7 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
struct ocfs2_xattr_search *xs)
{
handle_t *handle = NULL;
- struct ocfs2_xattr_header *xh = bucket_xh(&xs->bucket);
+ struct ocfs2_xattr_header *xh = bucket_xh(xs->bucket);
struct ocfs2_xattr_entry *last = &xh->xh_entries[
le16_to_cpu(xh->xh_count) - 1];
int ret = 0;
@@ -4400,7 +4452,7 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
return;
}

- ret = ocfs2_xattr_bucket_journal_access(handle, inode, &xs->bucket,
+ ret = ocfs2_xattr_bucket_journal_access(handle, xs->bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
@@ -4413,7 +4465,7 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
memset(last, 0, sizeof(struct ocfs2_xattr_entry));
le16_add_cpu(&xh->xh_count, -1);

- ocfs2_xattr_bucket_journal_dirty(handle, inode, &xs->bucket);
+ ocfs2_xattr_bucket_journal_dirty(handle, xs->bucket);

out_commit:
ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
@@ -4565,7 +4617,7 @@ try_again:

mlog_bug_on_msg(header_size > blocksize, "bucket %llu has header size "
"of %u which exceed block size\n",
- (unsigned long long)bucket_blkno(&xs->bucket),
+ (unsigned long long)bucket_blkno(xs->bucket),
header_size);

if (xi->value && xi->value_len > OCFS2_XATTR_INLINE_SIZE)
@@ -4605,7 +4657,7 @@ try_again:
mlog(0, "xs->not_found = %d, in xattr bucket %llu: free = %d, "
"need = %d, max_free = %d, xh_free_start = %u, xh_name_value_len ="
" %u\n", xs->not_found,
- (unsigned long long)bucket_blkno(&xs->bucket),
+ (unsigned long long)bucket_blkno(xs->bucket),
free, need, max_free, le16_to_cpu(xh->xh_free_start),
le16_to_cpu(xh->xh_name_value_len));

@@ -4617,7 +4669,7 @@ try_again:
* name/value will be moved, the xe shouldn't be changed
* in xs.
*/
- ret = ocfs2_defrag_xattr_bucket(inode, &xs->bucket);
+ ret = ocfs2_defrag_xattr_bucket(inode, xs->bucket);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4649,7 +4701,7 @@ try_again:
* add a new bucket for the insert.
*/
ret = ocfs2_check_xattr_bucket_collision(inode,
- &xs->bucket,
+ xs->bucket,
xi->name);
if (ret) {
mlog_errno(ret);
@@ -4658,14 +4710,13 @@ try_again:

ret = ocfs2_add_new_xattr_bucket(inode,
xs->xattr_bh,
- xs->bucket.bu_bhs[0]);
+ xs->bucket->bu_bhs[0]);
if (ret) {
mlog_errno(ret);
goto out;
}

- ocfs2_xattr_bucket_relse(inode, &xs->bucket);
- memset(&xs->bucket, 0, sizeof(xs->bucket));
+ ocfs2_xattr_bucket_relse(xs->bucket);

ret = ocfs2_xattr_index_block_find(inode, xs->xattr_bh,
xi->name_index,
--
1.5.6

2008-12-19 21:57:19

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 08/45] ocfs2: Copy xattr buckets with a dedicated function.

From: Joel Becker <[email protected]>

Now that the places that copy whole buckets are using struct
ocfs2_xattr_bucket, we can do the copy in a dedicated function.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 26 ++++++++++++++++----------
1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 99aefe4..71d9e7b 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -240,6 +240,19 @@ static void ocfs2_xattr_bucket_journal_dirty(handle_t *handle,
ocfs2_journal_dirty(handle, bucket->bu_bhs[i]);
}

+static void ocfs2_xattr_bucket_copy_data(struct inode *inode,
+ struct ocfs2_xattr_bucket *dest,
+ struct ocfs2_xattr_bucket *src)
+{
+ int i;
+ int blocksize = inode->i_sb->s_blocksize;
+ int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+
+ for (i = 0; i < blks; i++) {
+ memcpy(bucket_block(dest, i), bucket_block(src, i),
+ blocksize);
+ }
+}

static inline const char *ocfs2_xattr_prefix(int name_index)
{
@@ -3299,9 +3312,7 @@ static int ocfs2_divide_xattr_bucket(struct inode *inode,
}

/* copy the whole bucket to the new first. */
- for (i = 0; i < blk_per_bucket; i++)
- memcpy(bucket_block(&t_bucket, i), bucket_block(&s_bucket, i),
- blocksize);
+ ocfs2_xattr_bucket_copy_data(inode, &t_bucket, &s_bucket);

/* update the new bucket. */
xh = bucket_xh(&t_bucket);
@@ -3410,9 +3421,7 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
u64 t_blkno,
int t_is_new)
{
- int ret, i;
- int blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
- int blocksize = inode->i_sb->s_blocksize;
+ int ret;
struct ocfs2_xattr_bucket s_bucket, t_bucket;

BUG_ON(s_blkno == t_blkno);
@@ -3443,10 +3452,7 @@ static int ocfs2_cp_xattr_bucket(struct inode *inode,
if (ret)
goto out;

- for (i = 0; i < blk_per_bucket; i++) {
- memcpy(bucket_block(&t_bucket, i), bucket_block(&s_bucket, i),
- blocksize);
- }
+ ocfs2_xattr_bucket_copy_data(inode, &t_bucket, &s_bucket);
ocfs2_xattr_bucket_journal_dirty(handle, inode, &t_bucket);

out:
--
1.5.6

2008-12-19 21:57:54

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 10/45] ocfs2: Use buckets in ocfs2_xattr_bucket_find().

From: Joel Becker <[email protected]>

Change the ocfs2_xattr_bucket_find() function to use ocfs2_xattr_bucket
as its abstraction. This makes for more efficient reads, as buckets are
linear blocks, and also has improved caching characteristics. It also
reads better.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 89 +++++++++++++++++++-----------------------------------
1 files changed, 31 insertions(+), 58 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 766494e..46986c6 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2248,7 +2248,7 @@ typedef int (xattr_bucket_func)(struct inode *inode,
void *para);

static int ocfs2_find_xe_in_bucket(struct inode *inode,
- struct buffer_head *header_bh,
+ struct ocfs2_xattr_bucket *bucket,
int name_index,
const char *name,
u32 name_hash,
@@ -2256,11 +2256,9 @@ static int ocfs2_find_xe_in_bucket(struct inode *inode,
int *found)
{
int i, ret = 0, cmp = 1, block_off, new_offset;
- struct ocfs2_xattr_header *xh =
- (struct ocfs2_xattr_header *)header_bh->b_data;
+ struct ocfs2_xattr_header *xh = bucket_xh(bucket);
size_t name_len = strlen(name);
struct ocfs2_xattr_entry *xe = NULL;
- struct buffer_head *name_bh = NULL;
char *xe_name;

/*
@@ -2291,19 +2289,8 @@ static int ocfs2_find_xe_in_bucket(struct inode *inode,
break;
}

- ret = ocfs2_read_block(inode, header_bh->b_blocknr + block_off,
- &name_bh);
- if (ret) {
- mlog_errno(ret);
- break;
- }
- xe_name = name_bh->b_data + new_offset;
-
- cmp = memcmp(name, xe_name, name_len);
- brelse(name_bh);
- name_bh = NULL;
-
- if (cmp == 0) {
+ xe_name = bucket_block(bucket, block_off) + new_offset;
+ if (!memcmp(name, xe_name, name_len)) {
*xe_index = i;
*found = 1;
ret = 0;
@@ -2333,39 +2320,42 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
struct ocfs2_xattr_search *xs)
{
int ret, found = 0;
- struct buffer_head *bh = NULL;
- struct buffer_head *lower_bh = NULL;
struct ocfs2_xattr_header *xh = NULL;
struct ocfs2_xattr_entry *xe = NULL;
u16 index = 0;
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
int low_bucket = 0, bucket, high_bucket;
+ struct ocfs2_xattr_bucket *search;
u32 last_hash;
- u64 blkno;
+ u64 blkno, lower_blkno = 0;

- ret = ocfs2_read_block(inode, p_blkno, &bh);
+ search = ocfs2_xattr_bucket_new(inode);
+ if (!search) {
+ ret = -ENOMEM;
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_read_xattr_bucket(search, p_blkno);
if (ret) {
mlog_errno(ret);
goto out;
}

- xh = (struct ocfs2_xattr_header *)bh->b_data;
+ xh = bucket_xh(search);
high_bucket = le16_to_cpu(xh->xh_num_buckets) - 1;
-
while (low_bucket <= high_bucket) {
- brelse(bh);
- bh = NULL;
- bucket = (low_bucket + high_bucket) / 2;
+ ocfs2_xattr_bucket_relse(search);

+ bucket = (low_bucket + high_bucket) / 2;
blkno = p_blkno + bucket * blk_per_bucket;
-
- ret = ocfs2_read_block(inode, blkno, &bh);
+ ret = ocfs2_read_xattr_bucket(search, blkno);
if (ret) {
mlog_errno(ret);
goto out;
}

- xh = (struct ocfs2_xattr_header *)bh->b_data;
+ xh = bucket_xh(search);
xe = &xh->xh_entries[0];
if (name_hash < le32_to_cpu(xe->xe_name_hash)) {
high_bucket = bucket - 1;
@@ -2382,10 +2372,8 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,

last_hash = le32_to_cpu(xe->xe_name_hash);

- /* record lower_bh which may be the insert place. */
- brelse(lower_bh);
- lower_bh = bh;
- bh = NULL;
+ /* record lower_blkno which may be the insert place. */
+ lower_blkno = blkno;

if (name_hash > le32_to_cpu(xe->xe_name_hash)) {
low_bucket = bucket + 1;
@@ -2393,7 +2381,7 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
}

/* the searched xattr should reside in this bucket if exists. */
- ret = ocfs2_find_xe_in_bucket(inode, lower_bh,
+ ret = ocfs2_find_xe_in_bucket(inode, search,
name_index, name, name_hash,
&index, &found);
if (ret) {
@@ -2408,35 +2396,21 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
* When the xattr's hash value is in the gap of 2 buckets, we will
* always set it to the previous bucket.
*/
- if (!lower_bh) {
- /*
- * We can't find any bucket whose first name_hash is less
- * than the find name_hash.
- */
- BUG_ON(bh->b_blocknr != p_blkno);
- lower_bh = bh;
- bh = NULL;
+ if (!lower_blkno)
+ lower_blkno = p_blkno;
+
+ /* This should be in cache - we just read it during the search */
+ ret = ocfs2_read_xattr_bucket(xs->bucket, lower_blkno);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
}
- xs->bucket->bu_bhs[0] = lower_bh;
- lower_bh = NULL;

xs->header = bucket_xh(xs->bucket);
xs->base = bucket_block(xs->bucket, 0);
xs->end = xs->base + inode->i_sb->s_blocksize;

if (found) {
- /*
- * If we have found the xattr enty, read all the blocks in
- * this bucket.
- */
- ret = ocfs2_read_blocks(inode, bucket_blkno(xs->bucket) + 1,
- blk_per_bucket - 1, &xs->bucket->bu_bhs[1],
- 0);
- if (ret) {
- mlog_errno(ret);
- goto out;
- }
-
xs->here = &xs->header->xh_entries[index];
mlog(0, "find xattr %s in bucket %llu, entry = %u\n", name,
(unsigned long long)bucket_blkno(xs->bucket), index);
@@ -2444,8 +2418,7 @@ static int ocfs2_xattr_bucket_find(struct inode *inode,
ret = -ENODATA;

out:
- brelse(bh);
- brelse(lower_bh);
+ ocfs2_xattr_bucket_free(search);
return ret;
}

--
1.5.6

2008-12-19 21:58:20

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 11/45] ocfs2: Use buckets in ocfs2_xattr_create_index_block().

From: Joel Becker <[email protected]>

Use the ocfs2_xattr_bucket abstraction in
ocfs2_xattr_create_index_block() and its helpers. We get more efficient
reads, a lot less buffer_head munging, and nicer code to boot. While
we're at it, ocfs2_xattr_update_xattr_search() becomes void.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 114 +++++++++++++++--------------------------------------
1 files changed, 32 insertions(+), 82 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 46986c6..76969b9 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2649,32 +2649,34 @@ static void swap_xe(void *a, void *b, int size)
/*
* When the ocfs2_xattr_block is filled up, new bucket will be created
* and all the xattr entries will be moved to the new bucket.
+ * The header goes at the start of the bucket, and the names+values are
+ * filled from the end. This is why *target starts as the last buffer.
* Note: we need to sort the entries since they are not saved in order
* in the ocfs2_xattr_block.
*/
static void ocfs2_cp_xattr_block_to_bucket(struct inode *inode,
struct buffer_head *xb_bh,
- struct buffer_head *xh_bh,
- struct buffer_head *data_bh)
+ struct ocfs2_xattr_bucket *bucket)
{
int i, blocksize = inode->i_sb->s_blocksize;
+ int blks = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
u16 offset, size, off_change;
struct ocfs2_xattr_entry *xe;
struct ocfs2_xattr_block *xb =
(struct ocfs2_xattr_block *)xb_bh->b_data;
struct ocfs2_xattr_header *xb_xh = &xb->xb_attrs.xb_header;
- struct ocfs2_xattr_header *xh =
- (struct ocfs2_xattr_header *)xh_bh->b_data;
+ struct ocfs2_xattr_header *xh = bucket_xh(bucket);
u16 count = le16_to_cpu(xb_xh->xh_count);
- char *target = xh_bh->b_data, *src = xb_bh->b_data;
+ char *src = xb_bh->b_data;
+ char *target = bucket_block(bucket, blks - 1);

mlog(0, "cp xattr from block %llu to bucket %llu\n",
(unsigned long long)xb_bh->b_blocknr,
- (unsigned long long)xh_bh->b_blocknr);
+ (unsigned long long)bucket_blkno(bucket));
+
+ for (i = 0; i < blks; i++)
+ memset(bucket_block(bucket, i), 0, blocksize);

- memset(xh_bh->b_data, 0, blocksize);
- if (data_bh)
- memset(data_bh->b_data, 0, blocksize);
/*
* Since the xe_name_offset is based on ocfs2_xattr_header,
* there is a offset change corresponding to the change of
@@ -2686,8 +2688,6 @@ static void ocfs2_cp_xattr_block_to_bucket(struct inode *inode,
size = blocksize - offset;

/* copy all the names and values. */
- if (data_bh)
- target = data_bh->b_data;
memcpy(target + offset, src + offset, size);

/* Init new header now. */
@@ -2697,7 +2697,7 @@ static void ocfs2_cp_xattr_block_to_bucket(struct inode *inode,
xh->xh_free_start = cpu_to_le16(OCFS2_XATTR_BUCKET_SIZE - size);

/* copy all the entries. */
- target = xh_bh->b_data;
+ target = bucket_block(bucket, 0);
offset = offsetof(struct ocfs2_xattr_header, xh_entries);
size = count * sizeof(struct ocfs2_xattr_entry);
memcpy(target + offset, (char *)xb_xh + offset, size);
@@ -2723,42 +2723,24 @@ static void ocfs2_cp_xattr_block_to_bucket(struct inode *inode,
* While if the entry is in index b-tree, "bucket" indicates the
* real place of the xattr.
*/
-static int ocfs2_xattr_update_xattr_search(struct inode *inode,
- struct ocfs2_xattr_search *xs,
- struct buffer_head *old_bh,
- struct buffer_head *new_bh)
+static void ocfs2_xattr_update_xattr_search(struct inode *inode,
+ struct ocfs2_xattr_search *xs,
+ struct buffer_head *old_bh)
{
- int ret = 0;
char *buf = old_bh->b_data;
struct ocfs2_xattr_block *old_xb = (struct ocfs2_xattr_block *)buf;
struct ocfs2_xattr_header *old_xh = &old_xb->xb_attrs.xb_header;
- int i, blocksize = inode->i_sb->s_blocksize;
- u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+ int i;

- xs->bucket->bu_bhs[0] = new_bh;
- get_bh(new_bh);
xs->header = bucket_xh(xs->bucket);
-
- xs->base = new_bh->b_data;
+ xs->base = bucket_block(xs->bucket, 0);
xs->end = xs->base + inode->i_sb->s_blocksize;

- if (!xs->not_found) {
- if (OCFS2_XATTR_BUCKET_SIZE != blocksize) {
- ret = ocfs2_read_blocks(inode,
- bucket_blkno(xs->bucket) + 1,
- blk_per_bucket - 1, &xs->bucket->bu_bhs[1],
- 0);
- if (ret) {
- mlog_errno(ret);
- return ret;
- }
-
- }
- i = xs->here - old_xh->xh_entries;
- xs->here = &xs->header->xh_entries[i];
- }
+ if (xs->not_found)
+ return;

- return ret;
+ i = xs->here - old_xh->xh_entries;
+ xs->here = &xs->header->xh_entries[i];
}

static int ocfs2_xattr_create_index_block(struct inode *inode,
@@ -2771,18 +2753,17 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_inode_info *oi = OCFS2_I(inode);
struct ocfs2_alloc_context *data_ac;
- struct buffer_head *xh_bh = NULL, *data_bh = NULL;
struct buffer_head *xb_bh = xs->xattr_bh;
struct ocfs2_xattr_block *xb =
(struct ocfs2_xattr_block *)xb_bh->b_data;
struct ocfs2_xattr_tree_root *xr;
u16 xb_flags = le16_to_cpu(xb->xb_flags);
- u16 bpb = ocfs2_blocks_per_xattr_bucket(inode->i_sb);

mlog(0, "create xattr index block for %llu\n",
(unsigned long long)xb_bh->b_blocknr);

BUG_ON(xb_flags & OCFS2_XATTR_INDEXED);
+ BUG_ON(!xs->bucket);

ret = ocfs2_reserve_clusters(osb, 1, &data_ac);
if (ret) {
@@ -2798,10 +2779,10 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
down_write(&oi->ip_alloc_sem);

/*
- * 3 more credits, one for xattr block update, one for the 1st block
- * of the new xattr bucket and one for the value/data.
+ * We need more credits. One for the xattr block update and one
+ * for each block of the new xattr bucket.
*/
- credits += 3;
+ credits += 1 + ocfs2_blocks_per_xattr_bucket(inode->i_sb);
handle = ocfs2_start_trans(osb, credits);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -2832,51 +2813,23 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
mlog(0, "allocate 1 cluster from %llu to xattr block\n",
(unsigned long long)blkno);

- xh_bh = sb_getblk(inode->i_sb, blkno);
- if (!xh_bh) {
- ret = -EIO;
+ ret = ocfs2_init_xattr_bucket(xs->bucket, blkno);
+ if (ret) {
mlog_errno(ret);
goto out_commit;
}

- ocfs2_set_new_buffer_uptodate(inode, xh_bh);
-
- ret = ocfs2_journal_access(handle, inode, xh_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
+ ret = ocfs2_xattr_bucket_journal_access(handle, xs->bucket,
+ OCFS2_JOURNAL_ACCESS_CREATE);
if (ret) {
mlog_errno(ret);
goto out_commit;
}

- if (bpb > 1) {
- data_bh = sb_getblk(inode->i_sb, blkno + bpb - 1);
- if (!data_bh) {
- ret = -EIO;
- mlog_errno(ret);
- goto out_commit;
- }
-
- ocfs2_set_new_buffer_uptodate(inode, data_bh);
-
- ret = ocfs2_journal_access(handle, inode, data_bh,
- OCFS2_JOURNAL_ACCESS_CREATE);
- if (ret) {
- mlog_errno(ret);
- goto out_commit;
- }
- }
-
- ocfs2_cp_xattr_block_to_bucket(inode, xb_bh, xh_bh, data_bh);
-
- ocfs2_journal_dirty(handle, xh_bh);
- if (data_bh)
- ocfs2_journal_dirty(handle, data_bh);
+ ocfs2_cp_xattr_block_to_bucket(inode, xb_bh, xs->bucket);
+ ocfs2_xattr_bucket_journal_dirty(handle, xs->bucket);

- ret = ocfs2_xattr_update_xattr_search(inode, xs, xb_bh, xh_bh);
- if (ret) {
- mlog_errno(ret);
- goto out_commit;
- }
+ ocfs2_xattr_update_xattr_search(inode, xs, xb_bh);

/* Change from ocfs2_xattr_header to ocfs2_xattr_tree_root */
memset(&xb->xb_attrs, 0, inode->i_sb->s_blocksize -
@@ -2911,9 +2864,6 @@ out:
if (data_ac)
ocfs2_free_alloc_context(data_ac);

- brelse(xh_bh);
- brelse(data_bh);
-
return ret;
}

--
1.5.6

2008-12-19 21:58:40

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 12/45] ocfs2: Use buckets in ocfs2_defrag_xattr_bucket().

From: Joel Becker <[email protected]>

Use the ocfs2_xattr_bucket abstraction for reading and writing the
bucket in ocfs2_defrag_xattr_bucket().

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 55 ++++++++++++++++++++++-------------------------------
1 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 76969b9..127a628 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2894,21 +2894,11 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
struct ocfs2_xattr_header *xh;
char *entries, *buf, *bucket_buf = NULL;
u64 blkno = bucket_blkno(bucket);
- u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
u16 xh_free_start;
size_t blocksize = inode->i_sb->s_blocksize;
handle_t *handle;
- struct buffer_head **bhs;
struct ocfs2_xattr_entry *xe;
-
- bhs = kzalloc(sizeof(struct buffer_head *) * blk_per_bucket,
- GFP_NOFS);
- if (!bhs)
- return -ENOMEM;
-
- ret = ocfs2_read_blocks(inode, blkno, blk_per_bucket, bhs, 0);
- if (ret)
- goto out;
+ struct ocfs2_xattr_bucket *wb = NULL;

/*
* In order to make the operation more efficient and generic,
@@ -2922,11 +2912,21 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
goto out;
}

+ wb = ocfs2_xattr_bucket_new(inode);
+ if (!wb) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = ocfs2_read_xattr_bucket(wb, blkno);
+ if (ret)
+ goto out;
+
buf = bucket_buf;
- for (i = 0; i < blk_per_bucket; i++, buf += blocksize)
- memcpy(buf, bhs[i]->b_data, blocksize);
+ for (i = 0; i < wb->bu_blocks; i++, buf += blocksize)
+ memcpy(buf, bucket_block(wb, i), blocksize);

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)), blk_per_bucket);
+ handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)), wb->bu_blocks);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
handle = NULL;
@@ -2934,13 +2934,11 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
goto out;
}

- for (i = 0; i < blk_per_bucket; i++) {
- ret = ocfs2_journal_access(handle, inode, bhs[i],
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret < 0) {
- mlog_errno(ret);
- goto commit;
- }
+ ret = ocfs2_xattr_bucket_journal_access(handle, wb,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret < 0) {
+ mlog_errno(ret);
+ goto commit;
}

xh = (struct ocfs2_xattr_header *)bucket_buf;
@@ -3009,21 +3007,14 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
cmp_xe, swap_xe);

buf = bucket_buf;
- for (i = 0; i < blk_per_bucket; i++, buf += blocksize) {
- memcpy(bhs[i]->b_data, buf, blocksize);
- ocfs2_journal_dirty(handle, bhs[i]);
- }
+ for (i = 0; i < wb->bu_blocks; i++, buf += blocksize)
+ memcpy(bucket_block(wb, i), buf, blocksize);
+ ocfs2_xattr_bucket_journal_dirty(handle, wb);

commit:
ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
out:
-
- if (bhs) {
- for (i = 0; i < blk_per_bucket; i++)
- brelse(bhs[i]);
- }
- kfree(bhs);
-
+ ocfs2_xattr_bucket_free(wb);
kfree(bucket_buf);
return ret;
}
--
1.5.6

2008-12-19 21:59:19

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 14/45] ocfs2/xattr: Remove additional bucket allocation in bucket defragment.

From: Tao Ma <[email protected]>

Joel has refactored xattr bucket and make xattr bucket a general
wrapper. So in ocfs2_defrag_xattr_bucket, we have already passed the
bucket in, so there is no need to allocate a new one and read it.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 26 +++++++-------------------
1 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 029a9f4..87cf39d 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2898,7 +2898,6 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
size_t blocksize = inode->i_sb->s_blocksize;
handle_t *handle;
struct ocfs2_xattr_entry *xe;
- struct ocfs2_xattr_bucket *wb = NULL;

/*
* In order to make the operation more efficient and generic,
@@ -2912,21 +2911,11 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
goto out;
}

- wb = ocfs2_xattr_bucket_new(inode);
- if (!wb) {
- ret = -ENOMEM;
- goto out;
- }
-
- ret = ocfs2_read_xattr_bucket(wb, blkno);
- if (ret)
- goto out;
-
buf = bucket_buf;
- for (i = 0; i < wb->bu_blocks; i++, buf += blocksize)
- memcpy(buf, bucket_block(wb, i), blocksize);
+ for (i = 0; i < bucket->bu_blocks; i++, buf += blocksize)
+ memcpy(buf, bucket_block(bucket, i), blocksize);

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)), wb->bu_blocks);
+ handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)), bucket->bu_blocks);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
handle = NULL;
@@ -2934,7 +2923,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
goto out;
}

- ret = ocfs2_xattr_bucket_journal_access(handle, wb,
+ ret = ocfs2_xattr_bucket_journal_access(handle, bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
@@ -3007,14 +2996,13 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
cmp_xe, swap_xe);

buf = bucket_buf;
- for (i = 0; i < wb->bu_blocks; i++, buf += blocksize)
- memcpy(bucket_block(wb, i), buf, blocksize);
- ocfs2_xattr_bucket_journal_dirty(handle, wb);
+ for (i = 0; i < bucket->bu_blocks; i++, buf += blocksize)
+ memcpy(bucket_block(bucket, i), buf, blocksize);
+ ocfs2_xattr_bucket_journal_dirty(handle, bucket);

commit:
ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
out:
- ocfs2_xattr_bucket_free(wb);
kfree(bucket_buf);
return ret;
}
--
1.5.6

2008-12-19 21:58:55

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 13/45] ocfs2: Use buckets in ocfs2_xattr_set_entry_in_bucket().

From: Joel Becker <[email protected]>

The ocfs2_xattr_set_entry_in_bucket() function is already working on an
ocfs2_xattr_bucket structure, so let's use the bucket API.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 11 +++++------
1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 127a628..029a9f4 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -4083,25 +4083,24 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
{
int ret;
handle_t *handle = NULL;
- u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ u64 blkno;

mlog(0, "Set xattr entry len = %lu index = %d in bucket %llu\n",
(unsigned long)xi->value_len, xi->name_index,
(unsigned long long)bucket_blkno(xs->bucket));

if (!xs->bucket->bu_bhs[1]) {
- ret = ocfs2_read_blocks(inode,
- bucket_blkno(xs->bucket) + 1,
- blk_per_bucket - 1, &xs->bucket->bu_bhs[1],
- 0);
+ blkno = bucket_blkno(xs->bucket);
+ ocfs2_xattr_bucket_relse(xs->bucket);
+ ret = ocfs2_read_xattr_bucket(xs->bucket, blkno);
if (ret) {
mlog_errno(ret);
goto out;
}
}

- handle = ocfs2_start_trans(osb, blk_per_bucket);
+ handle = ocfs2_start_trans(osb, xs->bucket->bu_blocks);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
handle = NULL;
--
1.5.6

2008-12-19 21:59:36

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 15/45] ocfs2/xattr: Only set buffer update if it doesn't exist in cache.

From: Tao Ma <[email protected]>

When we call ocfs2_init_xattr_bucket, we deem that the new buffer head
will be written to disk immediately, so we just use sb_getblk. But in
some cases the buffer may have already been in ocfs2 uptodate cache,
so we only call ocfs2_set_buffer_uptodate if the buffer head isn't
in the cache.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 87cf39d..d8fc714 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -219,8 +219,10 @@ static int ocfs2_init_xattr_bucket(struct ocfs2_xattr_bucket *bucket,
break;
}

- ocfs2_set_new_buffer_uptodate(bucket->bu_inode,
- bucket->bu_bhs[i]);
+ if (!ocfs2_buffer_uptodate(bucket->bu_inode,
+ bucket->bu_bhs[i]))
+ ocfs2_set_new_buffer_uptodate(bucket->bu_inode,
+ bucket->bu_bhs[i]);
}

if (rc)
--
1.5.6

2008-12-19 22:00:16

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 17/45] ocfs2: Add clusters free in dealloc_ctxt.

From: Tao Ma <[email protected]>

Now in ocfs2 xattr set, the whole process are divided into many small
parts and they are wrapped into diffrent transactions and it make the
set doesn't look like a real transaction. So we want to integrate it
into a real one.

In some cases we will allocate some clusters and free some in just one
transaction. e.g, one xattr is larger than inline size, so it and its
value root is stored within the inode while the value is outside in a
cluster. Then we try to update it with a smaller value(larger than the
size of root but smaller than inline size), we may need to free the
outside cluster while allocate a new bucket(one cluster) since now the
inode may be full. The old solution will lock the global_bitmap(if the
local alloc failed in stress test) and then the truncate log. This will
cause a ABBA lock with truncate log flush.

This patch add the clusters free in dealloc_ctxt, so that we can record
the free clusters during the transaction and then free it after we
release the global_bitmap in xattr set.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++----
fs/ocfs2/alloc.h | 4 ++
2 files changed, 103 insertions(+), 7 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 0cc2deb..4614614 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5800,7 +5800,10 @@ int ocfs2_truncate_log_init(struct ocfs2_super *osb)
*/

/*
- * Describes a single block free from a suballocator
+ * Describe a single bit freed from a suballocator. For the block
+ * suballocators, it represents one block. For the global cluster
+ * allocator, it represents some clusters and free_bit indicates
+ * clusters number.
*/
struct ocfs2_cached_block_free {
struct ocfs2_cached_block_free *free_next;
@@ -5815,10 +5818,10 @@ struct ocfs2_per_slot_free_list {
struct ocfs2_cached_block_free *f_first;
};

-static int ocfs2_free_cached_items(struct ocfs2_super *osb,
- int sysfile_type,
- int slot,
- struct ocfs2_cached_block_free *head)
+static int ocfs2_free_cached_blocks(struct ocfs2_super *osb,
+ int sysfile_type,
+ int slot,
+ struct ocfs2_cached_block_free *head)
{
int ret;
u64 bg_blkno;
@@ -5893,6 +5896,82 @@ out:
return ret;
}

+int ocfs2_cache_cluster_dealloc(struct ocfs2_cached_dealloc_ctxt *ctxt,
+ u64 blkno, unsigned int bit)
+{
+ int ret = 0;
+ struct ocfs2_cached_block_free *item;
+
+ item = kmalloc(sizeof(*item), GFP_NOFS);
+ if (item == NULL) {
+ ret = -ENOMEM;
+ mlog_errno(ret);
+ return ret;
+ }
+
+ mlog(0, "Insert clusters: (bit %u, blk %llu)\n",
+ bit, (unsigned long long)blkno);
+
+ item->free_blk = blkno;
+ item->free_bit = bit;
+ item->free_next = ctxt->c_global_allocator;
+
+ ctxt->c_global_allocator = item;
+ return ret;
+}
+
+static int ocfs2_free_cached_clusters(struct ocfs2_super *osb,
+ struct ocfs2_cached_block_free *head)
+{
+ struct ocfs2_cached_block_free *tmp;
+ struct inode *tl_inode = osb->osb_tl_inode;
+ handle_t *handle;
+ int ret = 0;
+
+ mutex_lock(&tl_inode->i_mutex);
+
+ while (head) {
+ if (ocfs2_truncate_log_needs_flush(osb)) {
+ ret = __ocfs2_flush_truncate_log(osb);
+ if (ret < 0) {
+ mlog_errno(ret);
+ break;
+ }
+ }
+
+ handle = ocfs2_start_trans(osb, OCFS2_TRUNCATE_LOG_UPDATE);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ break;
+ }
+
+ ret = ocfs2_truncate_log_append(osb, handle, head->free_blk,
+ head->free_bit);
+
+ ocfs2_commit_trans(osb, handle);
+ tmp = head;
+ head = head->free_next;
+ kfree(tmp);
+
+ if (ret < 0) {
+ mlog_errno(ret);
+ break;
+ }
+ }
+
+ mutex_unlock(&tl_inode->i_mutex);
+
+ while (head) {
+ /* Premature exit may have left some dangling items. */
+ tmp = head;
+ head = head->free_next;
+ kfree(tmp);
+ }
+
+ return ret;
+}
+
int ocfs2_run_deallocs(struct ocfs2_super *osb,
struct ocfs2_cached_dealloc_ctxt *ctxt)
{
@@ -5908,8 +5987,10 @@ int ocfs2_run_deallocs(struct ocfs2_super *osb,
if (fl->f_first) {
mlog(0, "Free items: (type %u, slot %d)\n",
fl->f_inode_type, fl->f_slot);
- ret2 = ocfs2_free_cached_items(osb, fl->f_inode_type,
- fl->f_slot, fl->f_first);
+ ret2 = ocfs2_free_cached_blocks(osb,
+ fl->f_inode_type,
+ fl->f_slot,
+ fl->f_first);
if (ret2)
mlog_errno(ret2);
if (!ret)
@@ -5920,6 +6001,17 @@ int ocfs2_run_deallocs(struct ocfs2_super *osb,
kfree(fl);
}

+ if (ctxt->c_global_allocator) {
+ ret2 = ocfs2_free_cached_clusters(osb,
+ ctxt->c_global_allocator);
+ if (ret2)
+ mlog_errno(ret2);
+ if (!ret)
+ ret = ret2;
+
+ ctxt->c_global_allocator = NULL;
+ }
+
return ret;
}

diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 70257c8..c301cf2 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -167,11 +167,15 @@ int __ocfs2_flush_truncate_log(struct ocfs2_super *osb);
*/
struct ocfs2_cached_dealloc_ctxt {
struct ocfs2_per_slot_free_list *c_first_suballocator;
+ struct ocfs2_cached_block_free *c_global_allocator;
};
static inline void ocfs2_init_dealloc_ctxt(struct ocfs2_cached_dealloc_ctxt *c)
{
c->c_first_suballocator = NULL;
+ c->c_global_allocator = NULL;
}
+int ocfs2_cache_cluster_dealloc(struct ocfs2_cached_dealloc_ctxt *ctxt,
+ u64 blkno, unsigned int bit);
int ocfs2_run_deallocs(struct ocfs2_super *osb,
struct ocfs2_cached_dealloc_ctxt *ctxt);

--
1.5.6

2008-12-19 22:00:58

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 19/45] ocfs2/xattr: Reserve meta/data at the beginning of ocfs2_xattr_set.

From: Tao Ma <[email protected]>

In ocfs2 xattr set, we reserve metadata and clusters in any place
they are needed. It is time-consuming and ineffective, so this
patch try to reserve metadata and clusters at the beginning of
ocfs2_xattr_set.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.h | 4 +
fs/ocfs2/xattr.c | 483 ++++++++++++++++++++++++++++++++++++++++--------------
2 files changed, 361 insertions(+), 126 deletions(-)

diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index c301cf2..3eb735e 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -176,6 +176,10 @@ static inline void ocfs2_init_dealloc_ctxt(struct ocfs2_cached_dealloc_ctxt *c)
}
int ocfs2_cache_cluster_dealloc(struct ocfs2_cached_dealloc_ctxt *ctxt,
u64 blkno, unsigned int bit);
+static inline int ocfs2_dealloc_has_cluster(struct ocfs2_cached_dealloc_ctxt *c)
+{
+ return c->c_global_allocator != NULL;
+}
int ocfs2_run_deallocs(struct ocfs2_super *osb,
struct ocfs2_cached_dealloc_ctxt *ctxt);

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index f1da381..4fd201a 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -71,6 +71,12 @@ struct ocfs2_xattr_bucket {
int bu_blocks;
};

+struct ocfs2_xattr_set_ctxt {
+ struct ocfs2_alloc_context *meta_ac;
+ struct ocfs2_alloc_context *data_ac;
+ struct ocfs2_cached_dealloc_ctxt dealloc;
+};
+
#define OCFS2_XATTR_ROOT_SIZE (sizeof(struct ocfs2_xattr_def_value_root))
#define OCFS2_XATTR_INLINE_SIZE 80

@@ -133,11 +139,13 @@ static int ocfs2_xattr_tree_list_index_block(struct inode *inode,
size_t buffer_size);

static int ocfs2_xattr_create_index_block(struct inode *inode,
- struct ocfs2_xattr_search *xs);
+ struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt);

static int ocfs2_xattr_set_entry_index_block(struct inode *inode,
struct ocfs2_xattr_info *xi,
- struct ocfs2_xattr_search *xs);
+ struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt);

static int ocfs2_delete_xattr_index_block(struct inode *inode,
struct buffer_head *xb_bh);
@@ -334,14 +342,13 @@ static void ocfs2_xattr_hash_entry(struct inode *inode,
static int ocfs2_xattr_extend_allocation(struct inode *inode,
u32 clusters_to_add,
struct buffer_head *xattr_bh,
- struct ocfs2_xattr_value_root *xv)
+ struct ocfs2_xattr_value_root *xv,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int status = 0;
int restart_func = 0;
int credits = 0;
handle_t *handle = NULL;
- struct ocfs2_alloc_context *data_ac = NULL;
- struct ocfs2_alloc_context *meta_ac = NULL;
enum ocfs2_alloc_restarted why;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
u32 prev_clusters, logical_start = le32_to_cpu(xv->xr_clusters);
@@ -353,13 +360,6 @@ static int ocfs2_xattr_extend_allocation(struct inode *inode,

restart_all:

- status = ocfs2_lock_allocators(inode, &et, clusters_to_add, 0,
- &data_ac, &meta_ac);
- if (status) {
- mlog_errno(status);
- goto leave;
- }
-
credits = ocfs2_calc_extend_credits(osb->sb, et.et_root_el,
clusters_to_add);
handle = ocfs2_start_trans(osb, credits);
@@ -386,8 +386,8 @@ restarted_transaction:
0,
&et,
handle,
- data_ac,
- meta_ac,
+ ctxt->data_ac,
+ ctxt->meta_ac,
&why);
if ((status < 0) && (status != -EAGAIN)) {
if (status != -ENOSPC)
@@ -432,14 +432,6 @@ leave:
ocfs2_commit_trans(osb, handle);
handle = NULL;
}
- if (data_ac) {
- ocfs2_free_alloc_context(data_ac);
- data_ac = NULL;
- }
- if (meta_ac) {
- ocfs2_free_alloc_context(meta_ac);
- meta_ac = NULL;
- }
if ((!status) && restart_func) {
restart_func = 0;
goto restart_all;
@@ -452,23 +444,16 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
struct buffer_head *root_bh,
struct ocfs2_xattr_value_root *xv,
u32 cpos, u32 phys_cpos, u32 len,
- struct ocfs2_cached_dealloc_ctxt *dealloc)
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret;
u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
handle_t *handle;
- struct ocfs2_alloc_context *meta_ac = NULL;
struct ocfs2_extent_tree et;

ocfs2_init_xattr_value_extent_tree(&et, inode, root_bh, xv);

- ret = ocfs2_lock_allocators(inode, &et, 0, 1, NULL, &meta_ac);
- if (ret) {
- mlog_errno(ret);
- return ret;
- }
-
handle = ocfs2_start_trans(osb, OCFS2_REMOVE_EXTENT_CREDITS);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -483,8 +468,8 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
goto out_commit;
}

- ret = ocfs2_remove_extent(inode, &et, cpos, len, handle, meta_ac,
- dealloc);
+ ret = ocfs2_remove_extent(inode, &et, cpos, len, handle, ctxt->meta_ac,
+ &ctxt->dealloc);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -498,17 +483,13 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
goto out_commit;
}

- ret = ocfs2_cache_cluster_dealloc(dealloc, phys_blkno, len);
+ ret = ocfs2_cache_cluster_dealloc(&ctxt->dealloc, phys_blkno, len);
if (ret)
mlog_errno(ret);

out_commit:
ocfs2_commit_trans(osb, handle);
out:
-
- if (meta_ac)
- ocfs2_free_alloc_context(meta_ac);
-
return ret;
}

@@ -516,15 +497,12 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,
u32 old_clusters,
u32 new_clusters,
struct buffer_head *root_bh,
- struct ocfs2_xattr_value_root *xv)
+ struct ocfs2_xattr_value_root *xv,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret = 0;
u32 trunc_len, cpos, phys_cpos, alloc_size;
u64 block;
- struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- struct ocfs2_cached_dealloc_ctxt dealloc;
-
- ocfs2_init_dealloc_ctxt(&dealloc);

if (old_clusters <= new_clusters)
return 0;
@@ -544,7 +522,7 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,

ret = __ocfs2_remove_xattr_range(inode, root_bh, xv, cpos,
phys_cpos, alloc_size,
- &dealloc);
+ ctxt);
if (ret) {
mlog_errno(ret);
goto out;
@@ -558,16 +536,14 @@ static int ocfs2_xattr_shrink_size(struct inode *inode,
}

out:
- ocfs2_schedule_truncate_log_flush(osb, 1);
- ocfs2_run_deallocs(osb, &dealloc);
-
return ret;
}

static int ocfs2_xattr_value_truncate(struct inode *inode,
struct buffer_head *root_bh,
struct ocfs2_xattr_value_root *xv,
- int len)
+ int len,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret;
u32 new_clusters = ocfs2_clusters_for_bytes(inode->i_sb, len);
@@ -579,11 +555,11 @@ static int ocfs2_xattr_value_truncate(struct inode *inode,
if (new_clusters > old_clusters)
ret = ocfs2_xattr_extend_allocation(inode,
new_clusters - old_clusters,
- root_bh, xv);
+ root_bh, xv, ctxt);
else
ret = ocfs2_xattr_shrink_size(inode,
old_clusters, new_clusters,
- root_bh, xv);
+ root_bh, xv, ctxt);

return ret;
}
@@ -1167,6 +1143,7 @@ out:
static int ocfs2_xattr_set_value_outside(struct inode *inode,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt,
size_t offs)
{
size_t name_len = strlen(xi->name);
@@ -1186,7 +1163,7 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
xv->xr_list.l_next_free_rec = 0;

ret = ocfs2_xattr_value_truncate(inode, xs->xattr_bh, xv,
- xi->value_len);
+ xi->value_len, ctxt);
if (ret < 0) {
mlog_errno(ret);
return ret;
@@ -1317,6 +1294,7 @@ static void ocfs2_xattr_set_entry_local(struct inode *inode,
static int ocfs2_xattr_set_entry(struct inode *inode,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt,
int flag)
{
struct ocfs2_xattr_entry *last;
@@ -1387,7 +1365,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
if (ocfs2_xattr_is_local(xs->here) && size == size_l) {
/* Replace existing local xattr with tree root */
ret = ocfs2_xattr_set_value_outside(inode, xi, xs,
- offs);
+ ctxt, offs);
if (ret < 0)
mlog_errno(ret);
goto out;
@@ -1406,7 +1384,8 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
ret = ocfs2_xattr_value_truncate(inode,
xs->xattr_bh,
xv,
- xi->value_len);
+ xi->value_len,
+ ctxt);
if (ret < 0) {
mlog_errno(ret);
goto out;
@@ -1436,7 +1415,8 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
ret = ocfs2_xattr_value_truncate(inode,
xs->xattr_bh,
xv,
- 0);
+ 0,
+ ctxt);
if (ret < 0)
mlog_errno(ret);
}
@@ -1531,7 +1511,7 @@ out_commit:
* This is the second step for value size > INLINE_SIZE.
*/
size_t offs = le16_to_cpu(xs->here->xe_name_offset);
- ret = ocfs2_xattr_set_value_outside(inode, xi, xs, offs);
+ ret = ocfs2_xattr_set_value_outside(inode, xi, xs, ctxt, offs);
if (ret < 0) {
int ret2;

@@ -1555,6 +1535,10 @@ static int ocfs2_remove_value_outside(struct inode*inode,
struct ocfs2_xattr_header *header)
{
int ret = 0, i;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, };
+
+ ocfs2_init_dealloc_ctxt(&ctxt.dealloc);

for (i = 0; i < le16_to_cpu(header->xh_count); i++) {
struct ocfs2_xattr_entry *entry = &header->xh_entries[i];
@@ -1567,14 +1551,17 @@ static int ocfs2_remove_value_outside(struct inode*inode,
le16_to_cpu(entry->xe_name_offset);
xv = (struct ocfs2_xattr_value_root *)
(val + OCFS2_XATTR_SIZE(entry->xe_name_len));
- ret = ocfs2_xattr_value_truncate(inode, bh, xv, 0);
+ ret = ocfs2_xattr_value_truncate(inode, bh, xv,
+ 0, &ctxt);
if (ret < 0) {
mlog_errno(ret);
- return ret;
+ break;
}
}
}

+ ocfs2_schedule_truncate_log_flush(osb, 1);
+ ocfs2_run_deallocs(osb, &ctxt.dealloc);
return ret;
}

@@ -1836,7 +1823,8 @@ static int ocfs2_xattr_ibody_find(struct inode *inode,
*/
static int ocfs2_xattr_ibody_set(struct inode *inode,
struct ocfs2_xattr_info *xi,
- struct ocfs2_xattr_search *xs)
+ struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
struct ocfs2_inode_info *oi = OCFS2_I(inode);
struct ocfs2_dinode *di = (struct ocfs2_dinode *)xs->inode_bh->b_data;
@@ -1853,7 +1841,7 @@ static int ocfs2_xattr_ibody_set(struct inode *inode,
}
}

- ret = ocfs2_xattr_set_entry(inode, xi, xs,
+ ret = ocfs2_xattr_set_entry(inode, xi, xs, ctxt,
(OCFS2_INLINE_XATTR_FL | OCFS2_HAS_XATTR_FL));
out:
up_write(&oi->ip_alloc_sem);
@@ -1926,12 +1914,12 @@ cleanup:
*/
static int ocfs2_xattr_block_set(struct inode *inode,
struct ocfs2_xattr_info *xi,
- struct ocfs2_xattr_search *xs)
+ struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
struct buffer_head *new_bh = NULL;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_dinode *di = (struct ocfs2_dinode *)xs->inode_bh->b_data;
- struct ocfs2_alloc_context *meta_ac = NULL;
handle_t *handle = NULL;
struct ocfs2_xattr_block *xblk = NULL;
u16 suballoc_bit_start;
@@ -1940,15 +1928,6 @@ static int ocfs2_xattr_block_set(struct inode *inode,
int ret;

if (!xs->xattr_bh) {
- /*
- * Alloc one external block for extended attribute
- * outside of inode.
- */
- ret = ocfs2_reserve_new_metadata_blocks(osb, 1, &meta_ac);
- if (ret < 0) {
- mlog_errno(ret);
- goto out;
- }
handle = ocfs2_start_trans(osb,
OCFS2_XATTR_BLOCK_CREATE_CREDITS);
if (IS_ERR(handle)) {
@@ -1963,7 +1942,7 @@ static int ocfs2_xattr_block_set(struct inode *inode,
goto out_commit;
}

- ret = ocfs2_claim_metadata(osb, handle, meta_ac, 1,
+ ret = ocfs2_claim_metadata(osb, handle, ctxt->meta_ac, 1,
&suballoc_bit_start, &num_got,
&first_blkno);
if (ret < 0) {
@@ -1996,7 +1975,6 @@ static int ocfs2_xattr_block_set(struct inode *inode,
xs->end = (void *)xblk + inode->i_sb->s_blocksize;
xs->here = xs->header->xh_entries;

-
ret = ocfs2_journal_dirty(handle, new_bh);
if (ret < 0) {
mlog_errno(ret);
@@ -2009,8 +1987,6 @@ static int ocfs2_xattr_block_set(struct inode *inode,
out_commit:
ocfs2_commit_trans(osb, handle);
out:
- if (meta_ac)
- ocfs2_free_alloc_context(meta_ac);
if (ret < 0)
return ret;
} else
@@ -2018,22 +1994,266 @@ out:

if (!(le16_to_cpu(xblk->xb_flags) & OCFS2_XATTR_INDEXED)) {
/* Set extended attribute into external block */
- ret = ocfs2_xattr_set_entry(inode, xi, xs, OCFS2_HAS_XATTR_FL);
+ ret = ocfs2_xattr_set_entry(inode, xi, xs, ctxt,
+ OCFS2_HAS_XATTR_FL);
if (!ret || ret != -ENOSPC)
goto end;

- ret = ocfs2_xattr_create_index_block(inode, xs);
+ ret = ocfs2_xattr_create_index_block(inode, xs, ctxt);
if (ret)
goto end;
}

- ret = ocfs2_xattr_set_entry_index_block(inode, xi, xs);
+ ret = ocfs2_xattr_set_entry_index_block(inode, xi, xs, ctxt);

end:

return ret;
}

+/* Check whether the new xattr can be inserted into the inode. */
+static int ocfs2_xattr_can_be_in_inode(struct inode *inode,
+ struct ocfs2_xattr_info *xi,
+ struct ocfs2_xattr_search *xs)
+{
+ u64 value_size;
+ struct ocfs2_xattr_entry *last;
+ int free, i;
+ size_t min_offs = xs->end - xs->base;
+
+ if (!xs->header)
+ return 0;
+
+ last = xs->header->xh_entries;
+
+ for (i = 0; i < le16_to_cpu(xs->header->xh_count); i++) {
+ size_t offs = le16_to_cpu(last->xe_name_offset);
+ if (offs < min_offs)
+ min_offs = offs;
+ last += 1;
+ }
+
+ free = min_offs - ((void *)last - xs->base) - sizeof(__u32);
+ if (free < 0)
+ return 0;
+
+ BUG_ON(!xs->not_found);
+
+ if (xi->value_len > OCFS2_XATTR_INLINE_SIZE)
+ value_size = OCFS2_XATTR_ROOT_SIZE;
+ else
+ value_size = OCFS2_XATTR_SIZE(xi->value_len);
+
+ if (free >= sizeof(struct ocfs2_xattr_entry) +
+ OCFS2_XATTR_SIZE(strlen(xi->name)) + value_size)
+ return 1;
+
+ return 0;
+}
+
+static int ocfs2_calc_xattr_set_need(struct inode *inode,
+ struct ocfs2_dinode *di,
+ struct ocfs2_xattr_info *xi,
+ struct ocfs2_xattr_search *xis,
+ struct ocfs2_xattr_search *xbs,
+ int *clusters_need,
+ int *meta_need)
+{
+ int ret = 0, old_in_xb = 0;
+ int clusters_add = 0, meta_add = 0;
+ struct buffer_head *bh = NULL;
+ struct ocfs2_xattr_block *xb = NULL;
+ struct ocfs2_xattr_entry *xe = NULL;
+ struct ocfs2_xattr_value_root *xv = NULL;
+ char *base = NULL;
+ int name_offset, name_len = 0;
+ u32 new_clusters = ocfs2_clusters_for_bytes(inode->i_sb,
+ xi->value_len);
+ u64 value_size;
+
+ /*
+ * delete a xattr doesn't need metadata and cluster allocation.
+ * so return.
+ */
+ if (!xi->value)
+ goto out;
+
+ if (xis->not_found && xbs->not_found) {
+ if (xi->value_len > OCFS2_XATTR_INLINE_SIZE)
+ clusters_add += new_clusters;
+
+ goto meta_guess;
+ }
+
+ if (!xis->not_found) {
+ xe = xis->here;
+ name_offset = le16_to_cpu(xe->xe_name_offset);
+ name_len = OCFS2_XATTR_SIZE(xe->xe_name_len);
+ base = xis->base;
+ } else {
+ int i, block_off;
+ xb = (struct ocfs2_xattr_block *)xbs->xattr_bh->b_data;
+ xe = xbs->here;
+ name_offset = le16_to_cpu(xe->xe_name_offset);
+ name_len = OCFS2_XATTR_SIZE(xe->xe_name_len);
+ i = xbs->here - xbs->header->xh_entries;
+ old_in_xb = 1;
+
+ if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
+ ret = ocfs2_xattr_bucket_get_name_value(inode,
+ bucket_xh(xbs->bucket),
+ i, &block_off,
+ &name_offset);
+ base = bucket_block(xbs->bucket, block_off);
+ } else
+ base = xbs->base;
+ }
+
+ /* do cluster allocation guess first. */
+ value_size = le64_to_cpu(xe->xe_value_size);
+
+ if (old_in_xb) {
+ /*
+ * In xattr set, we always try to set the xe in inode first,
+ * so if it can be inserted into inode successfully, the old
+ * one will be removed from the xattr block, and this xattr
+ * will be inserted into inode as a new xattr in inode.
+ */
+ if (ocfs2_xattr_can_be_in_inode(inode, xi, xis)) {
+ clusters_add += new_clusters;
+ goto out;
+ }
+ }
+
+ if (xi->value_len > OCFS2_XATTR_INLINE_SIZE) {
+ /* the new values will be stored outside. */
+ u32 old_clusters = 0;
+
+ if (!ocfs2_xattr_is_local(xe)) {
+ old_clusters = ocfs2_clusters_for_bytes(inode->i_sb,
+ value_size);
+ xv = (struct ocfs2_xattr_value_root *)
+ (base + name_offset + name_len);
+ } else
+ xv = &def_xv.xv;
+
+ if (old_clusters >= new_clusters)
+ goto out;
+ else {
+ meta_add += ocfs2_extend_meta_needed(&xv->xr_list);
+ clusters_add += new_clusters - old_clusters;
+ goto out;
+ }
+ } else {
+ /*
+ * Now the new value will be stored inside. So if the new
+ * value is smaller than the size of value root or the old
+ * value, we don't need any allocation, otherwise we have
+ * to guess metadata allocation.
+ */
+ if ((ocfs2_xattr_is_local(xe) && value_size >= xi->value_len) ||
+ (!ocfs2_xattr_is_local(xe) &&
+ OCFS2_XATTR_ROOT_SIZE >= xi->value_len))
+ goto out;
+ }
+
+meta_guess:
+ /* calculate metadata allocation. */
+ if (di->i_xattr_loc) {
+ if (!xbs->xattr_bh) {
+ ret = ocfs2_read_block(inode,
+ le64_to_cpu(di->i_xattr_loc),
+ &bh);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ xb = (struct ocfs2_xattr_block *)bh->b_data;
+ } else
+ xb = (struct ocfs2_xattr_block *)xbs->xattr_bh->b_data;
+
+ if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
+ struct ocfs2_extent_list *el =
+ &xb->xb_attrs.xb_root.xt_list;
+ meta_add += ocfs2_extend_meta_needed(el);
+ }
+
+ /*
+ * This cluster will be used either for new bucket or for
+ * new xattr block.
+ * If the cluster size is the same as the bucket size, one
+ * more is needed since we may need to extend the bucket
+ * also.
+ */
+ clusters_add += 1;
+ if (OCFS2_XATTR_BUCKET_SIZE ==
+ OCFS2_SB(inode->i_sb)->s_clustersize)
+ clusters_add += 1;
+ } else
+ meta_add += 1;
+out:
+ if (clusters_need)
+ *clusters_need = clusters_add;
+ if (meta_need)
+ *meta_need = meta_add;
+ brelse(bh);
+ return ret;
+}
+
+static int ocfs2_init_xattr_set_ctxt(struct inode *inode,
+ struct ocfs2_dinode *di,
+ struct ocfs2_xattr_info *xi,
+ struct ocfs2_xattr_search *xis,
+ struct ocfs2_xattr_search *xbs,
+ struct ocfs2_xattr_set_ctxt *ctxt)
+{
+ int clusters_add, meta_add, ret;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+
+ memset(ctxt, 0, sizeof(struct ocfs2_xattr_set_ctxt));
+
+ ocfs2_init_dealloc_ctxt(&ctxt->dealloc);
+
+ ret = ocfs2_calc_xattr_set_need(inode, di, xi, xis, xbs,
+ &clusters_add, &meta_add);
+ if (ret) {
+ mlog_errno(ret);
+ return ret;
+ }
+
+ mlog(0, "Set xattr %s, reserve meta blocks = %d, clusters = %d\n",
+ xi->name, meta_add, clusters_add);
+
+ if (meta_add) {
+ ret = ocfs2_reserve_new_metadata_blocks(osb, meta_add,
+ &ctxt->meta_ac);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+ }
+
+ if (clusters_add) {
+ ret = ocfs2_reserve_clusters(osb, clusters_add, &ctxt->data_ac);
+ if (ret)
+ mlog_errno(ret);
+ }
+out:
+ if (ret) {
+ if (ctxt->meta_ac) {
+ ocfs2_free_alloc_context(ctxt->meta_ac);
+ ctxt->meta_ac = NULL;
+ }
+
+ /*
+ * We cannot have an error and a non null ctxt->data_ac.
+ */
+ }
+
+ return ret;
+}
+
/*
* ocfs2_xattr_set()
*
@@ -2051,6 +2271,8 @@ int ocfs2_xattr_set(struct inode *inode,
struct buffer_head *di_bh = NULL;
struct ocfs2_dinode *di;
int ret;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, };

struct ocfs2_xattr_info xi = {
.name_index = name_index,
@@ -2115,15 +2337,21 @@ int ocfs2_xattr_set(struct inode *inode,
goto cleanup;
}

+ ret = ocfs2_init_xattr_set_ctxt(inode, di, &xi, &xis, &xbs, &ctxt);
+ if (ret) {
+ mlog_errno(ret);
+ goto cleanup;
+ }
+
if (!value) {
/* Remove existing extended attribute */
if (!xis.not_found)
- ret = ocfs2_xattr_ibody_set(inode, &xi, &xis);
+ ret = ocfs2_xattr_ibody_set(inode, &xi, &xis, &ctxt);
else if (!xbs.not_found)
- ret = ocfs2_xattr_block_set(inode, &xi, &xbs);
+ ret = ocfs2_xattr_block_set(inode, &xi, &xbs, &ctxt);
} else {
/* We always try to set extended attribute into inode first*/
- ret = ocfs2_xattr_ibody_set(inode, &xi, &xis);
+ ret = ocfs2_xattr_ibody_set(inode, &xi, &xis, &ctxt);
if (!ret && !xbs.not_found) {
/*
* If succeed and that extended attribute existing in
@@ -2131,7 +2359,7 @@ int ocfs2_xattr_set(struct inode *inode,
*/
xi.value = NULL;
xi.value_len = 0;
- ret = ocfs2_xattr_block_set(inode, &xi, &xbs);
+ ret = ocfs2_xattr_block_set(inode, &xi, &xbs, &ctxt);
} else if (ret == -ENOSPC) {
if (di->i_xattr_loc && !xbs.xattr_bh) {
ret = ocfs2_xattr_block_find(inode, name_index,
@@ -2143,9 +2371,9 @@ int ocfs2_xattr_set(struct inode *inode,
* If no space in inode, we will set extended attribute
* into external block.
*/
- ret = ocfs2_xattr_block_set(inode, &xi, &xbs);
+ ret = ocfs2_xattr_block_set(inode, &xi, &xbs, &ctxt);
if (ret)
- goto cleanup;
+ goto free;
if (!xis.not_found) {
/*
* If succeed and that extended attribute
@@ -2153,10 +2381,19 @@ int ocfs2_xattr_set(struct inode *inode,
*/
xi.value = NULL;
xi.value_len = 0;
- ret = ocfs2_xattr_ibody_set(inode, &xi, &xis);
+ ret = ocfs2_xattr_ibody_set(inode, &xi,
+ &xis, &ctxt);
}
}
}
+free:
+ if (ctxt.data_ac)
+ ocfs2_free_alloc_context(ctxt.data_ac);
+ if (ctxt.meta_ac)
+ ocfs2_free_alloc_context(ctxt.meta_ac);
+ if (ocfs2_dealloc_has_cluster(&ctxt.dealloc))
+ ocfs2_schedule_truncate_log_flush(osb, 1);
+ ocfs2_run_deallocs(osb, &ctxt.dealloc);
cleanup:
up_write(&OCFS2_I(inode)->ip_xattr_sem);
ocfs2_inode_unlock(inode, 1);
@@ -2734,7 +2971,8 @@ static void ocfs2_xattr_update_xattr_search(struct inode *inode,
}

static int ocfs2_xattr_create_index_block(struct inode *inode,
- struct ocfs2_xattr_search *xs)
+ struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret, credits = OCFS2_SUBALLOC_ALLOC;
u32 bit_off, len;
@@ -2742,7 +2980,6 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
handle_t *handle;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_inode_info *oi = OCFS2_I(inode);
- struct ocfs2_alloc_context *data_ac;
struct buffer_head *xb_bh = xs->xattr_bh;
struct ocfs2_xattr_block *xb =
(struct ocfs2_xattr_block *)xb_bh->b_data;
@@ -2755,12 +2992,6 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
BUG_ON(xb_flags & OCFS2_XATTR_INDEXED);
BUG_ON(!xs->bucket);

- ret = ocfs2_reserve_clusters(osb, 1, &data_ac);
- if (ret) {
- mlog_errno(ret);
- goto out;
- }
-
/*
* XXX:
* We can use this lock for now, and maybe move to a dedicated mutex
@@ -2787,7 +3018,8 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
goto out_commit;
}

- ret = ocfs2_claim_clusters(osb, handle, data_ac, 1, &bit_off, &len);
+ ret = __ocfs2_claim_clusters(osb, handle, ctxt->data_ac,
+ 1, 1, &bit_off, &len);
if (ret) {
mlog_errno(ret);
goto out_commit;
@@ -2850,10 +3082,6 @@ out_commit:
out_sem:
up_write(&oi->ip_alloc_sem);

-out:
- if (data_ac)
- ocfs2_free_alloc_context(data_ac);
-
return ret;
}

@@ -3614,7 +3842,8 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
u32 *num_clusters,
u32 prev_cpos,
u64 prev_blkno,
- int *extend)
+ int *extend,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret, credits;
u16 bpc = ocfs2_clusters_to_blocks(inode->i_sb, 1);
@@ -3622,8 +3851,6 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
u32 clusters_to_add = 1, bit_off, num_bits, v_start = 0;
u64 block;
handle_t *handle = NULL;
- struct ocfs2_alloc_context *data_ac = NULL;
- struct ocfs2_alloc_context *meta_ac = NULL;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_extent_tree et;

@@ -3634,13 +3861,6 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,

ocfs2_init_xattr_tree_extent_tree(&et, inode, root_bh);

- ret = ocfs2_lock_allocators(inode, &et, clusters_to_add, 0,
- &data_ac, &meta_ac);
- if (ret) {
- mlog_errno(ret);
- goto leave;
- }
-
credits = ocfs2_calc_extend_credits(osb->sb, et.et_root_el,
clusters_to_add);
handle = ocfs2_start_trans(osb, credits);
@@ -3658,7 +3878,7 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
goto leave;
}

- ret = __ocfs2_claim_clusters(osb, handle, data_ac, 1,
+ ret = __ocfs2_claim_clusters(osb, handle, ctxt->data_ac, 1,
clusters_to_add, &bit_off, &num_bits);
if (ret < 0) {
if (ret != -ENOSPC)
@@ -3719,7 +3939,7 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
mlog(0, "Insert %u clusters at block %llu for xattr at %u\n",
num_bits, (unsigned long long)block, v_start);
ret = ocfs2_insert_extent(osb, handle, inode, &et, v_start, block,
- num_bits, 0, meta_ac);
+ num_bits, 0, ctxt->meta_ac);
if (ret < 0) {
mlog_errno(ret);
goto leave;
@@ -3734,10 +3954,6 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
leave:
if (handle)
ocfs2_commit_trans(osb, handle);
- if (data_ac)
- ocfs2_free_alloc_context(data_ac);
- if (meta_ac)
- ocfs2_free_alloc_context(meta_ac);

return ret;
}
@@ -3821,7 +4037,8 @@ out:
*/
static int ocfs2_add_new_xattr_bucket(struct inode *inode,
struct buffer_head *xb_bh,
- struct buffer_head *header_bh)
+ struct buffer_head *header_bh,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
struct ocfs2_xattr_header *first_xh = NULL;
struct buffer_head *first_bh = NULL;
@@ -3872,7 +4089,8 @@ static int ocfs2_add_new_xattr_bucket(struct inode *inode,
&num_clusters,
e_cpos,
p_blkno,
- &extend);
+ &extend,
+ ctxt);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4147,7 +4365,8 @@ out:
static int ocfs2_xattr_bucket_value_truncate(struct inode *inode,
struct buffer_head *header_bh,
int xe_off,
- int len)
+ int len,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret, offset;
u64 value_blk;
@@ -4182,7 +4401,7 @@ static int ocfs2_xattr_bucket_value_truncate(struct inode *inode,

mlog(0, "truncate %u in xattr bucket %llu to %d bytes.\n",
xe_off, (unsigned long long)header_bh->b_blocknr, len);
- ret = ocfs2_xattr_value_truncate(inode, value_bh, xv, len);
+ ret = ocfs2_xattr_value_truncate(inode, value_bh, xv, len, ctxt);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4200,8 +4419,9 @@ out:
}

static int ocfs2_xattr_bucket_value_truncate_xs(struct inode *inode,
- struct ocfs2_xattr_search *xs,
- int len)
+ struct ocfs2_xattr_search *xs,
+ int len,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret, offset;
struct ocfs2_xattr_entry *xe = xs->here;
@@ -4211,7 +4431,7 @@ static int ocfs2_xattr_bucket_value_truncate_xs(struct inode *inode,

offset = xe - xh->xh_entries;
ret = ocfs2_xattr_bucket_value_truncate(inode, xs->bucket->bu_bhs[0],
- offset, len);
+ offset, len, ctxt);
if (ret)
mlog_errno(ret);

@@ -4375,7 +4595,8 @@ out_commit:
*/
static int ocfs2_xattr_set_in_bucket(struct inode *inode,
struct ocfs2_xattr_info *xi,
- struct ocfs2_xattr_search *xs)
+ struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
int ret, local = 1;
size_t value_len;
@@ -4403,7 +4624,8 @@ static int ocfs2_xattr_set_in_bucket(struct inode *inode,
value_len = 0;

ret = ocfs2_xattr_bucket_value_truncate_xs(inode, xs,
- value_len);
+ value_len,
+ ctxt);
if (ret)
goto out;

@@ -4434,7 +4656,7 @@ static int ocfs2_xattr_set_in_bucket(struct inode *inode,

/* allocate the space now for the outside block storage. */
ret = ocfs2_xattr_bucket_value_truncate_xs(inode, xs,
- value_len);
+ value_len, ctxt);
if (ret) {
mlog_errno(ret);

@@ -4485,7 +4707,8 @@ static int ocfs2_check_xattr_bucket_collision(struct inode *inode,

static int ocfs2_xattr_set_entry_index_block(struct inode *inode,
struct ocfs2_xattr_info *xi,
- struct ocfs2_xattr_search *xs)
+ struct ocfs2_xattr_search *xs,
+ struct ocfs2_xattr_set_ctxt *ctxt)
{
struct ocfs2_xattr_header *xh;
struct ocfs2_xattr_entry *xe;
@@ -4603,7 +4826,8 @@ try_again:

ret = ocfs2_add_new_xattr_bucket(inode,
xs->xattr_bh,
- xs->bucket->bu_bhs[0]);
+ xs->bucket->bu_bhs[0],
+ ctxt);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4622,7 +4846,7 @@ try_again:
}

xattr_set:
- ret = ocfs2_xattr_set_in_bucket(inode, xi, xs);
+ ret = ocfs2_xattr_set_in_bucket(inode, xi, xs, ctxt);
out:
mlog_exit(ret);
return ret;
@@ -4636,6 +4860,10 @@ static int ocfs2_delete_xattr_in_bucket(struct inode *inode,
struct ocfs2_xattr_header *xh = bucket_xh(bucket);
u16 i;
struct ocfs2_xattr_entry *xe;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct ocfs2_xattr_set_ctxt ctxt = {NULL, NULL,};
+
+ ocfs2_init_dealloc_ctxt(&ctxt.dealloc);

for (i = 0; i < le16_to_cpu(xh->xh_count); i++) {
xe = &xh->xh_entries[i];
@@ -4644,13 +4872,16 @@ static int ocfs2_delete_xattr_in_bucket(struct inode *inode,

ret = ocfs2_xattr_bucket_value_truncate(inode,
bucket->bu_bhs[0],
- i, 0);
+ i, 0, &ctxt);
if (ret) {
mlog_errno(ret);
break;
}
}

+ ocfs2_schedule_truncate_log_flush(osb, 1);
+ ocfs2_run_deallocs(osb, &ctxt.dealloc);
+
return ret;
}

--
1.5.6

2008-12-19 22:00:44

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 18/45] ocfs2/xattr: Move clusters free into dealloc.

From: Tao Ma <[email protected]>

Move clusters free process into dealloc context so that
they can be freed after the transaction.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 14 +-------------
1 files changed, 1 insertions(+), 13 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 4501c63..f1da381 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -457,7 +457,6 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
int ret;
u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- struct inode *tl_inode = osb->osb_tl_inode;
handle_t *handle;
struct ocfs2_alloc_context *meta_ac = NULL;
struct ocfs2_extent_tree et;
@@ -470,16 +469,6 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
return ret;
}

- mutex_lock(&tl_inode->i_mutex);
-
- if (ocfs2_truncate_log_needs_flush(osb)) {
- ret = __ocfs2_flush_truncate_log(osb);
- if (ret < 0) {
- mlog_errno(ret);
- goto out;
- }
- }
-
handle = ocfs2_start_trans(osb, OCFS2_REMOVE_EXTENT_CREDITS);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -509,14 +498,13 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
goto out_commit;
}

- ret = ocfs2_truncate_log_append(osb, handle, phys_blkno, len);
+ ret = ocfs2_cache_cluster_dealloc(dealloc, phys_blkno, len);
if (ret)
mlog_errno(ret);

out_commit:
ocfs2_commit_trans(osb, handle);
out:
- mutex_unlock(&tl_inode->i_mutex);

if (meta_ac)
ocfs2_free_alloc_context(meta_ac);
--
1.5.6

2008-12-19 21:59:53

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 16/45] ocfs2/xattr: Only extend xattr bucket in need.

From: Tao Ma <[email protected]>

When the first block of a bucket is filled up with xattr
entries, we normally extend the bucket. But if we are
just replace one xattr with small length, we don't need
to extend it. This is important since we will calculate
what we need before the transaction and in this situation
no resources will be allocated.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index d8fc714..4501c63 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -4564,7 +4564,9 @@ try_again:
free, need, max_free, le16_to_cpu(xh->xh_free_start),
le16_to_cpu(xh->xh_name_value_len));

- if (free < need || count == ocfs2_xattr_max_xe_in_bucket(inode->i_sb)) {
+ if (free < need ||
+ (xs->not_found &&
+ count == ocfs2_xattr_max_xe_in_bucket(inode->i_sb))) {
if (need <= max_free &&
count < ocfs2_xattr_max_xe_in_bucket(inode->i_sb)) {
/*
--
1.5.6

2008-12-19 22:01:24

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 20/45] ocfs2/xattr: Merge xattr set transaction.

From: Tao Ma <[email protected]>

In current ocfs2/xattr, the whole xattr set is divided into
many steps are many transaction are used, this make the
xattr set process isn't like a real transaction, so this
patch try to merge all the transaction into one. Another
benefit is that acl can use it easily now.

I don't merge the transaction of deleting xattr when we
remove an inode. The reason is that if we have a large number
of xattrs and every xattrs has large values(large enough
for outside storage), the whole transaction will be very
huge and it looks like jbd can't handle it(I meet with a
jbd complain once). And the old inode removal is also divided
into many steps, so I'd like to leave as it is.

Note:
In xattr set, I try to avoid ocfs2_extend_trans since if
the credits aren't enough for the extension, it will commit
all the dirty blocks and create a new transaction which may
lead to inconsistency in metadata. All ocfs2_extend_trans
remained are safe now.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 673 ++++++++++++++++++++++++++----------------------------
1 files changed, 325 insertions(+), 348 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 4fd201a..7a90892 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -72,6 +72,7 @@ struct ocfs2_xattr_bucket {
};

struct ocfs2_xattr_set_ctxt {
+ handle_t *handle;
struct ocfs2_alloc_context *meta_ac;
struct ocfs2_alloc_context *data_ac;
struct ocfs2_cached_dealloc_ctxt dealloc;
@@ -346,9 +347,7 @@ static int ocfs2_xattr_extend_allocation(struct inode *inode,
struct ocfs2_xattr_set_ctxt *ctxt)
{
int status = 0;
- int restart_func = 0;
- int credits = 0;
- handle_t *handle = NULL;
+ handle_t *handle = ctxt->handle;
enum ocfs2_alloc_restarted why;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
u32 prev_clusters, logical_start = le32_to_cpu(xv->xr_clusters);
@@ -358,19 +357,6 @@ static int ocfs2_xattr_extend_allocation(struct inode *inode,

ocfs2_init_xattr_value_extent_tree(&et, inode, xattr_bh, xv);

-restart_all:
-
- credits = ocfs2_calc_extend_credits(osb->sb, et.et_root_el,
- clusters_to_add);
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- status = PTR_ERR(handle);
- handle = NULL;
- mlog_errno(status);
- goto leave;
- }
-
-restarted_transaction:
status = ocfs2_journal_access(handle, inode, xattr_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
@@ -389,9 +375,8 @@ restarted_transaction:
ctxt->data_ac,
ctxt->meta_ac,
&why);
- if ((status < 0) && (status != -EAGAIN)) {
- if (status != -ENOSPC)
- mlog_errno(status);
+ if (status < 0) {
+ mlog_errno(status);
goto leave;
}

@@ -403,39 +388,13 @@ restarted_transaction:

clusters_to_add -= le32_to_cpu(xv->xr_clusters) - prev_clusters;

- if (why != RESTART_NONE && clusters_to_add) {
- if (why == RESTART_META) {
- mlog(0, "restarting function.\n");
- restart_func = 1;
- } else {
- BUG_ON(why != RESTART_TRANS);
-
- mlog(0, "restarting transaction.\n");
- /* TODO: This can be more intelligent. */
- credits = ocfs2_calc_extend_credits(osb->sb,
- et.et_root_el,
- clusters_to_add);
- status = ocfs2_extend_trans(handle, credits);
- if (status < 0) {
- /* handle still has to be committed at
- * this point. */
- status = -ENOMEM;
- mlog_errno(status);
- goto leave;
- }
- goto restarted_transaction;
- }
- }
+ /*
+ * We should have already allocated enough space before the transaction,
+ * so no need to restart.
+ */
+ BUG_ON(why != RESTART_NONE || clusters_to_add);

leave:
- if (handle) {
- ocfs2_commit_trans(osb, handle);
- handle = NULL;
- }
- if ((!status) && restart_func) {
- restart_func = 0;
- goto restart_all;
- }

return status;
}
@@ -448,31 +407,23 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
{
int ret;
u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
- struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- handle_t *handle;
+ handle_t *handle = ctxt->handle;
struct ocfs2_extent_tree et;

ocfs2_init_xattr_value_extent_tree(&et, inode, root_bh, xv);

- handle = ocfs2_start_trans(osb, OCFS2_REMOVE_EXTENT_CREDITS);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto out;
- }
-
ret = ocfs2_journal_access(handle, inode, root_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

ret = ocfs2_remove_extent(inode, &et, cpos, len, handle, ctxt->meta_ac,
&ctxt->dealloc);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

le32_add_cpu(&xv->xr_clusters, -len);
@@ -480,15 +431,13 @@ static int __ocfs2_remove_xattr_range(struct inode *inode,
ret = ocfs2_journal_dirty(handle, root_bh);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

ret = ocfs2_cache_cluster_dealloc(&ctxt->dealloc, phys_blkno, len);
if (ret)
mlog_errno(ret);

-out_commit:
- ocfs2_commit_trans(osb, handle);
out:
return ret;
}
@@ -975,6 +924,7 @@ static int ocfs2_xattr_get(struct inode *inode,
}

static int __ocfs2_xattr_set_value_outside(struct inode *inode,
+ handle_t *handle,
struct ocfs2_xattr_value_root *xv,
const void *value,
int value_len)
@@ -986,14 +936,17 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
u32 clusters = ocfs2_clusters_for_bytes(inode->i_sb, value_len);
u64 blkno;
struct buffer_head *bh = NULL;
- handle_t *handle;

BUG_ON(clusters > le32_to_cpu(xv->xr_clusters));

+ /*
+ * In __ocfs2_xattr_set_value_outside has already been dirtied,
+ * so we don't need to worry about whether ocfs2_extend_trans
+ * will create a new transactio for us or not.
+ */
credits = clusters * bpc;
- handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb), credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
+ ret = ocfs2_extend_trans(handle, credits);
+ if (ret) {
mlog_errno(ret);
goto out;
}
@@ -1003,7 +956,7 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
&num_clusters, &xv->xr_list);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

blkno = ocfs2_clusters_to_blocks(inode->i_sb, p_cluster);
@@ -1012,7 +965,7 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
ret = ocfs2_read_block(inode, blkno, &bh);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

ret = ocfs2_journal_access(handle,
@@ -1021,7 +974,7 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

cp_len = value_len > blocksize ? blocksize : value_len;
@@ -1035,7 +988,7 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
ret = ocfs2_journal_dirty(handle, bh);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}
brelse(bh);
bh = NULL;
@@ -1049,8 +1002,6 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
}
cpos += num_clusters;
}
-out_commit:
- ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
out:
brelse(bh);

@@ -1058,28 +1009,21 @@ out:
}

static int ocfs2_xattr_cleanup(struct inode *inode,
+ handle_t *handle,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
size_t offs)
{
- handle_t *handle = NULL;
int ret = 0;
size_t name_len = strlen(xi->name);
void *val = xs->base + offs;
size_t size = OCFS2_XATTR_SIZE(name_len) + OCFS2_XATTR_ROOT_SIZE;

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)),
- OCFS2_XATTR_BLOCK_UPDATE_CREDITS);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto out;
- }
ret = ocfs2_journal_access(handle, inode, xs->xattr_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}
/* Decrease xattr count */
le16_add_cpu(&xs->header->xh_count, -1);
@@ -1090,32 +1034,23 @@ static int ocfs2_xattr_cleanup(struct inode *inode,
ret = ocfs2_journal_dirty(handle, xs->xattr_bh);
if (ret < 0)
mlog_errno(ret);
-out_commit:
- ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
out:
return ret;
}

static int ocfs2_xattr_update_entry(struct inode *inode,
+ handle_t *handle,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
size_t offs)
{
- handle_t *handle = NULL;
- int ret = 0;
+ int ret;

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)),
- OCFS2_XATTR_BLOCK_UPDATE_CREDITS);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto out;
- }
ret = ocfs2_journal_access(handle, inode, xs->xattr_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

xs->here->xe_name_offset = cpu_to_le16(offs);
@@ -1129,8 +1064,6 @@ static int ocfs2_xattr_update_entry(struct inode *inode,
ret = ocfs2_journal_dirty(handle, xs->xattr_bh);
if (ret < 0)
mlog_errno(ret);
-out_commit:
- ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
out:
return ret;
}
@@ -1168,13 +1101,13 @@ static int ocfs2_xattr_set_value_outside(struct inode *inode,
mlog_errno(ret);
return ret;
}
- ret = __ocfs2_xattr_set_value_outside(inode, xv, xi->value,
- xi->value_len);
+ ret = ocfs2_xattr_update_entry(inode, ctxt->handle, xi, xs, offs);
if (ret < 0) {
mlog_errno(ret);
return ret;
}
- ret = ocfs2_xattr_update_entry(inode, xi, xs, offs);
+ ret = __ocfs2_xattr_set_value_outside(inode, ctxt->handle, xv,
+ xi->value, xi->value_len);
if (ret < 0)
mlog_errno(ret);

@@ -1302,7 +1235,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
struct ocfs2_dinode *di = (struct ocfs2_dinode *)xs->inode_bh->b_data;
size_t min_offs = xs->end - xs->base, name_len = strlen(xi->name);
size_t size_l = 0;
- handle_t *handle = NULL;
+ handle_t *handle = ctxt->handle;
int free, i, ret;
struct ocfs2_xattr_info xi_l = {
.name_index = xi->name_index,
@@ -1391,19 +1324,21 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
goto out;
}

- ret = __ocfs2_xattr_set_value_outside(inode,
- xv,
- xi->value,
- xi->value_len);
+ ret = ocfs2_xattr_update_entry(inode,
+ handle,
+ xi,
+ xs,
+ offs);
if (ret < 0) {
mlog_errno(ret);
goto out;
}

- ret = ocfs2_xattr_update_entry(inode,
- xi,
- xs,
- offs);
+ ret = __ocfs2_xattr_set_value_outside(inode,
+ handle,
+ xv,
+ xi->value,
+ xi->value_len);
if (ret < 0)
mlog_errno(ret);
goto out;
@@ -1413,45 +1348,29 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
* just trucate old value to zero.
*/
ret = ocfs2_xattr_value_truncate(inode,
- xs->xattr_bh,
- xv,
- 0,
- ctxt);
+ xs->xattr_bh,
+ xv,
+ 0,
+ ctxt);
if (ret < 0)
mlog_errno(ret);
}
}
}

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)),
- OCFS2_INODE_UPDATE_CREDITS);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto out;
- }
-
ret = ocfs2_journal_access(handle, inode, xs->inode_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

if (!(flag & OCFS2_INLINE_XATTR_FL)) {
- /* set extended attribute in external block. */
- ret = ocfs2_extend_trans(handle,
- OCFS2_INODE_UPDATE_CREDITS +
- OCFS2_XATTR_BLOCK_UPDATE_CREDITS);
- if (ret) {
- mlog_errno(ret);
- goto out_commit;
- }
ret = ocfs2_journal_access(handle, inode, xs->xattr_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}
}

@@ -1465,7 +1384,7 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
ret = ocfs2_journal_dirty(handle, xs->xattr_bh);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}
}

@@ -1502,9 +1421,6 @@ static int ocfs2_xattr_set_entry(struct inode *inode,
if (ret < 0)
mlog_errno(ret);

-out_commit:
- ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
-
if (!ret && xi->value_len > OCFS2_XATTR_INLINE_SIZE) {
/*
* Set value outside in B tree.
@@ -1520,14 +1436,14 @@ out_commit:
* If set value outside failed, we have to clean
* the junk tree root we have already set in local.
*/
- ret2 = ocfs2_xattr_cleanup(inode, xi, xs, offs);
+ ret2 = ocfs2_xattr_cleanup(inode, ctxt->handle,
+ xi, xs, offs);
if (ret2 < 0)
mlog_errno(ret2);
}
}
out:
return ret;
-
}

static int ocfs2_remove_value_outside(struct inode*inode,
@@ -1540,6 +1456,13 @@ static int ocfs2_remove_value_outside(struct inode*inode,

ocfs2_init_dealloc_ctxt(&ctxt.dealloc);

+ ctxt.handle = ocfs2_start_trans(osb, OCFS2_REMOVE_EXTENT_CREDITS);
+ if (IS_ERR(ctxt.handle)) {
+ ret = PTR_ERR(ctxt.handle);
+ mlog_errno(ret);
+ goto out;
+ }
+
for (i = 0; i < le16_to_cpu(header->xh_count); i++) {
struct ocfs2_xattr_entry *entry = &header->xh_entries[i];

@@ -1560,8 +1483,10 @@ static int ocfs2_remove_value_outside(struct inode*inode,
}
}

+ ocfs2_commit_trans(osb, ctxt.handle);
ocfs2_schedule_truncate_log_flush(osb, 1);
ocfs2_run_deallocs(osb, &ctxt.dealloc);
+out:
return ret;
}

@@ -1920,7 +1845,7 @@ static int ocfs2_xattr_block_set(struct inode *inode,
struct buffer_head *new_bh = NULL;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_dinode *di = (struct ocfs2_dinode *)xs->inode_bh->b_data;
- handle_t *handle = NULL;
+ handle_t *handle = ctxt->handle;
struct ocfs2_xattr_block *xblk = NULL;
u16 suballoc_bit_start;
u32 num_got;
@@ -1928,18 +1853,11 @@ static int ocfs2_xattr_block_set(struct inode *inode,
int ret;

if (!xs->xattr_bh) {
- handle = ocfs2_start_trans(osb,
- OCFS2_XATTR_BLOCK_CREATE_CREDITS);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto out;
- }
ret = ocfs2_journal_access(handle, inode, xs->inode_bh,
OCFS2_JOURNAL_ACCESS_CREATE);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto end;
}

ret = ocfs2_claim_metadata(osb, handle, ctxt->meta_ac, 1,
@@ -1947,7 +1865,7 @@ static int ocfs2_xattr_block_set(struct inode *inode,
&first_blkno);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto end;
}

new_bh = sb_getblk(inode->i_sb, first_blkno);
@@ -1957,7 +1875,7 @@ static int ocfs2_xattr_block_set(struct inode *inode,
OCFS2_JOURNAL_ACCESS_CREATE);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto end;
}

/* Initialize ocfs2_xattr_block */
@@ -1978,17 +1896,10 @@ static int ocfs2_xattr_block_set(struct inode *inode,
ret = ocfs2_journal_dirty(handle, new_bh);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto end;
}
di->i_xattr_loc = cpu_to_le64(first_blkno);
- ret = ocfs2_journal_dirty(handle, xs->inode_bh);
- if (ret < 0)
- mlog_errno(ret);
-out_commit:
- ocfs2_commit_trans(osb, handle);
-out:
- if (ret < 0)
- return ret;
+ ocfs2_journal_dirty(handle, xs->inode_bh);
} else
xblk = (struct ocfs2_xattr_block *)xs->xattr_bh->b_data;

@@ -2057,10 +1968,11 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
struct ocfs2_xattr_search *xis,
struct ocfs2_xattr_search *xbs,
int *clusters_need,
- int *meta_need)
+ int *meta_need,
+ int *credits_need)
{
int ret = 0, old_in_xb = 0;
- int clusters_add = 0, meta_add = 0;
+ int clusters_add = 0, meta_add = 0, credits = 0;
struct buffer_head *bh = NULL;
struct ocfs2_xattr_block *xb = NULL;
struct ocfs2_xattr_entry *xe = NULL;
@@ -2071,16 +1983,15 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
xi->value_len);
u64 value_size;

- /*
- * delete a xattr doesn't need metadata and cluster allocation.
- * so return.
- */
- if (!xi->value)
- goto out;
-
if (xis->not_found && xbs->not_found) {
- if (xi->value_len > OCFS2_XATTR_INLINE_SIZE)
+ credits += ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+
+ if (xi->value_len > OCFS2_XATTR_INLINE_SIZE) {
clusters_add += new_clusters;
+ credits += ocfs2_calc_extend_credits(inode->i_sb,
+ &def_xv.xv.xr_list,
+ new_clusters);
+ }

goto meta_guess;
}
@@ -2090,6 +2001,7 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
name_offset = le16_to_cpu(xe->xe_name_offset);
name_len = OCFS2_XATTR_SIZE(xe->xe_name_len);
base = xis->base;
+ credits += OCFS2_INODE_UPDATE_CREDITS;
} else {
int i, block_off;
xb = (struct ocfs2_xattr_block *)xbs->xattr_bh->b_data;
@@ -2105,8 +2017,25 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
i, &block_off,
&name_offset);
base = bucket_block(xbs->bucket, block_off);
- } else
+ credits += ocfs2_blocks_per_xattr_bucket(inode->i_sb);
+ } else {
base = xbs->base;
+ credits += OCFS2_XATTR_BLOCK_UPDATE_CREDITS;
+ }
+ }
+
+ /*
+ * delete a xattr doesn't need metadata and cluster allocation.
+ * so just calculate the credits and return.
+ *
+ * The credits for removing the value tree will be extended
+ * by ocfs2_remove_extent itself.
+ */
+ if (!xi->value) {
+ if (!ocfs2_xattr_is_local(xe))
+ credits += OCFS2_REMOVE_EXTENT_CREDITS;
+
+ goto out;
}

/* do cluster allocation guess first. */
@@ -2121,6 +2050,13 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
*/
if (ocfs2_xattr_can_be_in_inode(inode, xi, xis)) {
clusters_add += new_clusters;
+ credits += OCFS2_REMOVE_EXTENT_CREDITS +
+ OCFS2_INODE_UPDATE_CREDITS;
+ if (!ocfs2_xattr_is_local(xe))
+ credits += ocfs2_calc_extend_credits(
+ inode->i_sb,
+ &def_xv.xv.xr_list,
+ new_clusters);
goto out;
}
}
@@ -2137,11 +2073,16 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
} else
xv = &def_xv.xv;

- if (old_clusters >= new_clusters)
+ if (old_clusters >= new_clusters) {
+ credits += OCFS2_REMOVE_EXTENT_CREDITS;
goto out;
- else {
+ } else {
meta_add += ocfs2_extend_meta_needed(&xv->xr_list);
clusters_add += new_clusters - old_clusters;
+ credits += ocfs2_calc_extend_credits(inode->i_sb,
+ &xv->xr_list,
+ new_clusters -
+ old_clusters);
goto out;
}
} else {
@@ -2177,6 +2118,8 @@ meta_guess:
struct ocfs2_extent_list *el =
&xb->xb_attrs.xb_root.xt_list;
meta_add += ocfs2_extend_meta_needed(el);
+ credits += ocfs2_calc_extend_credits(inode->i_sb,
+ el, 1);
}

/*
@@ -2187,16 +2130,23 @@ meta_guess:
* also.
*/
clusters_add += 1;
+ credits += ocfs2_blocks_per_xattr_bucket(inode->i_sb);
if (OCFS2_XATTR_BUCKET_SIZE ==
- OCFS2_SB(inode->i_sb)->s_clustersize)
+ OCFS2_SB(inode->i_sb)->s_clustersize) {
+ credits += ocfs2_blocks_per_xattr_bucket(inode->i_sb);
clusters_add += 1;
- } else
+ }
+ } else {
meta_add += 1;
+ credits += OCFS2_XATTR_BLOCK_CREATE_CREDITS;
+ }
out:
if (clusters_need)
*clusters_need = clusters_add;
if (meta_need)
*meta_need = meta_add;
+ if (credits_need)
+ *credits_need = credits;
brelse(bh);
return ret;
}
@@ -2206,7 +2156,8 @@ static int ocfs2_init_xattr_set_ctxt(struct inode *inode,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xis,
struct ocfs2_xattr_search *xbs,
- struct ocfs2_xattr_set_ctxt *ctxt)
+ struct ocfs2_xattr_set_ctxt *ctxt,
+ int *credits)
{
int clusters_add, meta_add, ret;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
@@ -2216,14 +2167,14 @@ static int ocfs2_init_xattr_set_ctxt(struct inode *inode,
ocfs2_init_dealloc_ctxt(&ctxt->dealloc);

ret = ocfs2_calc_xattr_set_need(inode, di, xi, xis, xbs,
- &clusters_add, &meta_add);
+ &clusters_add, &meta_add, credits);
if (ret) {
mlog_errno(ret);
return ret;
}

- mlog(0, "Set xattr %s, reserve meta blocks = %d, clusters = %d\n",
- xi->name, meta_add, clusters_add);
+ mlog(0, "Set xattr %s, reserve meta blocks = %d, clusters = %d, "
+ "credits = %d\n", xi->name, meta_add, clusters_add, *credits);

if (meta_add) {
ret = ocfs2_reserve_new_metadata_blocks(osb, meta_add,
@@ -2254,6 +2205,126 @@ out:
return ret;
}

+static int __ocfs2_xattr_set_handle(struct inode *inode,
+ struct ocfs2_dinode *di,
+ struct ocfs2_xattr_info *xi,
+ struct ocfs2_xattr_search *xis,
+ struct ocfs2_xattr_search *xbs,
+ struct ocfs2_xattr_set_ctxt *ctxt)
+{
+ int ret = 0, credits;
+
+ if (!xi->value) {
+ /* Remove existing extended attribute */
+ if (!xis->not_found)
+ ret = ocfs2_xattr_ibody_set(inode, xi, xis, ctxt);
+ else if (!xbs->not_found)
+ ret = ocfs2_xattr_block_set(inode, xi, xbs, ctxt);
+ } else {
+ /* We always try to set extended attribute into inode first*/
+ ret = ocfs2_xattr_ibody_set(inode, xi, xis, ctxt);
+ if (!ret && !xbs->not_found) {
+ /*
+ * If succeed and that extended attribute existing in
+ * external block, then we will remove it.
+ */
+ xi->value = NULL;
+ xi->value_len = 0;
+
+ xis->not_found = -ENODATA;
+ ret = ocfs2_calc_xattr_set_need(inode,
+ di,
+ xi,
+ xis,
+ xbs,
+ NULL,
+ NULL,
+ &credits);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_extend_trans(ctxt->handle, credits +
+ ctxt->handle->h_buffer_credits);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+ ret = ocfs2_xattr_block_set(inode, xi, xbs, ctxt);
+ } else if (ret == -ENOSPC) {
+ if (di->i_xattr_loc && !xbs->xattr_bh) {
+ ret = ocfs2_xattr_block_find(inode,
+ xi->name_index,
+ xi->name, xbs);
+ if (ret)
+ goto out;
+
+ xis->not_found = -ENODATA;
+ ret = ocfs2_calc_xattr_set_need(inode,
+ di,
+ xi,
+ xis,
+ xbs,
+ NULL,
+ NULL,
+ &credits);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_extend_trans(ctxt->handle, credits +
+ ctxt->handle->h_buffer_credits);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+ }
+ /*
+ * If no space in inode, we will set extended attribute
+ * into external block.
+ */
+ ret = ocfs2_xattr_block_set(inode, xi, xbs, ctxt);
+ if (ret)
+ goto out;
+ if (!xis->not_found) {
+ /*
+ * If succeed and that extended attribute
+ * existing in inode, we will remove it.
+ */
+ xi->value = NULL;
+ xi->value_len = 0;
+ xbs->not_found = -ENODATA;
+ ret = ocfs2_calc_xattr_set_need(inode,
+ di,
+ xi,
+ xis,
+ xbs,
+ NULL,
+ NULL,
+ &credits);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_extend_trans(ctxt->handle, credits +
+ ctxt->handle->h_buffer_credits);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+ ret = ocfs2_xattr_ibody_set(inode, xi,
+ xis, ctxt);
+ }
+ }
+ }
+
+out:
+ return ret;
+}
+
/*
* ocfs2_xattr_set()
*
@@ -2270,8 +2341,9 @@ int ocfs2_xattr_set(struct inode *inode,
{
struct buffer_head *di_bh = NULL;
struct ocfs2_dinode *di;
- int ret;
+ int ret, credits;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct inode *tl_inode = osb->osb_tl_inode;
struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, };

struct ocfs2_xattr_info xi = {
@@ -2337,56 +2409,37 @@ int ocfs2_xattr_set(struct inode *inode,
goto cleanup;
}

- ret = ocfs2_init_xattr_set_ctxt(inode, di, &xi, &xis, &xbs, &ctxt);
+
+ mutex_lock(&tl_inode->i_mutex);
+
+ if (ocfs2_truncate_log_needs_flush(osb)) {
+ ret = __ocfs2_flush_truncate_log(osb);
+ if (ret < 0) {
+ mutex_unlock(&tl_inode->i_mutex);
+ mlog_errno(ret);
+ goto cleanup;
+ }
+ }
+ mutex_unlock(&tl_inode->i_mutex);
+
+ ret = ocfs2_init_xattr_set_ctxt(inode, di, &xi, &xis,
+ &xbs, &ctxt, &credits);
if (ret) {
mlog_errno(ret);
goto cleanup;
}

- if (!value) {
- /* Remove existing extended attribute */
- if (!xis.not_found)
- ret = ocfs2_xattr_ibody_set(inode, &xi, &xis, &ctxt);
- else if (!xbs.not_found)
- ret = ocfs2_xattr_block_set(inode, &xi, &xbs, &ctxt);
- } else {
- /* We always try to set extended attribute into inode first*/
- ret = ocfs2_xattr_ibody_set(inode, &xi, &xis, &ctxt);
- if (!ret && !xbs.not_found) {
- /*
- * If succeed and that extended attribute existing in
- * external block, then we will remove it.
- */
- xi.value = NULL;
- xi.value_len = 0;
- ret = ocfs2_xattr_block_set(inode, &xi, &xbs, &ctxt);
- } else if (ret == -ENOSPC) {
- if (di->i_xattr_loc && !xbs.xattr_bh) {
- ret = ocfs2_xattr_block_find(inode, name_index,
- name, &xbs);
- if (ret)
- goto cleanup;
- }
- /*
- * If no space in inode, we will set extended attribute
- * into external block.
- */
- ret = ocfs2_xattr_block_set(inode, &xi, &xbs, &ctxt);
- if (ret)
- goto free;
- if (!xis.not_found) {
- /*
- * If succeed and that extended attribute
- * existing in inode, we will remove it.
- */
- xi.value = NULL;
- xi.value_len = 0;
- ret = ocfs2_xattr_ibody_set(inode, &xi,
- &xis, &ctxt);
- }
- }
+ ctxt.handle = ocfs2_start_trans(osb, credits);
+ if (IS_ERR(ctxt.handle)) {
+ ret = PTR_ERR(ctxt.handle);
+ mlog_errno(ret);
+ goto cleanup;
}
-free:
+
+ ret = __ocfs2_xattr_set_handle(inode, di, &xi, &xis, &xbs, &ctxt);
+
+ ocfs2_commit_trans(osb, ctxt.handle);
+
if (ctxt.data_ac)
ocfs2_free_alloc_context(ctxt.data_ac);
if (ctxt.meta_ac)
@@ -2974,10 +3027,10 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
struct ocfs2_xattr_search *xs,
struct ocfs2_xattr_set_ctxt *ctxt)
{
- int ret, credits = OCFS2_SUBALLOC_ALLOC;
+ int ret;
u32 bit_off, len;
u64 blkno;
- handle_t *handle;
+ handle_t *handle = ctxt->handle;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_inode_info *oi = OCFS2_I(inode);
struct buffer_head *xb_bh = xs->xattr_bh;
@@ -2999,30 +3052,18 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
*/
down_write(&oi->ip_alloc_sem);

- /*
- * We need more credits. One for the xattr block update and one
- * for each block of the new xattr bucket.
- */
- credits += 1 + ocfs2_blocks_per_xattr_bucket(inode->i_sb);
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto out_sem;
- }
-
ret = ocfs2_journal_access(handle, inode, xb_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

ret = __ocfs2_claim_clusters(osb, handle, ctxt->data_ac,
1, 1, &bit_off, &len);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

/*
@@ -3038,14 +3079,14 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
ret = ocfs2_init_xattr_bucket(xs->bucket, blkno);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

ret = ocfs2_xattr_bucket_journal_access(handle, xs->bucket,
OCFS2_JOURNAL_ACCESS_CREATE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

ocfs2_cp_xattr_block_to_bucket(inode, xb_bh, xs->bucket);
@@ -3070,16 +3111,9 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,

xb->xb_flags = cpu_to_le16(xb_flags | OCFS2_XATTR_INDEXED);

- ret = ocfs2_journal_dirty(handle, xb_bh);
- if (ret) {
- mlog_errno(ret);
- goto out_commit;
- }
+ ocfs2_journal_dirty(handle, xb_bh);

-out_commit:
- ocfs2_commit_trans(osb, handle);
-
-out_sem:
+out:
up_write(&oi->ip_alloc_sem);

return ret;
@@ -3105,6 +3139,7 @@ static int cmp_xe_offset(const void *a, const void *b)
* so that we can spare some space for insertion.
*/
static int ocfs2_defrag_xattr_bucket(struct inode *inode,
+ handle_t *handle,
struct ocfs2_xattr_bucket *bucket)
{
int ret, i;
@@ -3114,7 +3149,6 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
u64 blkno = bucket_blkno(bucket);
u16 xh_free_start;
size_t blocksize = inode->i_sb->s_blocksize;
- handle_t *handle;
struct ocfs2_xattr_entry *xe;

/*
@@ -3133,19 +3167,11 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
for (i = 0; i < bucket->bu_blocks; i++, buf += blocksize)
memcpy(buf, bucket_block(bucket, i), blocksize);

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)), bucket->bu_blocks);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- handle = NULL;
- mlog_errno(ret);
- goto out;
- }
-
ret = ocfs2_xattr_bucket_journal_access(handle, bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
- goto commit;
+ goto out;
}

xh = (struct ocfs2_xattr_header *)bucket_buf;
@@ -3203,7 +3229,7 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
"bucket %llu\n", (unsigned long long)blkno);

if (xh_free_start == end)
- goto commit;
+ goto out;

memset(bucket_buf + xh_free_start, 0, end - xh_free_start);
xh->xh_free_start = cpu_to_le16(end);
@@ -3218,8 +3244,6 @@ static int ocfs2_defrag_xattr_bucket(struct inode *inode,
memcpy(bucket_block(bucket, i), buf, blocksize);
ocfs2_xattr_bucket_journal_dirty(handle, bucket);

-commit:
- ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
out:
kfree(bucket_buf);
return ret;
@@ -3270,7 +3294,7 @@ static int ocfs2_mv_xattr_bucket_cross_cluster(struct inode *inode,
* 1 more for the update of the 1st bucket of the previous
* extent record.
*/
- credits = bpc / 2 + 1;
+ credits = bpc / 2 + 1 + handle->h_buffer_credits;
ret = ocfs2_extend_trans(handle, credits);
if (ret) {
mlog_errno(ret);
@@ -3662,7 +3686,7 @@ static int ocfs2_cp_xattr_cluster(struct inode *inode,
* We need to update the new cluster and 1 more for the update of
* the 1st bucket of the previous extent rec.
*/
- credits = bpc + 1;
+ credits = bpc + 1 + handle->h_buffer_credits;
ret = ocfs2_extend_trans(handle, credits);
if (ret) {
mlog_errno(ret);
@@ -3732,7 +3756,7 @@ static int ocfs2_divide_xattr_cluster(struct inode *inode,
u32 *first_hash)
{
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
- int ret, credits = 2 * blk_per_bucket;
+ int ret, credits = 2 * blk_per_bucket + handle->h_buffer_credits;

BUG_ON(OCFS2_XATTR_BUCKET_SIZE < OCFS2_SB(inode->i_sb)->s_clustersize);

@@ -3845,12 +3869,12 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
int *extend,
struct ocfs2_xattr_set_ctxt *ctxt)
{
- int ret, credits;
+ int ret;
u16 bpc = ocfs2_clusters_to_blocks(inode->i_sb, 1);
u32 prev_clusters = *num_clusters;
u32 clusters_to_add = 1, bit_off, num_bits, v_start = 0;
u64 block;
- handle_t *handle = NULL;
+ handle_t *handle = ctxt->handle;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_extent_tree et;

@@ -3861,16 +3885,6 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,

ocfs2_init_xattr_tree_extent_tree(&et, inode, root_bh);

- credits = ocfs2_calc_extend_credits(osb->sb, et.et_root_el,
- clusters_to_add);
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- handle = NULL;
- mlog_errno(ret);
- goto leave;
- }
-
ret = ocfs2_journal_access(handle, inode, root_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
@@ -3924,18 +3938,6 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
}
}

- if (handle->h_buffer_credits < credits) {
- /*
- * The journal has been restarted before, and don't
- * have enough space for the insertion, so extend it
- * here.
- */
- ret = ocfs2_extend_trans(handle, credits);
- if (ret) {
- mlog_errno(ret);
- goto leave;
- }
- }
mlog(0, "Insert %u clusters at block %llu for xattr at %u\n",
num_bits, (unsigned long long)block, v_start);
ret = ocfs2_insert_extent(osb, handle, inode, &et, v_start, block,
@@ -3946,15 +3948,10 @@ static int ocfs2_add_new_xattr_cluster(struct inode *inode,
}

ret = ocfs2_journal_dirty(handle, root_bh);
- if (ret < 0) {
+ if (ret < 0)
mlog_errno(ret);
- goto leave;
- }

leave:
- if (handle)
- ocfs2_commit_trans(osb, handle);
-
return ret;
}

@@ -3963,6 +3960,7 @@ leave:
* We meet with start_bh. Only move half of the xattrs to the bucket after it.
*/
static int ocfs2_extend_xattr_bucket(struct inode *inode,
+ handle_t *handle,
struct buffer_head *first_bh,
struct buffer_head *start_bh,
u32 num_clusters)
@@ -3972,7 +3970,6 @@ static int ocfs2_extend_xattr_bucket(struct inode *inode,
u16 blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
u64 start_blk = start_bh->b_blocknr, end_blk;
u32 num_buckets = num_clusters * ocfs2_xattr_buckets_per_cluster(osb);
- handle_t *handle;
struct ocfs2_xattr_header *first_xh =
(struct ocfs2_xattr_header *)first_bh->b_data;
u16 bucket = le16_to_cpu(first_xh->xh_num_buckets);
@@ -3989,11 +3986,10 @@ static int ocfs2_extend_xattr_bucket(struct inode *inode,
* We will touch all the buckets after the start_bh(include it).
* Then we add one more bucket.
*/
- credits = end_blk - start_blk + 3 * blk_per_bucket + 1;
- handle = ocfs2_start_trans(osb, credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- handle = NULL;
+ credits = end_blk - start_blk + 3 * blk_per_bucket + 1 +
+ handle->h_buffer_credits;
+ ret = ocfs2_extend_trans(handle, credits);
+ if (ret) {
mlog_errno(ret);
goto out;
}
@@ -4002,14 +3998,14 @@ static int ocfs2_extend_xattr_bucket(struct inode *inode,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto commit;
+ goto out;
}

while (end_blk != start_blk) {
ret = ocfs2_cp_xattr_bucket(inode, handle, end_blk,
end_blk + blk_per_bucket, 0);
if (ret)
- goto commit;
+ goto out;
end_blk -= blk_per_bucket;
}

@@ -4020,8 +4016,6 @@ static int ocfs2_extend_xattr_bucket(struct inode *inode,
le16_add_cpu(&first_xh->xh_num_buckets, 1);
ocfs2_journal_dirty(handle, first_bh);

-commit:
- ocfs2_commit_trans(osb, handle);
out:
return ret;
}
@@ -4099,6 +4093,7 @@ static int ocfs2_add_new_xattr_bucket(struct inode *inode,

if (extend)
ret = ocfs2_extend_xattr_bucket(inode,
+ ctxt->handle,
first_bh,
header_bh,
num_clusters);
@@ -4272,14 +4267,13 @@ set_new_name_value:
* space for the xattr insertion.
*/
static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
+ handle_t *handle,
struct ocfs2_xattr_info *xi,
struct ocfs2_xattr_search *xs,
u32 name_hash,
int local)
{
int ret;
- handle_t *handle = NULL;
- struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
u64 blkno;

mlog(0, "Set xattr entry len = %lu index = %d in bucket %llu\n",
@@ -4296,14 +4290,6 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
}
}

- handle = ocfs2_start_trans(osb, xs->bucket->bu_blocks);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- handle = NULL;
- mlog_errno(ret);
- goto out;
- }
-
ret = ocfs2_xattr_bucket_journal_access(handle, xs->bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
@@ -4315,32 +4301,22 @@ static int ocfs2_xattr_set_entry_in_bucket(struct inode *inode,
ocfs2_xattr_bucket_journal_dirty(handle, xs->bucket);

out:
- ocfs2_commit_trans(osb, handle);
-
return ret;
}

static int ocfs2_xattr_value_update_size(struct inode *inode,
+ handle_t *handle,
struct buffer_head *xe_bh,
struct ocfs2_xattr_entry *xe,
u64 new_size)
{
int ret;
- struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- handle_t *handle = NULL;
-
- handle = ocfs2_start_trans(osb, 1);
- if (IS_ERR(handle)) {
- ret = -ENOMEM;
- mlog_errno(ret);
- goto out;
- }

ret = ocfs2_journal_access(handle, inode, xe_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
- goto out_commit;
+ goto out;
}

xe->xe_value_size = cpu_to_le64(new_size);
@@ -4349,8 +4325,6 @@ static int ocfs2_xattr_value_update_size(struct inode *inode,
if (ret < 0)
mlog_errno(ret);

-out_commit:
- ocfs2_commit_trans(osb, handle);
out:
return ret;
}
@@ -4407,7 +4381,8 @@ static int ocfs2_xattr_bucket_value_truncate(struct inode *inode,
goto out;
}

- ret = ocfs2_xattr_value_update_size(inode, header_bh, xe, len);
+ ret = ocfs2_xattr_value_update_size(inode, ctxt->handle,
+ header_bh, xe, len);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4439,6 +4414,7 @@ static int ocfs2_xattr_bucket_value_truncate_xs(struct inode *inode,
}

static int ocfs2_xattr_bucket_set_value_outside(struct inode *inode,
+ handle_t *handle,
struct ocfs2_xattr_search *xs,
char *val,
int value_len)
@@ -4454,7 +4430,8 @@ static int ocfs2_xattr_bucket_set_value_outside(struct inode *inode,

xv = (struct ocfs2_xattr_value_root *)(xs->base + offset);

- return __ocfs2_xattr_set_value_outside(inode, xv, val, value_len);
+ return __ocfs2_xattr_set_value_outside(inode, handle,
+ xv, val, value_len);
}

static int ocfs2_rm_xattr_cluster(struct inode *inode,
@@ -4547,27 +4524,19 @@ out:
}

static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
+ handle_t *handle,
struct ocfs2_xattr_search *xs)
{
- handle_t *handle = NULL;
struct ocfs2_xattr_header *xh = bucket_xh(xs->bucket);
struct ocfs2_xattr_entry *last = &xh->xh_entries[
le16_to_cpu(xh->xh_count) - 1];
int ret = 0;

- handle = ocfs2_start_trans((OCFS2_SB(inode->i_sb)),
- ocfs2_blocks_per_xattr_bucket(inode->i_sb));
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- return;
- }
-
ret = ocfs2_xattr_bucket_journal_access(handle, xs->bucket,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret) {
mlog_errno(ret);
- goto out_commit;
+ return;
}

/* Remove the old entry. */
@@ -4577,9 +4546,6 @@ static void ocfs2_xattr_bucket_remove_xs(struct inode *inode,
le16_add_cpu(&xh->xh_count, -1);

ocfs2_xattr_bucket_journal_dirty(handle, xs->bucket);
-
-out_commit:
- ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
}

/*
@@ -4645,7 +4611,8 @@ static int ocfs2_xattr_set_in_bucket(struct inode *inode,
xi->value_len = OCFS2_XATTR_ROOT_SIZE;
}

- ret = ocfs2_xattr_set_entry_in_bucket(inode, xi, xs, name_hash, local);
+ ret = ocfs2_xattr_set_entry_in_bucket(inode, ctxt->handle, xi, xs,
+ name_hash, local);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4666,13 +4633,14 @@ static int ocfs2_xattr_set_in_bucket(struct inode *inode,
* storage and we have allocated xattr already,
* so need to remove it.
*/
- ocfs2_xattr_bucket_remove_xs(inode, xs);
+ ocfs2_xattr_bucket_remove_xs(inode, ctxt->handle, xs);
}
goto out;
}

set_value_outside:
- ret = ocfs2_xattr_bucket_set_value_outside(inode, xs, val, value_len);
+ ret = ocfs2_xattr_bucket_set_value_outside(inode, ctxt->handle,
+ xs, val, value_len);
out:
return ret;
}
@@ -4785,7 +4753,8 @@ try_again:
* name/value will be moved, the xe shouldn't be changed
* in xs.
*/
- ret = ocfs2_defrag_xattr_bucket(inode, xs->bucket);
+ ret = ocfs2_defrag_xattr_bucket(inode, ctxt->handle,
+ xs->bucket);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4865,6 +4834,13 @@ static int ocfs2_delete_xattr_in_bucket(struct inode *inode,

ocfs2_init_dealloc_ctxt(&ctxt.dealloc);

+ ctxt.handle = ocfs2_start_trans(osb, OCFS2_REMOVE_EXTENT_CREDITS);
+ if (IS_ERR(ctxt.handle)) {
+ ret = PTR_ERR(ctxt.handle);
+ mlog_errno(ret);
+ goto out;
+ }
+
for (i = 0; i < le16_to_cpu(xh->xh_count); i++) {
xe = &xh->xh_entries[i];
if (ocfs2_xattr_is_local(xe))
@@ -4879,9 +4855,10 @@ static int ocfs2_delete_xattr_in_bucket(struct inode *inode,
}
}

+ ret = ocfs2_commit_trans(osb, ctxt.handle);
ocfs2_schedule_truncate_log_flush(osb, 1);
ocfs2_run_deallocs(osb, &ctxt.dealloc);
-
+out:
return ret;
}

--
1.5.6

2008-12-19 22:01:55

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 22/45] ocfs2: move new inode allocation out of the transaction

From: Tiger Yang <[email protected]>

Move out inode allocation from ocfs2_mknod_locked() because
vfs_dq_init() must be called outside of a transaction.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/namei.c | 108 ++++++++++++++++++++++++++++++++----------------------
1 files changed, 64 insertions(+), 44 deletions(-)

diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index f4967e6..e9118fc 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -66,12 +66,12 @@

static int ocfs2_mknod_locked(struct ocfs2_super *osb,
struct inode *dir,
- struct dentry *dentry, int mode,
+ struct inode *inode,
+ struct dentry *dentry,
dev_t dev,
struct buffer_head **new_fe_bh,
struct buffer_head *parent_fe_bh,
handle_t *handle,
- struct inode **ret_inode,
struct ocfs2_alloc_context *inode_ac);

static int ocfs2_prepare_orphan_dir(struct ocfs2_super *osb,
@@ -186,6 +186,34 @@ bail:
return ret;
}

+static struct inode *ocfs2_get_init_inode(struct inode *dir, int mode)
+{
+ struct inode *inode;
+
+ inode = new_inode(dir->i_sb);
+ if (!inode) {
+ mlog(ML_ERROR, "new_inode failed!\n");
+ return NULL;
+ }
+
+ /* populate as many fields early on as possible - many of
+ * these are used by the support functions here and in
+ * callers. */
+ if (S_ISDIR(mode))
+ inode->i_nlink = 2;
+ else
+ inode->i_nlink = 1;
+ inode->i_uid = current->fsuid;
+ if (dir->i_mode & S_ISGID) {
+ inode->i_gid = dir->i_gid;
+ if (S_ISDIR(mode))
+ mode |= S_ISGID;
+ } else
+ inode->i_gid = current->fsgid;
+ inode->i_mode = mode;
+ return inode;
+}
+
static int ocfs2_mknod(struct inode *dir,
struct dentry *dentry,
int mode,
@@ -250,6 +278,13 @@ static int ocfs2_mknod(struct inode *dir,
goto leave;
}

+ inode = ocfs2_get_init_inode(dir, mode);
+ if (!inode) {
+ status = -ENOMEM;
+ mlog_errno(status);
+ goto leave;
+ }
+
/* Reserve a cluster if creating an extent based directory. */
if (S_ISDIR(mode) && !ocfs2_supports_inline_data(osb)) {
status = ocfs2_reserve_clusters(osb, 1, &data_ac);
@@ -269,9 +304,9 @@ static int ocfs2_mknod(struct inode *dir,
}

/* do the real work now. */
- status = ocfs2_mknod_locked(osb, dir, dentry, mode, dev,
+ status = ocfs2_mknod_locked(osb, dir, inode, dentry, dev,
&new_fe_bh, parent_fe_bh, handle,
- &inode, inode_ac);
+ inode_ac);
if (status < 0) {
mlog_errno(status);
goto leave;
@@ -332,8 +367,10 @@ leave:
brelse(de_bh);
brelse(parent_fe_bh);

- if ((status < 0) && inode)
+ if ((status < 0) && inode) {
+ clear_nlink(inode);
iput(inode);
+ }

if (inode_ac)
ocfs2_free_alloc_context(inode_ac);
@@ -348,12 +385,12 @@ leave:

static int ocfs2_mknod_locked(struct ocfs2_super *osb,
struct inode *dir,
- struct dentry *dentry, int mode,
+ struct inode *inode,
+ struct dentry *dentry,
dev_t dev,
struct buffer_head **new_fe_bh,
struct buffer_head *parent_fe_bh,
handle_t *handle,
- struct inode **ret_inode,
struct ocfs2_alloc_context *inode_ac)
{
int status = 0;
@@ -361,14 +398,12 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
struct ocfs2_extent_list *fel;
u64 fe_blkno = 0;
u16 suballoc_bit;
- struct inode *inode = NULL;

- mlog_entry("(0x%p, 0x%p, %d, %lu, '%.*s')\n", dir, dentry, mode,
- (unsigned long)dev, dentry->d_name.len,
+ mlog_entry("(0x%p, 0x%p, %d, %lu, '%.*s')\n", dir, dentry,
+ inode->i_mode, (unsigned long)dev, dentry->d_name.len,
dentry->d_name.name);

*new_fe_bh = NULL;
- *ret_inode = NULL;

status = ocfs2_claim_new_inode(osb, handle, inode_ac, &suballoc_bit,
&fe_blkno);
@@ -377,23 +412,11 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
goto leave;
}

- inode = new_inode(dir->i_sb);
- if (!inode) {
- status = -ENOMEM;
- mlog(ML_ERROR, "new_inode failed!\n");
- goto leave;
- }
-
/* populate as many fields early on as possible - many of
* these are used by the support functions here and in
* callers. */
inode->i_ino = ino_from_blkno(osb->sb, fe_blkno);
OCFS2_I(inode)->ip_blkno = fe_blkno;
- if (S_ISDIR(mode))
- inode->i_nlink = 2;
- else
- inode->i_nlink = 1;
- inode->i_mode = mode;
spin_lock(&osb->osb_lock);
inode->i_generation = osb->s_next_generation++;
spin_unlock(&osb->osb_lock);
@@ -421,17 +444,11 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
fe->i_blkno = cpu_to_le64(fe_blkno);
fe->i_suballoc_bit = cpu_to_le16(suballoc_bit);
fe->i_suballoc_slot = cpu_to_le16(inode_ac->ac_alloc_slot);
- fe->i_uid = cpu_to_le32(current->fsuid);
- if (dir->i_mode & S_ISGID) {
- fe->i_gid = cpu_to_le32(dir->i_gid);
- if (S_ISDIR(mode))
- mode |= S_ISGID;
- } else
- fe->i_gid = cpu_to_le32(current->fsgid);
- fe->i_mode = cpu_to_le16(mode);
- if (S_ISCHR(mode) || S_ISBLK(mode))
+ fe->i_uid = cpu_to_le32(inode->i_uid);
+ fe->i_gid = cpu_to_le32(inode->i_gid);
+ fe->i_mode = cpu_to_le16(inode->i_mode);
+ if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode))
fe->id1.dev1.i_rdev = cpu_to_le64(huge_encode_dev(dev));
-
fe->i_links_count = cpu_to_le16(inode->i_nlink);

fe->i_last_eb_blk = 0;
@@ -446,7 +463,7 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
/*
* If supported, directories start with inline data.
*/
- if (S_ISDIR(mode) && ocfs2_supports_inline_data(osb)) {
+ if (S_ISDIR(inode->i_mode) && ocfs2_supports_inline_data(osb)) {
u16 feat = le16_to_cpu(fe->i_dyn_features);

fe->i_dyn_features = cpu_to_le16(feat | OCFS2_INLINE_DATA_FL);
@@ -484,17 +501,12 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
status = 0; /* error in ocfs2_create_new_inode_locks is not
* critical */

- *ret_inode = inode;
leave:
if (status < 0) {
if (*new_fe_bh) {
brelse(*new_fe_bh);
*new_fe_bh = NULL;
}
- if (inode) {
- clear_nlink(inode);
- iput(inode);
- }
}

mlog_exit(status);
@@ -1542,6 +1554,13 @@ static int ocfs2_symlink(struct inode *dir,
goto bail;
}

+ inode = ocfs2_get_init_inode(dir, S_IFLNK | S_IRWXUGO);
+ if (!inode) {
+ status = -ENOMEM;
+ mlog_errno(status);
+ goto bail;
+ }
+
/* don't reserve bitmap space for fast symlinks. */
if (l > ocfs2_fast_symlink_chars(sb)) {
status = ocfs2_reserve_clusters(osb, 1, &data_ac);
@@ -1560,10 +1579,9 @@ static int ocfs2_symlink(struct inode *dir,
goto bail;
}

- status = ocfs2_mknod_locked(osb, dir, dentry,
- S_IFLNK | S_IRWXUGO, 0,
- &new_fe_bh, parent_fe_bh, handle,
- &inode, inode_ac);
+ status = ocfs2_mknod_locked(osb, dir, inode, dentry,
+ 0, &new_fe_bh, parent_fe_bh, handle,
+ inode_ac);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -1644,8 +1662,10 @@ bail:
ocfs2_free_alloc_context(inode_ac);
if (data_ac)
ocfs2_free_alloc_context(data_ac);
- if ((status < 0) && inode)
+ if ((status < 0) && inode) {
+ clear_nlink(inode);
iput(inode);
+ }

mlog_exit(status);

--
1.5.6

2008-12-19 22:01:39

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 21/45] ocfs2: turn __ocfs2_remove_inode_range() into ocfs2_remove_btree_range()

This patch genericizes the high level handling of extent removal.
ocfs2_remove_btree_range() is nearly identical to
__ocfs2_remove_inode_range(), except that extent tree operations have been
used where necessary. We update ocfs2_remove_inode_range() to use the
generic helper. Now extent tree based structures have an easy way to
truncate ranges.

Signed-off-by: Mark Fasheh <[email protected]>
Acked-by: Joel Becker <[email protected]>
---
fs/ocfs2/alloc.c | 72 +++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/alloc.h | 5 +++
fs/ocfs2/file.c | 85 +++--------------------------------------------------
3 files changed, 82 insertions(+), 80 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 4614614..5592a2f 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5255,6 +5255,78 @@ out:
return ret;
}

+int ocfs2_remove_btree_range(struct inode *inode,
+ struct ocfs2_extent_tree *et,
+ u32 cpos, u32 phys_cpos, u32 len,
+ struct ocfs2_cached_dealloc_ctxt *dealloc)
+{
+ int ret;
+ u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct inode *tl_inode = osb->osb_tl_inode;
+ handle_t *handle;
+ struct ocfs2_alloc_context *meta_ac = NULL;
+
+ ret = ocfs2_lock_allocators(inode, et, 0, 1, NULL, &meta_ac);
+ if (ret) {
+ mlog_errno(ret);
+ return ret;
+ }
+
+ mutex_lock(&tl_inode->i_mutex);
+
+ if (ocfs2_truncate_log_needs_flush(osb)) {
+ ret = __ocfs2_flush_truncate_log(osb);
+ if (ret < 0) {
+ mlog_errno(ret);
+ goto out;
+ }
+ }
+
+ handle = ocfs2_start_trans(osb, OCFS2_REMOVE_EXTENT_CREDITS);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_journal_access(handle, inode, et->et_root_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_remove_extent(inode, et, cpos, len, handle, meta_ac,
+ dealloc);
+ if (ret) {
+ mlog_errno(ret);
+ goto out_commit;
+ }
+
+ ocfs2_et_update_clusters(inode, et, -len);
+
+ ret = ocfs2_journal_dirty(handle, et->et_root_bh);
+ if (ret) {
+ mlog_errno(ret);
+ goto out_commit;
+ }
+
+ ret = ocfs2_truncate_log_append(osb, handle, phys_blkno, len);
+ if (ret)
+ mlog_errno(ret);
+
+out_commit:
+ ocfs2_commit_trans(osb, handle);
+out:
+ mutex_unlock(&tl_inode->i_mutex);
+
+ if (meta_ac)
+ ocfs2_free_alloc_context(meta_ac);
+
+ return ret;
+}
+
int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb)
{
struct buffer_head *tl_bh = osb->osb_tl_bh;
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 3eb735e..0fbf8fc 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -110,6 +110,11 @@ int ocfs2_remove_extent(struct inode *inode,
u32 cpos, u32 len, handle_t *handle,
struct ocfs2_alloc_context *meta_ac,
struct ocfs2_cached_dealloc_ctxt *dealloc);
+int ocfs2_remove_btree_range(struct inode *inode,
+ struct ocfs2_extent_tree *et,
+ u32 cpos, u32 phys_cpos, u32 len,
+ struct ocfs2_cached_dealloc_ctxt *dealloc);
+
int ocfs2_num_free_extents(struct ocfs2_super *osb,
struct inode *inode,
struct ocfs2_extent_tree *et);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index e2570a3..3605491 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1226,83 +1226,6 @@ out:
return ret;
}

-static int __ocfs2_remove_inode_range(struct inode *inode,
- struct buffer_head *di_bh,
- u32 cpos, u32 phys_cpos, u32 len,
- struct ocfs2_cached_dealloc_ctxt *dealloc)
-{
- int ret;
- u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
- struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- struct inode *tl_inode = osb->osb_tl_inode;
- handle_t *handle;
- struct ocfs2_alloc_context *meta_ac = NULL;
- struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
- struct ocfs2_extent_tree et;
-
- ocfs2_init_dinode_extent_tree(&et, inode, di_bh);
-
- ret = ocfs2_lock_allocators(inode, &et, 0, 1, NULL, &meta_ac);
- if (ret) {
- mlog_errno(ret);
- return ret;
- }
-
- mutex_lock(&tl_inode->i_mutex);
-
- if (ocfs2_truncate_log_needs_flush(osb)) {
- ret = __ocfs2_flush_truncate_log(osb);
- if (ret < 0) {
- mlog_errno(ret);
- goto out;
- }
- }
-
- handle = ocfs2_start_trans(osb, OCFS2_REMOVE_EXTENT_CREDITS);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- mlog_errno(ret);
- goto out;
- }
-
- ret = ocfs2_journal_access(handle, inode, di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (ret) {
- mlog_errno(ret);
- goto out;
- }
-
- ret = ocfs2_remove_extent(inode, &et, cpos, len, handle, meta_ac,
- dealloc);
- if (ret) {
- mlog_errno(ret);
- goto out_commit;
- }
-
- OCFS2_I(inode)->ip_clusters -= len;
- di->i_clusters = cpu_to_le32(OCFS2_I(inode)->ip_clusters);
-
- ret = ocfs2_journal_dirty(handle, di_bh);
- if (ret) {
- mlog_errno(ret);
- goto out_commit;
- }
-
- ret = ocfs2_truncate_log_append(osb, handle, phys_blkno, len);
- if (ret)
- mlog_errno(ret);
-
-out_commit:
- ocfs2_commit_trans(osb, handle);
-out:
- mutex_unlock(&tl_inode->i_mutex);
-
- if (meta_ac)
- ocfs2_free_alloc_context(meta_ac);
-
- return ret;
-}
-
/*
* Truncate a byte range, avoiding pages within partial clusters. This
* preserves those pages for the zeroing code to write to.
@@ -1402,7 +1325,9 @@ static int ocfs2_remove_inode_range(struct inode *inode,
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_cached_dealloc_ctxt dealloc;
struct address_space *mapping = inode->i_mapping;
+ struct ocfs2_extent_tree et;

+ ocfs2_init_dinode_extent_tree(&et, inode, di_bh);
ocfs2_init_dealloc_ctxt(&dealloc);

if (byte_len == 0)
@@ -1458,9 +1383,9 @@ static int ocfs2_remove_inode_range(struct inode *inode,

/* Only do work for non-holes */
if (phys_cpos != 0) {
- ret = __ocfs2_remove_inode_range(inode, di_bh, cpos,
- phys_cpos, alloc_size,
- &dealloc);
+ ret = ocfs2_remove_btree_range(inode, &et, cpos,
+ phys_cpos, alloc_size,
+ &dealloc);
if (ret) {
mlog_errno(ret);
goto out;
--
1.5.6

2008-12-19 22:02:19

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 23/45] ocfs2: add ocfs2_xattr_set_handle

From: Tiger Yang <[email protected]>

This function is used to set xattr's in a started transaction. It is only
called during inode creation inode for initial security/acl xattrs of the
new inode. These xattrs could be put into ibody or extent block, so xattr
bucket would not be use in this case.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/xattr.h | 4 +++
2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 7a90892..6480254 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2326,6 +2326,74 @@ out:
}

/*
+ * This function only called duing creating inode
+ * for init security/acl xattrs of the new inode.
+ * The xattrs could be put into ibody or extent block,
+ * xattr bucket would not be use in this case.
+ * transanction credits also be reserved in here.
+ */
+int ocfs2_xattr_set_handle(handle_t *handle,
+ struct inode *inode,
+ struct buffer_head *di_bh,
+ int name_index,
+ const char *name,
+ const void *value,
+ size_t value_len,
+ int flags,
+ struct ocfs2_alloc_context *meta_ac,
+ struct ocfs2_alloc_context *data_ac)
+{
+ struct ocfs2_dinode *di;
+ int ret;
+
+ struct ocfs2_xattr_info xi = {
+ .name_index = name_index,
+ .name = name,
+ .value = value,
+ .value_len = value_len,
+ };
+
+ struct ocfs2_xattr_search xis = {
+ .not_found = -ENODATA,
+ };
+
+ struct ocfs2_xattr_search xbs = {
+ .not_found = -ENODATA,
+ };
+
+ struct ocfs2_xattr_set_ctxt ctxt = {
+ .handle = handle,
+ .meta_ac = meta_ac,
+ .data_ac = data_ac,
+ };
+
+ if (!ocfs2_supports_xattr(OCFS2_SB(inode->i_sb)))
+ return -EOPNOTSUPP;
+
+ xis.inode_bh = xbs.inode_bh = di_bh;
+ di = (struct ocfs2_dinode *)di_bh->b_data;
+
+ down_write(&OCFS2_I(inode)->ip_xattr_sem);
+
+ ret = ocfs2_xattr_ibody_find(inode, name_index, name, &xis);
+ if (ret)
+ goto cleanup;
+ if (xis.not_found) {
+ ret = ocfs2_xattr_block_find(inode, name_index, name, &xbs);
+ if (ret)
+ goto cleanup;
+ }
+
+ ret = __ocfs2_xattr_set_handle(inode, di, &xi, &xis, &xbs, &ctxt);
+
+cleanup:
+ up_write(&OCFS2_I(inode)->ip_xattr_sem);
+ brelse(xbs.xattr_bh);
+
+ return ret;
+}
+
+/*
* ocfs2_xattr_set()
*
* Set, replace or remove an extended attribute for this inode.
diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
index 1d8314c..8fbdc16 100644
--- a/fs/ocfs2/xattr.h
+++ b/fs/ocfs2/xattr.h
@@ -37,6 +37,10 @@ extern struct xattr_handler *ocfs2_xattr_handlers[];
ssize_t ocfs2_listxattr(struct dentry *, char *, size_t);
int ocfs2_xattr_set(struct inode *, int, const char *, const void *,
size_t, int);
+int ocfs2_xattr_set_handle(handle_t *, struct inode *, struct buffer_head *,
+ int, const char *, const void *, size_t, int,
+ struct ocfs2_alloc_context *,
+ struct ocfs2_alloc_context *);
int ocfs2_xattr_remove(struct inode *, struct buffer_head *);

#endif /* OCFS2_XATTR_H */
--
1.5.6

2008-12-19 22:02:35

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 24/45] ocfs2: add security xattr API

From: Tiger Yang <[email protected]>

This patch add security xattr set/get/list APIs to
support security attributes in Ocfs2.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/xattr.h | 1 +
2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 6480254..db03162 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -35,6 +35,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/string.h>
+#include <linux/security.h>

#define MLOG_MASK_PREFIX ML_XATTR
#include <cluster/masklog.h>
@@ -88,12 +89,14 @@ static struct ocfs2_xattr_def_value_root def_xv = {
struct xattr_handler *ocfs2_xattr_handlers[] = {
&ocfs2_xattr_user_handler,
&ocfs2_xattr_trusted_handler,
+ &ocfs2_xattr_security_handler,
NULL
};

static struct xattr_handler *ocfs2_xattr_handler_map[OCFS2_XATTR_MAX] = {
[OCFS2_XATTR_INDEX_USER] = &ocfs2_xattr_user_handler,
[OCFS2_XATTR_INDEX_TRUSTED] = &ocfs2_xattr_trusted_handler,
+ [OCFS2_XATTR_INDEX_SECURITY] = &ocfs2_xattr_security_handler,
};

struct ocfs2_xattr_info {
@@ -4977,6 +4980,50 @@ out:
}

/*
+ * 'security' attributes support
+ */
+static size_t ocfs2_xattr_security_list(struct inode *inode, char *list,
+ size_t list_size, const char *name,
+ size_t name_len)
+{
+ const size_t prefix_len = XATTR_SECURITY_PREFIX_LEN;
+ const size_t total_len = prefix_len + name_len + 1;
+
+ if (list && total_len <= list_size) {
+ memcpy(list, XATTR_SECURITY_PREFIX, prefix_len);
+ memcpy(list + prefix_len, name, name_len);
+ list[prefix_len + name_len] = '\0';
+ }
+ return total_len;
+}
+
+static int ocfs2_xattr_security_get(struct inode *inode, const char *name,
+ void *buffer, size_t size)
+{
+ if (strcmp(name, "") == 0)
+ return -EINVAL;
+ return ocfs2_xattr_get(inode, OCFS2_XATTR_INDEX_SECURITY, name,
+ buffer, size);
+}
+
+static int ocfs2_xattr_security_set(struct inode *inode, const char *name,
+ const void *value, size_t size, int flags)
+{
+ if (strcmp(name, "") == 0)
+ return -EINVAL;
+
+ return ocfs2_xattr_set(inode, OCFS2_XATTR_INDEX_SECURITY, name, value,
+ size, flags);
+}
+
+struct xattr_handler ocfs2_xattr_security_handler = {
+ .prefix = XATTR_SECURITY_PREFIX,
+ .list = ocfs2_xattr_security_list,
+ .get = ocfs2_xattr_security_get,
+ .set = ocfs2_xattr_security_set,
+};
+
+/*
* 'trusted' attributes support
*/
static size_t ocfs2_xattr_trusted_list(struct inode *inode, char *list,
diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
index 8fbdc16..55c5256 100644
--- a/fs/ocfs2/xattr.h
+++ b/fs/ocfs2/xattr.h
@@ -32,6 +32,7 @@ enum ocfs2_xattr_type {

extern struct xattr_handler ocfs2_xattr_user_handler;
extern struct xattr_handler ocfs2_xattr_trusted_handler;
+extern struct xattr_handler ocfs2_xattr_security_handler;
extern struct xattr_handler *ocfs2_xattr_handlers[];

ssize_t ocfs2_listxattr(struct dentry *, char *, size_t);
--
1.5.6

2008-12-19 22:02:49

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 25/45] ocfs2: add ocfs2_init_security in during file create

From: Tiger Yang <[email protected]>

Security attributes must be set when creating a new inode.

We do this in three steps.

- First, get security xattr's name and value by security_operation

- Calculate and reserve the meta data and clusters needed by this security
xattr before starting transaction

- Finally, we set it before add_entry

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/namei.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++------
fs/ocfs2/xattr.c | 70 +++++++++++++++++++++++++++++++++++
fs/ocfs2/xattr.h | 17 +++++++++
3 files changed, 182 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index e9118fc..b361aec 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -229,6 +229,12 @@ static int ocfs2_mknod(struct inode *dir,
struct inode *inode = NULL;
struct ocfs2_alloc_context *inode_ac = NULL;
struct ocfs2_alloc_context *data_ac = NULL;
+ struct ocfs2_alloc_context *xattr_ac = NULL;
+ int want_clusters = 0;
+ int xattr_credits = 0;
+ struct ocfs2_security_xattr_info si = {
+ .enable = 1,
+ };

mlog_entry("(0x%p, 0x%p, %d, %lu, '%.*s')\n", dir, dentry, mode,
(unsigned long)dev, dentry->d_name.len,
@@ -285,17 +291,39 @@ static int ocfs2_mknod(struct inode *dir,
goto leave;
}

- /* Reserve a cluster if creating an extent based directory. */
- if (S_ISDIR(mode) && !ocfs2_supports_inline_data(osb)) {
- status = ocfs2_reserve_clusters(osb, 1, &data_ac);
+ /* get security xattr */
+ status = ocfs2_init_security_get(inode, dir, &si);
+ if (status) {
+ if (status == -EOPNOTSUPP)
+ si.enable = 0;
+ else {
+ mlog_errno(status);
+ goto leave;
+ }
+ }
+
+ /* calculate meta data/clusters for setting security xattr */
+ if (si.enable) {
+ status = ocfs2_calc_security_init(dir, &si, &want_clusters,
+ &xattr_credits, &xattr_ac);
if (status < 0) {
- if (status != -ENOSPC)
- mlog_errno(status);
+ mlog_errno(status);
goto leave;
}
}

- handle = ocfs2_start_trans(osb, OCFS2_MKNOD_CREDITS);
+ /* Reserve a cluster if creating an extent based directory. */
+ if (S_ISDIR(mode) && !ocfs2_supports_inline_data(osb))
+ want_clusters += 1;
+
+ status = ocfs2_reserve_clusters(osb, want_clusters, &data_ac);
+ if (status < 0) {
+ if (status != -ENOSPC)
+ mlog_errno(status);
+ goto leave;
+ }
+
+ handle = ocfs2_start_trans(osb, OCFS2_MKNOD_CREDITS + xattr_credits);
if (IS_ERR(handle)) {
status = PTR_ERR(handle);
handle = NULL;
@@ -335,6 +363,15 @@ static int ocfs2_mknod(struct inode *dir,
inc_nlink(dir);
}

+ if (si.enable) {
+ status = ocfs2_init_security_set(handle, inode, new_fe_bh, &si,
+ xattr_ac, data_ac);
+ if (status < 0) {
+ mlog_errno(status);
+ goto leave;
+ }
+ }
+
status = ocfs2_add_entry(handle, dentry, inode,
OCFS2_I(inode)->ip_blkno, parent_fe_bh,
de_bh);
@@ -366,6 +403,8 @@ leave:
brelse(new_fe_bh);
brelse(de_bh);
brelse(parent_fe_bh);
+ kfree(si.name);
+ kfree(si.value);

if ((status < 0) && inode) {
clear_nlink(inode);
@@ -378,6 +417,9 @@ leave:
if (data_ac)
ocfs2_free_alloc_context(data_ac);

+ if (xattr_ac)
+ ocfs2_free_alloc_context(xattr_ac);
+
mlog_exit(status);

return status;
@@ -1508,6 +1550,12 @@ static int ocfs2_symlink(struct inode *dir,
handle_t *handle = NULL;
struct ocfs2_alloc_context *inode_ac = NULL;
struct ocfs2_alloc_context *data_ac = NULL;
+ struct ocfs2_alloc_context *xattr_ac = NULL;
+ int want_clusters = 0;
+ int xattr_credits = 0;
+ struct ocfs2_security_xattr_info si = {
+ .enable = 1,
+ };

mlog_entry("(0x%p, 0x%p, symname='%s' actual='%.*s')\n", dir,
dentry, symname, dentry->d_name.len, dentry->d_name.name);
@@ -1561,17 +1609,39 @@ static int ocfs2_symlink(struct inode *dir,
goto bail;
}

- /* don't reserve bitmap space for fast symlinks. */
- if (l > ocfs2_fast_symlink_chars(sb)) {
- status = ocfs2_reserve_clusters(osb, 1, &data_ac);
+ /* get security xattr */
+ status = ocfs2_init_security_get(inode, dir, &si);
+ if (status) {
+ if (status == -EOPNOTSUPP)
+ si.enable = 0;
+ else {
+ mlog_errno(status);
+ goto bail;
+ }
+ }
+
+ /* calculate meta data/clusters for setting security xattr */
+ if (si.enable) {
+ status = ocfs2_calc_security_init(dir, &si, &want_clusters,
+ &xattr_credits, &xattr_ac);
if (status < 0) {
- if (status != -ENOSPC)
- mlog_errno(status);
+ mlog_errno(status);
goto bail;
}
}

- handle = ocfs2_start_trans(osb, credits);
+ /* don't reserve bitmap space for fast symlinks. */
+ if (l > ocfs2_fast_symlink_chars(sb))
+ want_clusters += 1;
+
+ status = ocfs2_reserve_clusters(osb, want_clusters, &data_ac);
+ if (status < 0) {
+ if (status != -ENOSPC)
+ mlog_errno(status);
+ goto bail;
+ }
+
+ handle = ocfs2_start_trans(osb, credits + xattr_credits);
if (IS_ERR(handle)) {
status = PTR_ERR(handle);
handle = NULL;
@@ -1632,6 +1702,15 @@ static int ocfs2_symlink(struct inode *dir,
}
}

+ if (si.enable) {
+ status = ocfs2_init_security_set(handle, inode, new_fe_bh, &si,
+ xattr_ac, data_ac);
+ if (status < 0) {
+ mlog_errno(status);
+ goto bail;
+ }
+ }
+
status = ocfs2_add_entry(handle, dentry, inode,
le64_to_cpu(fe->i_blkno), parent_fe_bh,
de_bh);
@@ -1658,10 +1737,14 @@ bail:
brelse(new_fe_bh);
brelse(parent_fe_bh);
brelse(de_bh);
+ kfree(si.name);
+ kfree(si.value);
if (inode_ac)
ocfs2_free_alloc_context(inode_ac);
if (data_ac)
ocfs2_free_alloc_context(data_ac);
+ if (xattr_ac)
+ ocfs2_free_alloc_context(xattr_ac);
if ((status < 0) && inode) {
clear_nlink(inode);
iput(inode);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index db03162..2cab0d6 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -81,6 +81,9 @@ struct ocfs2_xattr_set_ctxt {

#define OCFS2_XATTR_ROOT_SIZE (sizeof(struct ocfs2_xattr_def_value_root))
#define OCFS2_XATTR_INLINE_SIZE 80
+#define OCFS2_XATTR_FREE_IN_IBODY (OCFS2_MIN_XATTR_INLINE_SIZE \
+ - sizeof(struct ocfs2_xattr_header) \
+ - sizeof(__u32))

static struct ocfs2_xattr_def_value_root def_xv = {
.xv.xr_list.l_count = cpu_to_le16(1),
@@ -343,6 +346,52 @@ static void ocfs2_xattr_hash_entry(struct inode *inode,
return;
}

+static int ocfs2_xattr_entry_real_size(int name_len, size_t value_len)
+{
+ int size = 0;
+
+ if (value_len <= OCFS2_XATTR_INLINE_SIZE)
+ size = OCFS2_XATTR_SIZE(name_len) + OCFS2_XATTR_SIZE(value_len);
+ else
+ size = OCFS2_XATTR_SIZE(name_len) + OCFS2_XATTR_ROOT_SIZE;
+ size += sizeof(struct ocfs2_xattr_entry);
+
+ return size;
+}
+
+int ocfs2_calc_security_init(struct inode *dir,
+ struct ocfs2_security_xattr_info *si,
+ int *want_clusters,
+ int *xattr_credits,
+ struct ocfs2_alloc_context **xattr_ac)
+{
+ int ret = 0;
+ struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
+ int s_size = ocfs2_xattr_entry_real_size(strlen(si->name),
+ si->value_len);
+
+ /*
+ * The max space of security xattr taken inline is
+ * 256(name) + 80(value) + 16(entry) = 352 bytes,
+ * So reserve one metadata block for it is ok.
+ */
+ if (dir->i_sb->s_blocksize == OCFS2_MIN_BLOCKSIZE ||
+ s_size > OCFS2_XATTR_FREE_IN_IBODY) {
+ ret = ocfs2_reserve_new_metadata_blocks(osb, 1, xattr_ac);
+ if (ret) {
+ mlog_errno(ret);
+ return ret;
+ }
+ *xattr_credits += OCFS2_XATTR_BLOCK_CREATE_CREDITS;
+ }
+
+ /* reserve clusters for xattr value which will be set in B tree*/
+ if (si->value_len > OCFS2_XATTR_INLINE_SIZE)
+ *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb,
+ si->value_len);
+ return ret;
+}
+
static int ocfs2_xattr_extend_allocation(struct inode *inode,
u32 clusters_to_add,
struct buffer_head *xattr_bh,
@@ -5016,6 +5065,27 @@ static int ocfs2_xattr_security_set(struct inode *inode, const char *name,
size, flags);
}

+int ocfs2_init_security_get(struct inode *inode,
+ struct inode *dir,
+ struct ocfs2_security_xattr_info *si)
+{
+ return security_inode_init_security(inode, dir, &si->name, &si->value,
+ &si->value_len);
+}
+
+int ocfs2_init_security_set(handle_t *handle,
+ struct inode *inode,
+ struct buffer_head *di_bh,
+ struct ocfs2_security_xattr_info *si,
+ struct ocfs2_alloc_context *xattr_ac,
+ struct ocfs2_alloc_context *data_ac)
+{
+ return ocfs2_xattr_set_handle(handle, inode, di_bh,
+ OCFS2_XATTR_INDEX_SECURITY,
+ si->name, si->value, si->value_len, 0,
+ xattr_ac, data_ac);
+}
+
struct xattr_handler ocfs2_xattr_security_handler = {
.prefix = XATTR_SECURITY_PREFIX,
.list = ocfs2_xattr_security_list,
diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
index 55c5256..188ef6b 100644
--- a/fs/ocfs2/xattr.h
+++ b/fs/ocfs2/xattr.h
@@ -30,6 +30,13 @@ enum ocfs2_xattr_type {
OCFS2_XATTR_MAX
};

+struct ocfs2_security_xattr_info {
+ int enable;
+ char *name;
+ void *value;
+ size_t value_len;
+};
+
extern struct xattr_handler ocfs2_xattr_user_handler;
extern struct xattr_handler ocfs2_xattr_trusted_handler;
extern struct xattr_handler ocfs2_xattr_security_handler;
@@ -43,5 +50,15 @@ int ocfs2_xattr_set_handle(handle_t *, struct inode *, struct buffer_head *,
struct ocfs2_alloc_context *,
struct ocfs2_alloc_context *);
int ocfs2_xattr_remove(struct inode *, struct buffer_head *);
+int ocfs2_init_security_get(struct inode *, struct inode *,
+ struct ocfs2_security_xattr_info *);
+int ocfs2_init_security_set(handle_t *, struct inode *,
+ struct buffer_head *,
+ struct ocfs2_security_xattr_info *,
+ struct ocfs2_alloc_context *,
+ struct ocfs2_alloc_context *);
+int ocfs2_calc_security_init(struct inode *,
+ struct ocfs2_security_xattr_info *,
+ int *, int *, struct ocfs2_alloc_context **);

#endif /* OCFS2_XATTR_H */
--
1.5.6

2008-12-19 22:03:37

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 27/45] ocfs2: add POSIX ACL API

From: Tiger Yang <[email protected]>

This patch adds POSIX ACL(access control lists) APIs in ocfs2. We convert
struct posix_acl to many ocfs2_acl_entry and regard them as an extended
attribute entry.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/Makefile | 4 +
fs/ocfs2/acl.c | 378 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/acl.h | 29 ++++
fs/ocfs2/ocfs2.h | 1 +
fs/ocfs2/xattr.c | 10 ++
fs/ocfs2/xattr.h | 4 +
6 files changed, 426 insertions(+), 0 deletions(-)
create mode 100644 fs/ocfs2/acl.c
create mode 100644 fs/ocfs2/acl.h

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index 589dcdf..e9ef5d1 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -37,6 +37,10 @@ ocfs2-objs := \
ver.o \
xattr.o

+ifeq ($(CONFIG_OCFS2_FS_POSIX_ACL),y)
+ocfs2-objs += acl.o
+endif
+
ocfs2_stackglue-objs := stackglue.o
ocfs2_stack_o2cb-objs := stack_o2cb.o
ocfs2_stack_user-objs := stack_user.o
diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
new file mode 100644
index 0000000..62d0faa
--- /dev/null
+++ b/fs/ocfs2/acl.c
@@ -0,0 +1,378 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * acl.c
+ *
+ * Copyright (C) 2004, 2008 Oracle. All rights reserved.
+ *
+ * CREDITS:
+ * Lots of code in this file is copy from linux/fs/ext3/acl.c.
+ * Copyright (C) 2001-2003 Andreas Gruenbacher, <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#define MLOG_MASK_PREFIX ML_INODE
+#include <cluster/masklog.h>
+
+#include "ocfs2.h"
+#include "alloc.h"
+#include "dlmglue.h"
+#include "file.h"
+#include "ocfs2_fs.h"
+
+#include "xattr.h"
+#include "acl.h"
+
+/*
+ * Convert from xattr value to acl struct.
+ */
+static struct posix_acl *ocfs2_acl_from_xattr(const void *value, size_t size)
+{
+ int n, count;
+ struct posix_acl *acl;
+
+ if (!value)
+ return NULL;
+ if (size < sizeof(struct posix_acl_entry))
+ return ERR_PTR(-EINVAL);
+
+ count = size / sizeof(struct posix_acl_entry);
+ if (count < 0)
+ return ERR_PTR(-EINVAL);
+ if (count == 0)
+ return NULL;
+
+ acl = posix_acl_alloc(count, GFP_NOFS);
+ if (!acl)
+ return ERR_PTR(-ENOMEM);
+ for (n = 0; n < count; n++) {
+ struct ocfs2_acl_entry *entry =
+ (struct ocfs2_acl_entry *)value;
+
+ acl->a_entries[n].e_tag = le16_to_cpu(entry->e_tag);
+ acl->a_entries[n].e_perm = le16_to_cpu(entry->e_perm);
+ acl->a_entries[n].e_id = le32_to_cpu(entry->e_id);
+ value += sizeof(struct posix_acl_entry);
+
+ }
+ return acl;
+}
+
+/*
+ * Convert acl struct to xattr value.
+ */
+static void *ocfs2_acl_to_xattr(const struct posix_acl *acl, size_t *size)
+{
+ struct ocfs2_acl_entry *entry = NULL;
+ char *ocfs2_acl;
+ size_t n;
+
+ *size = acl->a_count * sizeof(struct posix_acl_entry);
+
+ ocfs2_acl = kmalloc(*size, GFP_NOFS);
+ if (!ocfs2_acl)
+ return ERR_PTR(-ENOMEM);
+
+ entry = (struct ocfs2_acl_entry *)ocfs2_acl;
+ for (n = 0; n < acl->a_count; n++, entry++) {
+ entry->e_tag = cpu_to_le16(acl->a_entries[n].e_tag);
+ entry->e_perm = cpu_to_le16(acl->a_entries[n].e_perm);
+ entry->e_id = cpu_to_le32(acl->a_entries[n].e_id);
+ }
+ return ocfs2_acl;
+}
+
+static struct posix_acl *ocfs2_get_acl_nolock(struct inode *inode,
+ int type,
+ struct buffer_head *di_bh)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ int name_index;
+ char *value = NULL;
+ struct posix_acl *acl;
+ int retval;
+
+ if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+ return NULL;
+
+ switch (type) {
+ case ACL_TYPE_ACCESS:
+ name_index = OCFS2_XATTR_INDEX_POSIX_ACL_ACCESS;
+ break;
+ case ACL_TYPE_DEFAULT:
+ name_index = OCFS2_XATTR_INDEX_POSIX_ACL_DEFAULT;
+ break;
+ default:
+ return ERR_PTR(-EINVAL);
+ }
+
+ retval = ocfs2_xattr_get_nolock(inode, di_bh, name_index, "", NULL, 0);
+ if (retval > 0) {
+ value = kmalloc(retval, GFP_NOFS);
+ if (!value)
+ return ERR_PTR(-ENOMEM);
+ retval = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
+ "", value, retval);
+ }
+
+ if (retval > 0)
+ acl = ocfs2_acl_from_xattr(value, retval);
+ else if (retval == -ENODATA || retval == 0)
+ acl = NULL;
+ else
+ acl = ERR_PTR(retval);
+
+ kfree(value);
+
+ return acl;
+}
+
+
+/*
+ * Get posix acl.
+ */
+static struct posix_acl *ocfs2_get_acl(struct inode *inode, int type)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct buffer_head *di_bh = NULL;
+ struct posix_acl *acl;
+ int ret;
+
+ if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+ return NULL;
+
+ ret = ocfs2_inode_lock(inode, &di_bh, 0);
+ if (ret < 0) {
+ mlog_errno(ret);
+ acl = ERR_PTR(ret);
+ return acl;
+ }
+
+ acl = ocfs2_get_acl_nolock(inode, type, di_bh);
+
+ ocfs2_inode_unlock(inode, 0);
+
+ brelse(di_bh);
+
+ return acl;
+}
+
+/*
+ * Set the access or default ACL of an inode.
+ */
+static int ocfs2_set_acl(handle_t *handle,
+ struct inode *inode,
+ struct buffer_head *di_bh,
+ int type,
+ struct posix_acl *acl,
+ struct ocfs2_alloc_context *meta_ac,
+ struct ocfs2_alloc_context *data_ac)
+{
+ int name_index;
+ void *value = NULL;
+ size_t size = 0;
+ int ret;
+
+ if (S_ISLNK(inode->i_mode))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case ACL_TYPE_ACCESS:
+ name_index = OCFS2_XATTR_INDEX_POSIX_ACL_ACCESS;
+ if (acl) {
+ mode_t mode = inode->i_mode;
+ ret = posix_acl_equiv_mode(acl, &mode);
+ if (ret < 0)
+ return ret;
+ else {
+ inode->i_mode = mode;
+ if (ret == 0)
+ acl = NULL;
+ }
+ }
+ break;
+ case ACL_TYPE_DEFAULT:
+ name_index = OCFS2_XATTR_INDEX_POSIX_ACL_DEFAULT;
+ if (!S_ISDIR(inode->i_mode))
+ return acl ? -EACCES : 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (acl) {
+ value = ocfs2_acl_to_xattr(acl, &size);
+ if (IS_ERR(value))
+ return (int)PTR_ERR(value);
+ }
+
+ if (handle)
+ ret = ocfs2_xattr_set_handle(handle, inode, di_bh, name_index,
+ "", value, size, 0,
+ meta_ac, data_ac);
+ else
+ ret = ocfs2_xattr_set(inode, name_index, "", value, size, 0);
+
+ kfree(value);
+
+ return ret;
+}
+
+static size_t ocfs2_xattr_list_acl_access(struct inode *inode,
+ char *list,
+ size_t list_len,
+ const char *name,
+ size_t name_len)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ const size_t size = sizeof(POSIX_ACL_XATTR_ACCESS);
+
+ if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+ return 0;
+
+ if (list && size <= list_len)
+ memcpy(list, POSIX_ACL_XATTR_ACCESS, size);
+ return size;
+}
+
+static size_t ocfs2_xattr_list_acl_default(struct inode *inode,
+ char *list,
+ size_t list_len,
+ const char *name,
+ size_t name_len)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ const size_t size = sizeof(POSIX_ACL_XATTR_DEFAULT);
+
+ if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+ return 0;
+
+ if (list && size <= list_len)
+ memcpy(list, POSIX_ACL_XATTR_DEFAULT, size);
+ return size;
+}
+
+static int ocfs2_xattr_get_acl(struct inode *inode,
+ int type,
+ void *buffer,
+ size_t size)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct posix_acl *acl;
+ int ret;
+
+ if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+ return -EOPNOTSUPP;
+
+ acl = ocfs2_get_acl(inode, type);
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+ if (acl == NULL)
+ return -ENODATA;
+ ret = posix_acl_to_xattr(acl, buffer, size);
+ posix_acl_release(acl);
+
+ return ret;
+}
+
+static int ocfs2_xattr_get_acl_access(struct inode *inode,
+ const char *name,
+ void *buffer,
+ size_t size)
+{
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+ return ocfs2_xattr_get_acl(inode, ACL_TYPE_ACCESS, buffer, size);
+}
+
+static int ocfs2_xattr_get_acl_default(struct inode *inode,
+ const char *name,
+ void *buffer,
+ size_t size)
+{
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+ return ocfs2_xattr_get_acl(inode, ACL_TYPE_DEFAULT, buffer, size);
+}
+
+static int ocfs2_xattr_set_acl(struct inode *inode,
+ int type,
+ const void *value,
+ size_t size)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct posix_acl *acl;
+ int ret = 0;
+
+ if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+ return -EOPNOTSUPP;
+
+ if (!is_owner_or_cap(inode))
+ return -EPERM;
+
+ if (value) {
+ acl = posix_acl_from_xattr(value, size);
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+ else if (acl) {
+ ret = posix_acl_valid(acl);
+ if (ret)
+ goto cleanup;
+ }
+ } else
+ acl = NULL;
+
+ ret = ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL);
+
+cleanup:
+ posix_acl_release(acl);
+ return ret;
+}
+
+static int ocfs2_xattr_set_acl_access(struct inode *inode,
+ const char *name,
+ const void *value,
+ size_t size,
+ int flags)
+{
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+ return ocfs2_xattr_set_acl(inode, ACL_TYPE_ACCESS, value, size);
+}
+
+static int ocfs2_xattr_set_acl_default(struct inode *inode,
+ const char *name,
+ const void *value,
+ size_t size,
+ int flags)
+{
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+ return ocfs2_xattr_set_acl(inode, ACL_TYPE_DEFAULT, value, size);
+}
+
+struct xattr_handler ocfs2_xattr_acl_access_handler = {
+ .prefix = POSIX_ACL_XATTR_ACCESS,
+ .list = ocfs2_xattr_list_acl_access,
+ .get = ocfs2_xattr_get_acl_access,
+ .set = ocfs2_xattr_set_acl_access,
+};
+
+struct xattr_handler ocfs2_xattr_acl_default_handler = {
+ .prefix = POSIX_ACL_XATTR_DEFAULT,
+ .list = ocfs2_xattr_list_acl_default,
+ .get = ocfs2_xattr_get_acl_default,
+ .set = ocfs2_xattr_set_acl_default,
+};
diff --git a/fs/ocfs2/acl.h b/fs/ocfs2/acl.h
new file mode 100644
index 0000000..1b39f3e
--- /dev/null
+++ b/fs/ocfs2/acl.h
@@ -0,0 +1,29 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * acl.h
+ *
+ * Copyright (C) 2004, 2008 Oracle. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef OCFS2_ACL_H
+#define OCFS2_ACL_H
+
+#include <linux/posix_acl_xattr.h>
+
+struct ocfs2_acl_entry {
+ __le16 e_tag;
+ __le16 e_perm;
+ __le32 e_id;
+};
+
+#endif /* OCFS2_ACL_H */
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 3fed9e3..25d07ff 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -195,6 +195,7 @@ enum ocfs2_mount_options
OCFS2_MOUNT_LOCALFLOCKS = 1 << 5, /* No cluster aware user file locks */
OCFS2_MOUNT_NOUSERXATTR = 1 << 6, /* No user xattr */
OCFS2_MOUNT_INODE64 = 1 << 7, /* Allow inode numbers > 2^32 */
+ OCFS2_MOUNT_POSIX_ACL = 1 << 8, /* POSIX access control lists */
};

#define OCFS2_OSB_SOFT_RO 0x0001
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index ba9b870..2e273c2 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -91,6 +91,10 @@ static struct ocfs2_xattr_def_value_root def_xv = {

struct xattr_handler *ocfs2_xattr_handlers[] = {
&ocfs2_xattr_user_handler,
+#ifdef CONFIG_OCFS2_FS_POSIX_ACL
+ &ocfs2_xattr_acl_access_handler,
+ &ocfs2_xattr_acl_default_handler,
+#endif
&ocfs2_xattr_trusted_handler,
&ocfs2_xattr_security_handler,
NULL
@@ -98,6 +102,12 @@ struct xattr_handler *ocfs2_xattr_handlers[] = {

static struct xattr_handler *ocfs2_xattr_handler_map[OCFS2_XATTR_MAX] = {
[OCFS2_XATTR_INDEX_USER] = &ocfs2_xattr_user_handler,
+#ifdef CONFIG_OCFS2_FS_POSIX_ACL
+ [OCFS2_XATTR_INDEX_POSIX_ACL_ACCESS]
+ = &ocfs2_xattr_acl_access_handler,
+ [OCFS2_XATTR_INDEX_POSIX_ACL_DEFAULT]
+ = &ocfs2_xattr_acl_default_handler,
+#endif
[OCFS2_XATTR_INDEX_TRUSTED] = &ocfs2_xattr_trusted_handler,
[OCFS2_XATTR_INDEX_SECURITY] = &ocfs2_xattr_security_handler,
};
diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
index 86aa10f..6163df3 100644
--- a/fs/ocfs2/xattr.h
+++ b/fs/ocfs2/xattr.h
@@ -40,6 +40,10 @@ struct ocfs2_security_xattr_info {
extern struct xattr_handler ocfs2_xattr_user_handler;
extern struct xattr_handler ocfs2_xattr_trusted_handler;
extern struct xattr_handler ocfs2_xattr_security_handler;
+#ifdef CONFIG_OCFS2_FS_POSIX_ACL
+extern struct xattr_handler ocfs2_xattr_acl_access_handler;
+extern struct xattr_handler ocfs2_xattr_acl_default_handler;
+#endif
extern struct xattr_handler *ocfs2_xattr_handlers[];

ssize_t ocfs2_listxattr(struct dentry *, char *, size_t);
--
1.5.6

2008-12-19 22:03:11

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 26/45] ocfs2: add ocfs2_xattr_get_nolock

From: Tiger Yang <[email protected]>

This function does the work of ocfs2_xattr_get under an open lock.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 40 ++++++++++++++++++++++++++++------------
fs/ocfs2/xattr.h | 2 ++
2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 2cab0d6..ba9b870 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -925,12 +925,8 @@ cleanup:
return ret;
}

-/* ocfs2_xattr_get()
- *
- * Copy an extended attribute into the buffer provided.
- * Buffer is NULL to compute the size of buffer required.
- */
-static int ocfs2_xattr_get(struct inode *inode,
+int ocfs2_xattr_get_nolock(struct inode *inode,
+ struct buffer_head *di_bh,
int name_index,
const char *name,
void *buffer,
@@ -938,7 +934,6 @@ static int ocfs2_xattr_get(struct inode *inode,
{
int ret;
struct ocfs2_dinode *di = NULL;
- struct buffer_head *di_bh = NULL;
struct ocfs2_inode_info *oi = OCFS2_I(inode);
struct ocfs2_xattr_search xis = {
.not_found = -ENODATA,
@@ -953,11 +948,6 @@ static int ocfs2_xattr_get(struct inode *inode,
if (!(oi->ip_dyn_features & OCFS2_HAS_XATTR_FL))
ret = -ENODATA;

- ret = ocfs2_inode_lock(inode, &di_bh, 0);
- if (ret < 0) {
- mlog_errno(ret);
- return ret;
- }
xis.inode_bh = xbs.inode_bh = di_bh;
di = (struct ocfs2_dinode *)di_bh->b_data;

@@ -968,6 +958,32 @@ static int ocfs2_xattr_get(struct inode *inode,
ret = ocfs2_xattr_block_get(inode, name_index, name, buffer,
buffer_size, &xbs);
up_read(&oi->ip_xattr_sem);
+
+ return ret;
+}
+
+/* ocfs2_xattr_get()
+ *
+ * Copy an extended attribute into the buffer provided.
+ * Buffer is NULL to compute the size of buffer required.
+ */
+static int ocfs2_xattr_get(struct inode *inode,
+ int name_index,
+ const char *name,
+ void *buffer,
+ size_t buffer_size)
+{
+ int ret;
+ struct buffer_head *di_bh = NULL;
+
+ ret = ocfs2_inode_lock(inode, &di_bh, 0);
+ if (ret < 0) {
+ mlog_errno(ret);
+ return ret;
+ }
+ ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
+ name, buffer, buffer_size);
+
ocfs2_inode_unlock(inode, 0);

brelse(di_bh);
diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
index 188ef6b..86aa10f 100644
--- a/fs/ocfs2/xattr.h
+++ b/fs/ocfs2/xattr.h
@@ -43,6 +43,8 @@ extern struct xattr_handler ocfs2_xattr_security_handler;
extern struct xattr_handler *ocfs2_xattr_handlers[];

ssize_t ocfs2_listxattr(struct dentry *, char *, size_t);
+int ocfs2_xattr_get_nolock(struct inode *, struct buffer_head *, int,
+ const char *, void *, size_t);
int ocfs2_xattr_set(struct inode *, int, const char *, const void *,
size_t, int);
int ocfs2_xattr_set_handle(handle_t *, struct inode *, struct buffer_head *,
--
1.5.6

2008-12-19 22:03:55

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 28/45] ocfs2: add ocfs2_check_acl

From: Tiger Yang <[email protected]>

This function is used to enhance permission checking with POSIX ACLs.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/acl.c | 15 +++++++++++++++
fs/ocfs2/acl.h | 10 ++++++++++
fs/ocfs2/file.c | 3 ++-
3 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index 62d0faa..a6a2bf6 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -230,6 +230,21 @@ static int ocfs2_set_acl(handle_t *handle,
return ret;
}

+int ocfs2_check_acl(struct inode *inode, int mask)
+{
+ struct posix_acl *acl = ocfs2_get_acl(inode, ACL_TYPE_ACCESS);
+
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+ if (acl) {
+ int ret = posix_acl_permission(inode, acl, mask);
+ posix_acl_release(acl);
+ return ret;
+ }
+
+ return -EAGAIN;
+}
+
static size_t ocfs2_xattr_list_acl_access(struct inode *inode,
char *list,
size_t list_len,
diff --git a/fs/ocfs2/acl.h b/fs/ocfs2/acl.h
index 1b39f3e..fef10f1 100644
--- a/fs/ocfs2/acl.h
+++ b/fs/ocfs2/acl.h
@@ -26,4 +26,14 @@ struct ocfs2_acl_entry {
__le32 e_id;
};

+#ifdef CONFIG_OCFS2_FS_POSIX_ACL
+
+extern int ocfs2_check_acl(struct inode *, int);
+
+#else /* CONFIG_OCFS2_FS_POSIX_ACL*/
+
+#define ocfs2_check_acl NULL
+
+#endif /* CONFIG_OCFS2_FS_POSIX_ACL*/
+
#endif /* OCFS2_ACL_H */
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 3605491..7bad7d9 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -56,6 +56,7 @@
#include "suballoc.h"
#include "super.h"
#include "xattr.h"
+#include "acl.h"

#include "buffer_head_io.h"

@@ -1035,7 +1036,7 @@ int ocfs2_permission(struct inode *inode, int mask)
goto out;
}

- ret = generic_permission(inode, mask, NULL);
+ ret = generic_permission(inode, mask, ocfs2_check_acl);

ocfs2_inode_unlock(inode, 0);
out:
--
1.5.6

2008-12-19 22:04:20

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 29/45] ocfs2: add ocfs2_acl_chmod

From: Tiger Yang <[email protected]>

This function is used to update acl xattrs during file mode changes.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/acl.c | 27 +++++++++++++++++++++++++++
fs/ocfs2/acl.h | 5 +++++
fs/ocfs2/file.c | 6 ++++++
3 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index a6a2bf6..df72256 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -245,6 +245,33 @@ int ocfs2_check_acl(struct inode *inode, int mask)
return -EAGAIN;
}

+int ocfs2_acl_chmod(struct inode *inode)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct posix_acl *acl, *clone;
+ int ret;
+
+ if (S_ISLNK(inode->i_mode))
+ return -EOPNOTSUPP;
+
+ if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
+ return 0;
+
+ acl = ocfs2_get_acl(inode, ACL_TYPE_ACCESS);
+ if (IS_ERR(acl) || !acl)
+ return PTR_ERR(acl);
+ clone = posix_acl_clone(acl, GFP_KERNEL);
+ posix_acl_release(acl);
+ if (!clone)
+ return -ENOMEM;
+ ret = posix_acl_chmod_masq(clone, inode->i_mode);
+ if (!ret)
+ ret = ocfs2_set_acl(NULL, inode, NULL, ACL_TYPE_ACCESS,
+ clone, NULL, NULL);
+ posix_acl_release(clone);
+ return ret;
+}
+
static size_t ocfs2_xattr_list_acl_access(struct inode *inode,
char *list,
size_t list_len,
diff --git a/fs/ocfs2/acl.h b/fs/ocfs2/acl.h
index fef10f1..68ffd64 100644
--- a/fs/ocfs2/acl.h
+++ b/fs/ocfs2/acl.h
@@ -29,10 +29,15 @@ struct ocfs2_acl_entry {
#ifdef CONFIG_OCFS2_FS_POSIX_ACL

extern int ocfs2_check_acl(struct inode *, int);
+extern int ocfs2_acl_chmod(struct inode *);

#else /* CONFIG_OCFS2_FS_POSIX_ACL*/

#define ocfs2_check_acl NULL
+static inline int ocfs2_acl_chmod(struct inode *inode)
+{
+ return 0;
+}

#endif /* CONFIG_OCFS2_FS_POSIX_ACL*/

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 7bad7d9..4636aa6 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -990,6 +990,12 @@ bail_unlock_rw:
bail:
brelse(bh);

+ if (!status && attr->ia_valid & ATTR_MODE) {
+ status = ocfs2_acl_chmod(inode);
+ if (status < 0)
+ mlog_errno(status);
+ }
+
mlog_exit(status);
return status;
}
--
1.5.6

2008-12-19 22:04:49

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 31/45] ocfs2: add mount option and Kconfig option for acl

From: Tiger Yang <[email protected]>

This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL"
and mount options "acl" to enable acls in Ocfs2.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
Documentation/filesystems/ocfs2.txt | 3 ++-
fs/Kconfig | 9 +++++++++
fs/ocfs2/super.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index 67310fb..c2a0871 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -31,7 +31,6 @@ Features which OCFS2 does not support yet:
- quotas
- Directory change notification (F_NOTIFY)
- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
- - POSIX ACLs

Mount options
=============
@@ -79,3 +78,5 @@ inode64 Indicates that Ocfs2 is allowed to create inodes at
bits of significance.
user_xattr (*) Enables Extended User Attributes.
nouser_xattr Disables Extended User Attributes.
+acl Enables POSIX Access Control Lists support.
+noacl (*) Disables POSIX Access Control Lists support.
diff --git a/fs/Kconfig b/fs/Kconfig
index 522469a..3af6024 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -268,6 +268,15 @@ config OCFS2_COMPAT_JBD
is backwards compatible with JBD. It is safe to say N here.
However, if you really want to use the original JBD, say Y here.

+config OCFS2_FS_POSIX_ACL
+ bool "OCFS2 POSIX Access Control Lists"
+ depends on OCFS2_FS
+ select FS_POSIX_ACL
+ default n
+ help
+ Posix Access Control Lists (ACLs) support permissions for users and
+ groups beyond the owner/group/world scheme.
+
endif # BLOCK

config DNOTIFY
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 304b63a..9e7accc 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -158,6 +158,8 @@ enum {
Opt_user_xattr,
Opt_nouser_xattr,
Opt_inode64,
+ Opt_acl,
+ Opt_noacl,
Opt_err,
};

@@ -180,6 +182,8 @@ static const match_table_t tokens = {
{Opt_user_xattr, "user_xattr"},
{Opt_nouser_xattr, "nouser_xattr"},
{Opt_inode64, "inode64"},
+ {Opt_acl, "acl"},
+ {Opt_noacl, "noacl"},
{Opt_err, NULL}
};

@@ -466,6 +470,8 @@ unlock_osb:
if (!ret) {
/* Only save off the new mount options in case of a successful
* remount. */
+ if (!(osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_XATTR))
+ parsed_options.mount_opt &= ~OCFS2_MOUNT_POSIX_ACL;
osb->s_mount_opt = parsed_options.mount_opt;
osb->s_atime_quantum = parsed_options.atime_quantum;
osb->preferred_slot = parsed_options.slot;
@@ -651,6 +657,10 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
}
brelse(bh);
bh = NULL;
+
+ if (!(osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_XATTR))
+ parsed_options.mount_opt &= ~OCFS2_MOUNT_POSIX_ACL;
+
osb->s_mount_opt = parsed_options.mount_opt;
osb->s_atime_quantum = parsed_options.atime_quantum;
osb->preferred_slot = parsed_options.slot;
@@ -664,6 +674,9 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)

sb->s_magic = OCFS2_SUPER_MAGIC;

+ sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
+ ((osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL) ? MS_POSIXACL : 0);
+
/* Hard readonly mode only if: bdev_read_only, MS_RDONLY,
* heartbeat=none */
if (bdev_read_only(sb->s_bdev)) {
@@ -945,6 +958,19 @@ static int ocfs2_parse_options(struct super_block *sb,
case Opt_inode64:
mopt->mount_opt |= OCFS2_MOUNT_INODE64;
break;
+#ifdef CONFIG_OCFS2_FS_POSIX_ACL
+ case Opt_acl:
+ mopt->mount_opt |= OCFS2_MOUNT_POSIX_ACL;
+ break;
+ case Opt_noacl:
+ mopt->mount_opt &= ~OCFS2_MOUNT_POSIX_ACL;
+ break;
+#else
+ case Opt_acl:
+ case Opt_noacl:
+ printk(KERN_INFO "ocfs2 (no)acl options not supported\n");
+ break;
+#endif
default:
mlog(ML_ERROR,
"Unrecognized mount option \"%s\" "
@@ -1017,6 +1043,13 @@ static int ocfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
if (opts & OCFS2_MOUNT_INODE64)
seq_printf(s, ",inode64");

+#ifdef CONFIG_OCFS2_FS_POSIX_ACL
+ if (opts & OCFS2_MOUNT_POSIX_ACL)
+ seq_printf(s, ",acl");
+ else
+ seq_printf(s, ",noacl");
+#endif
+
return 0;
}

--
1.5.6

2008-12-19 22:04:36

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 30/45] ocfs2: add ocfs2_init_acl in mknod

From: Tiger Yang <[email protected]>

We need to get the parent directories acls and let the new child inherit it.
To this, we add additional calculations for data/metadata allocation.

Signed-off-by: Tiger Yang <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/acl.c | 59 ++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/acl.h | 14 +++++++++
fs/ocfs2/namei.c | 23 ++++++++++-----
fs/ocfs2/xattr.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/xattr.h | 3 ++
5 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index df72256..12dfb44 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -272,6 +272,65 @@ int ocfs2_acl_chmod(struct inode *inode)
return ret;
}

+/*
+ * Initialize the ACLs of a new inode. If parent directory has default ACL,
+ * then clone to new inode. Called from ocfs2_mknod.
+ */
+int ocfs2_init_acl(handle_t *handle,
+ struct inode *inode,
+ struct inode *dir,
+ struct buffer_head *di_bh,
+ struct buffer_head *dir_bh,
+ struct ocfs2_alloc_context *meta_ac,
+ struct ocfs2_alloc_context *data_ac)
+{
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct posix_acl *acl = NULL;
+ int ret = 0;
+
+ if (!S_ISLNK(inode->i_mode)) {
+ if (osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL) {
+ acl = ocfs2_get_acl_nolock(dir, ACL_TYPE_DEFAULT,
+ dir_bh);
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+ }
+ if (!acl)
+ inode->i_mode &= ~current->fs->umask;
+ }
+ if ((osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL) && acl) {
+ struct posix_acl *clone;
+ mode_t mode;
+
+ if (S_ISDIR(inode->i_mode)) {
+ ret = ocfs2_set_acl(handle, inode, di_bh,
+ ACL_TYPE_DEFAULT, acl,
+ meta_ac, data_ac);
+ if (ret)
+ goto cleanup;
+ }
+ clone = posix_acl_clone(acl, GFP_NOFS);
+ ret = -ENOMEM;
+ if (!clone)
+ goto cleanup;
+
+ mode = inode->i_mode;
+ ret = posix_acl_create_masq(clone, &mode);
+ if (ret >= 0) {
+ inode->i_mode = mode;
+ if (ret > 0) {
+ ret = ocfs2_set_acl(handle, inode,
+ di_bh, ACL_TYPE_ACCESS,
+ clone, meta_ac, data_ac);
+ }
+ }
+ posix_acl_release(clone);
+ }
+cleanup:
+ posix_acl_release(acl);
+ return ret;
+}
+
static size_t ocfs2_xattr_list_acl_access(struct inode *inode,
char *list,
size_t list_len,
diff --git a/fs/ocfs2/acl.h b/fs/ocfs2/acl.h
index 68ffd64..8f6389e 100644
--- a/fs/ocfs2/acl.h
+++ b/fs/ocfs2/acl.h
@@ -30,6 +30,10 @@ struct ocfs2_acl_entry {

extern int ocfs2_check_acl(struct inode *, int);
extern int ocfs2_acl_chmod(struct inode *);
+extern int ocfs2_init_acl(handle_t *, struct inode *, struct inode *,
+ struct buffer_head *, struct buffer_head *,
+ struct ocfs2_alloc_context *,
+ struct ocfs2_alloc_context *);

#else /* CONFIG_OCFS2_FS_POSIX_ACL*/

@@ -38,6 +42,16 @@ static inline int ocfs2_acl_chmod(struct inode *inode)
{
return 0;
}
+static inline int ocfs2_init_acl(handle_t *handle,
+ struct inode *inode,
+ struct inode *dir,
+ struct buffer_head *di_bh,
+ struct buffer_head *dir_bh,
+ struct ocfs2_alloc_context *meta_ac,
+ struct ocfs2_alloc_context *data_ac)
+{
+ return 0;
+}

#endif /* CONFIG_OCFS2_FS_POSIX_ACL*/

diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index b361aec..ed7ed74 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -61,6 +61,7 @@
#include "sysfile.h"
#include "uptodate.h"
#include "xattr.h"
+#include "acl.h"

#include "buffer_head_io.h"

@@ -302,14 +303,13 @@ static int ocfs2_mknod(struct inode *dir,
}
}

- /* calculate meta data/clusters for setting security xattr */
- if (si.enable) {
- status = ocfs2_calc_security_init(dir, &si, &want_clusters,
- &xattr_credits, &xattr_ac);
- if (status < 0) {
- mlog_errno(status);
- goto leave;
- }
+ /* calculate meta data/clusters for setting security and acl xattr */
+ status = ocfs2_calc_xattr_init(dir, parent_fe_bh, mode,
+ &si, &want_clusters,
+ &xattr_credits, &xattr_ac);
+ if (status < 0) {
+ mlog_errno(status);
+ goto leave;
}

/* Reserve a cluster if creating an extent based directory. */
@@ -363,6 +363,13 @@ static int ocfs2_mknod(struct inode *dir,
inc_nlink(dir);
}

+ status = ocfs2_init_acl(handle, inode, dir, new_fe_bh, parent_fe_bh,
+ xattr_ac, data_ac);
+ if (status < 0) {
+ mlog_errno(status);
+ goto leave;
+ }
+
if (si.enable) {
status = ocfs2_init_security_set(handle, inode, new_fe_bh, &si,
xattr_ac, data_ac);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 2e273c2..3cc8385 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -84,6 +84,10 @@ struct ocfs2_xattr_set_ctxt {
#define OCFS2_XATTR_FREE_IN_IBODY (OCFS2_MIN_XATTR_INLINE_SIZE \
- sizeof(struct ocfs2_xattr_header) \
- sizeof(__u32))
+#define OCFS2_XATTR_FREE_IN_BLOCK(ptr) ((ptr)->i_sb->s_blocksize \
+ - sizeof(struct ocfs2_xattr_block) \
+ - sizeof(struct ocfs2_xattr_header) \
+ - sizeof(__u32))

static struct ocfs2_xattr_def_value_root def_xv = {
.xv.xr_list.l_count = cpu_to_le16(1),
@@ -402,6 +406,81 @@ int ocfs2_calc_security_init(struct inode *dir,
return ret;
}

+int ocfs2_calc_xattr_init(struct inode *dir,
+ struct buffer_head *dir_bh,
+ int mode,
+ struct ocfs2_security_xattr_info *si,
+ int *want_clusters,
+ int *xattr_credits,
+ struct ocfs2_alloc_context **xattr_ac)
+{
+ int ret = 0;
+ struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
+ int s_size = 0;
+ int a_size = 0;
+ int acl_len = 0;
+
+ if (si->enable)
+ s_size = ocfs2_xattr_entry_real_size(strlen(si->name),
+ si->value_len);
+
+ if (osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL) {
+ acl_len = ocfs2_xattr_get_nolock(dir, dir_bh,
+ OCFS2_XATTR_INDEX_POSIX_ACL_DEFAULT,
+ "", NULL, 0);
+ if (acl_len > 0) {
+ a_size = ocfs2_xattr_entry_real_size(0, acl_len);
+ if (S_ISDIR(mode))
+ a_size <<= 1;
+ } else if (acl_len != 0 && acl_len != -ENODATA) {
+ mlog_errno(ret);
+ return ret;
+ }
+ }
+
+ if (!(s_size + a_size))
+ return ret;
+
+ /*
+ * The max space of security xattr taken inline is
+ * 256(name) + 80(value) + 16(entry) = 352 bytes,
+ * The max space of acl xattr taken inline is
+ * 80(value) + 16(entry) * 2(if directory) = 192 bytes,
+ * when blocksize = 512, may reserve one more cluser for
+ * xattr bucket, otherwise reserve one metadata block
+ * for them is ok.
+ */
+ if (dir->i_sb->s_blocksize == OCFS2_MIN_BLOCKSIZE ||
+ (s_size + a_size) > OCFS2_XATTR_FREE_IN_IBODY) {
+ ret = ocfs2_reserve_new_metadata_blocks(osb, 1, xattr_ac);
+ if (ret) {
+ mlog_errno(ret);
+ return ret;
+ }
+ *xattr_credits += OCFS2_XATTR_BLOCK_CREATE_CREDITS;
+ }
+
+ if (dir->i_sb->s_blocksize == OCFS2_MIN_BLOCKSIZE &&
+ (s_size + a_size) > OCFS2_XATTR_FREE_IN_BLOCK(dir)) {
+ *want_clusters += 1;
+ *xattr_credits += ocfs2_blocks_per_xattr_bucket(dir->i_sb);
+ }
+
+ /* reserve clusters for xattr value which will be set in B tree*/
+ if (si->enable && si->value_len > OCFS2_XATTR_INLINE_SIZE)
+ *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb,
+ si->value_len);
+ if (osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL &&
+ acl_len > OCFS2_XATTR_INLINE_SIZE) {
+ *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb, acl_len);
+ if (S_ISDIR(mode))
+ *want_clusters += ocfs2_clusters_for_bytes(dir->i_sb,
+ acl_len);
+ }
+
+ return ret;
+}
+
static int ocfs2_xattr_extend_allocation(struct inode *inode,
u32 clusters_to_add,
struct buffer_head *xattr_bh,
diff --git a/fs/ocfs2/xattr.h b/fs/ocfs2/xattr.h
index 6163df3..9a67e7d 100644
--- a/fs/ocfs2/xattr.h
+++ b/fs/ocfs2/xattr.h
@@ -66,5 +66,8 @@ int ocfs2_init_security_set(handle_t *, struct inode *,
int ocfs2_calc_security_init(struct inode *,
struct ocfs2_security_xattr_info *,
int *, int *, struct ocfs2_alloc_context **);
+int ocfs2_calc_xattr_init(struct inode *, struct buffer_head *,
+ int, struct ocfs2_security_xattr_info *,
+ int *, int *, struct ocfs2_alloc_context **);

#endif /* OCFS2_XATTR_H */
--
1.5.6

2008-12-19 22:05:25

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 32/45] ocfs2: Wrap inode block reads in a dedicated function.

From: Joel Becker <[email protected]>

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call. Each place that does this has a different set
of sanity checks it performs. Some check only the signature. A couple
validate the block number (the block read vs di->i_blkno). A couple
others check for VALID_FL. Only one place validates i_fs_generation. A
couple check nothing. Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block(). This will validate
all the above fields, going readonly if they are invalid (they never
should be). ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags. Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 2 +-
fs/ocfs2/aops.c | 11 +---
fs/ocfs2/dir.c | 6 +-
fs/ocfs2/dlmglue.c | 12 ++---
fs/ocfs2/extent_map.c | 2 +-
fs/ocfs2/file.c | 21 ++------
fs/ocfs2/inode.c | 136 +++++++++++++++++++++++++++++++++++--------------
fs/ocfs2/inode.h | 16 +++++-
fs/ocfs2/journal.c | 3 +-
fs/ocfs2/localalloc.c | 8 ++--
fs/ocfs2/namei.c | 14 +----
fs/ocfs2/symlink.c | 2 +-
12 files changed, 136 insertions(+), 97 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 5592a2f..9c598ad 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5658,7 +5658,7 @@ static int ocfs2_get_truncate_log_info(struct ocfs2_super *osb,
goto bail;
}

- status = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, &bh);
+ status = ocfs2_read_inode_block(inode, &bh);
if (status < 0) {
iput(inode);
mlog_errno(status);
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index c22543b..e219f8b 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -68,20 +68,13 @@ static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
goto bail;
}

- status = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, &bh);
+ status = ocfs2_read_inode_block(inode, &bh);
if (status < 0) {
mlog_errno(status);
goto bail;
}
fe = (struct ocfs2_dinode *) bh->b_data;

- if (!OCFS2_IS_VALID_DINODE(fe)) {
- mlog(ML_ERROR, "Invalid dinode #%llu: signature = %.*s\n",
- (unsigned long long)le64_to_cpu(fe->i_blkno), 7,
- fe->i_signature);
- goto bail;
- }
-
if ((u64)iblock >= ocfs2_clusters_to_blocks(inode->i_sb,
le32_to_cpu(fe->i_clusters))) {
mlog(ML_ERROR, "block offset is outside the allocated size: "
@@ -262,7 +255,7 @@ static int ocfs2_readpage_inline(struct inode *inode, struct page *page)
BUG_ON(!PageLocked(page));
BUG_ON(!(OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL));

- ret = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, &di_bh);
+ ret = ocfs2_read_inode_block(inode, &di_bh);
if (ret) {
mlog_errno(ret);
goto out;
diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index 026e6eb..5777045 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -231,7 +231,7 @@ static struct buffer_head *ocfs2_find_entry_id(const char *name,
struct ocfs2_dinode *di;
struct ocfs2_inline_data *data;

- ret = ocfs2_read_block(dir, OCFS2_I(dir)->ip_blkno, &di_bh);
+ ret = ocfs2_read_inode_block(dir, &di_bh);
if (ret) {
mlog_errno(ret);
goto out;
@@ -458,7 +458,7 @@ static inline int ocfs2_delete_entry_id(handle_t *handle,
struct ocfs2_dinode *di;
struct ocfs2_inline_data *data;

- ret = ocfs2_read_block(dir, OCFS2_I(dir)->ip_blkno, &di_bh);
+ ret = ocfs2_read_inode_block(dir, &di_bh);
if (ret) {
mlog_errno(ret);
goto out;
@@ -636,7 +636,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
struct ocfs2_inline_data *data;
struct ocfs2_dir_entry *de;

- ret = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, &di_bh);
+ ret = ocfs2_read_inode_block(inode, &di_bh);
if (ret) {
mlog(ML_ERROR, "Unable to read inode block for dir %llu\n",
(unsigned long long)OCFS2_I(inode)->ip_blkno);
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 6e6cc0a..9f2a7f7 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2024,7 +2024,7 @@ static int ocfs2_inode_lock_update(struct inode *inode,
} else {
/* Boo, we have to go to disk. */
/* read bh, cast, ocfs2_refresh_inode */
- status = ocfs2_read_block(inode, oi->ip_blkno, bh);
+ status = ocfs2_read_inode_block(inode, bh);
if (status < 0) {
mlog_errno(status);
goto bail_refresh;
@@ -2032,18 +2032,14 @@ static int ocfs2_inode_lock_update(struct inode *inode,
fe = (struct ocfs2_dinode *) (*bh)->b_data;

/* This is a good chance to make sure we're not
- * locking an invalid object.
+ * locking an invalid object. ocfs2_read_inode_block()
+ * already checked that the inode block is sane.
*
* We bug on a stale inode here because we checked
* above whether it was wiped from disk. The wiping
* node provides a guarantee that we receive that
* message and can mark the inode before dropping any
* locks associated with it. */
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(inode->i_sb, fe);
- status = -EIO;
- goto bail_refresh;
- }
mlog_bug_on_msg(inode->i_generation !=
le32_to_cpu(fe->i_generation),
"Invalid dinode %llu disk generation: %u "
@@ -2085,7 +2081,7 @@ static int ocfs2_assign_bh(struct inode *inode,
return 0;
}

- status = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, ret_bh);
+ status = ocfs2_read_inode_block(inode, ret_bh);
if (status < 0)
mlog_errno(status);

diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index 2baedac..b686b31 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -630,7 +630,7 @@ int ocfs2_get_clusters(struct inode *inode, u32 v_cluster,
if (ret == 0)
goto out;

- ret = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, &di_bh);
+ ret = ocfs2_read_inode_block(inode, &di_bh);
if (ret) {
mlog_errno(ret);
goto out;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 4636aa6..41001d5 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -402,12 +402,9 @@ static int ocfs2_truncate_file(struct inode *inode,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
(unsigned long long)new_i_size);

+ /* We trust di_bh because it comes from ocfs2_inode_lock(), which
+ * already validated it */
fe = (struct ocfs2_dinode *) di_bh->b_data;
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(inode->i_sb, fe);
- status = -EIO;
- goto bail;
- }

mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),
"Inode %llu, inode i_size = %lld != di "
@@ -546,18 +543,12 @@ static int __ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
*/
BUG_ON(mark_unwritten && !ocfs2_sparse_alloc(osb));

- status = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, &bh);
+ status = ocfs2_read_inode_block(inode, &bh);
if (status < 0) {
mlog_errno(status);
goto leave;
}
-
fe = (struct ocfs2_dinode *) bh->b_data;
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(inode->i_sb, fe);
- status = -EIO;
- goto leave;
- }

restart_all:
BUG_ON(le32_to_cpu(fe->i_clusters) != OCFS2_I(inode)->ip_clusters);
@@ -1135,9 +1126,8 @@ static int ocfs2_write_remove_suid(struct inode *inode)
{
int ret;
struct buffer_head *bh = NULL;
- struct ocfs2_inode_info *oi = OCFS2_I(inode);

- ret = ocfs2_read_block(inode, oi->ip_blkno, &bh);
+ ret = ocfs2_read_inode_block(inode, &bh);
if (ret < 0) {
mlog_errno(ret);
goto out;
@@ -1163,8 +1153,7 @@ static int ocfs2_allocate_unwritten_extents(struct inode *inode,
struct buffer_head *di_bh = NULL;

if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) {
- ret = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno,
- &di_bh);
+ ret = ocfs2_read_inode_block(inode, &di_bh);
if (ret) {
mlog_errno(ret);
goto out;
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 7aa00d5..9eb701b 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -214,12 +214,11 @@ static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
return 0;
}

-int ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
- int create_ino)
+void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
+ int create_ino)
{
struct super_block *sb;
struct ocfs2_super *osb;
- int status = -EINVAL;
int use_plocks = 1;

mlog_entry("(0x%p, size:%llu)\n", inode,
@@ -232,25 +231,17 @@ int ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
ocfs2_mount_local(osb) || !ocfs2_stack_supports_plocks())
use_plocks = 0;

- /* this means that read_inode cannot create a superblock inode
- * today. change if needed. */
- if (!OCFS2_IS_VALID_DINODE(fe) ||
- !(fe->i_flags & cpu_to_le32(OCFS2_VALID_FL))) {
- mlog(0, "Invalid dinode: i_ino=%lu, i_blkno=%llu, "
- "signature = %.*s, flags = 0x%x\n",
- inode->i_ino,
- (unsigned long long)le64_to_cpu(fe->i_blkno), 7,
- fe->i_signature, le32_to_cpu(fe->i_flags));
- goto bail;
- }
+ /*
+ * These have all been checked by ocfs2_read_inode_block() or set
+ * by ocfs2_mknod_locked(), so a failure is a code bug.
+ */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(fe)); /* This means that read_inode
+ cannot create a superblock
+ inode today. change if
+ that is needed. */
+ BUG_ON(!(fe->i_flags & cpu_to_le32(OCFS2_VALID_FL)));
+ BUG_ON(le32_to_cpu(fe->i_fs_generation) != osb->fs_generation);

- if (le32_to_cpu(fe->i_fs_generation) != osb->fs_generation) {
- mlog(ML_ERROR, "file entry generation does not match "
- "superblock! osb->fs_generation=%x, "
- "fe->i_fs_generation=%x\n",
- osb->fs_generation, le32_to_cpu(fe->i_fs_generation));
- goto bail;
- }

OCFS2_I(inode)->ip_clusters = le32_to_cpu(fe->i_clusters);
OCFS2_I(inode)->ip_attr = le32_to_cpu(fe->i_attr);
@@ -354,10 +345,7 @@ int ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,

ocfs2_set_inode_flags(inode);

- status = 0;
-bail:
- mlog_exit(status);
- return status;
+ mlog_exit_void();
}

static int ocfs2_read_locked_inode(struct inode *inode,
@@ -460,11 +448,14 @@ static int ocfs2_read_locked_inode(struct inode *inode,
}
}

- if (can_lock)
- status = ocfs2_read_blocks(inode, args->fi_blkno, 1, &bh,
- OCFS2_BH_IGNORE_CACHE);
- else
+ if (can_lock) {
+ status = ocfs2_read_inode_block_full(inode, &bh,
+ OCFS2_BH_IGNORE_CACHE);
+ } else {
status = ocfs2_read_blocks_sync(osb, args->fi_blkno, 1, &bh);
+ if (!status)
+ status = ocfs2_validate_inode_block(osb->sb, bh);
+ }
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -472,12 +463,6 @@ static int ocfs2_read_locked_inode(struct inode *inode,

status = -EINVAL;
fe = (struct ocfs2_dinode *) bh->b_data;
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- mlog(0, "Invalid dinode #%llu: signature = %.*s\n",
- (unsigned long long)args->fi_blkno, 7,
- fe->i_signature);
- goto bail;
- }

/*
* This is a code bug. Right now the caller needs to
@@ -491,10 +476,9 @@ static int ocfs2_read_locked_inode(struct inode *inode,

if (S_ISCHR(le16_to_cpu(fe->i_mode)) ||
S_ISBLK(le16_to_cpu(fe->i_mode)))
- inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
+ inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));

- if (ocfs2_populate_inode(inode, fe, 0) < 0)
- goto bail;
+ ocfs2_populate_inode(inode, fe, 0);

BUG_ON(args->fi_blkno != le64_to_cpu(fe->i_blkno));

@@ -1264,3 +1248,79 @@ void ocfs2_refresh_inode(struct inode *inode,

spin_unlock(&OCFS2_I(inode)->ip_lock);
}
+
+int ocfs2_validate_inode_block(struct super_block *sb,
+ struct buffer_head *bh)
+{
+ int rc = -EINVAL;
+ struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data;
+
+ BUG_ON(!buffer_uptodate(bh));
+
+ if (!OCFS2_IS_VALID_DINODE(di)) {
+ ocfs2_error(sb, "Invalid dinode #%llu: signature = %.*s\n",
+ (unsigned long long)bh->b_blocknr, 7,
+ di->i_signature);
+ goto bail;
+ }
+
+ if (le64_to_cpu(di->i_blkno) != bh->b_blocknr) {
+ ocfs2_error(sb, "Invalid dinode #%llu: i_blkno is %llu\n",
+ (unsigned long long)bh->b_blocknr,
+ (unsigned long long)le64_to_cpu(di->i_blkno));
+ goto bail;
+ }
+
+ if (!(di->i_flags & cpu_to_le32(OCFS2_VALID_FL))) {
+ ocfs2_error(sb,
+ "Invalid dinode #%llu: OCFS2_VALID_FL not set\n",
+ (unsigned long long)bh->b_blocknr);
+ goto bail;
+ }
+
+ if (le32_to_cpu(di->i_fs_generation) !=
+ OCFS2_SB(sb)->fs_generation) {
+ ocfs2_error(sb,
+ "Invalid dinode #%llu: fs_generation is %u\n",
+ (unsigned long long)bh->b_blocknr,
+ le32_to_cpu(di->i_fs_generation));
+ goto bail;
+ }
+
+ rc = 0;
+
+bail:
+ return rc;
+}
+
+int ocfs2_read_inode_block_full(struct inode *inode, struct buffer_head **bh,
+ int flags)
+{
+ int rc;
+ struct buffer_head *tmp = *bh;
+
+ rc = ocfs2_read_blocks(inode, OCFS2_I(inode)->ip_blkno, 1, &tmp,
+ flags);
+ if (rc)
+ goto out;
+
+ if (!(flags & OCFS2_BH_READAHEAD)) {
+ rc = ocfs2_validate_inode_block(inode->i_sb, tmp);
+ if (rc) {
+ brelse(tmp);
+ goto out;
+ }
+ }
+
+ /* If ocfs2_read_blocks() got us a new bh, pass it up. */
+ if (!*bh)
+ *bh = tmp;
+
+out:
+ return rc;
+}
+
+int ocfs2_read_inode_block(struct inode *inode, struct buffer_head **bh)
+{
+ return ocfs2_read_inode_block_full(inode, bh, 0);
+}
diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index 2f37af9..b79c371 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -128,8 +128,8 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 feoff, unsigned flags,
int sysfile_type);
int ocfs2_inode_init_private(struct inode *inode);
int ocfs2_inode_revalidate(struct dentry *dentry);
-int ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
- int create_ino);
+void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
+ int create_ino);
void ocfs2_read_inode(struct inode *inode);
void ocfs2_read_inode2(struct inode *inode, void *opaque);
ssize_t ocfs2_rw_direct(int rw, struct file *filp, char *buf,
@@ -153,4 +153,16 @@ static inline blkcnt_t ocfs2_inode_sector_count(struct inode *inode)
return (blkcnt_t)(OCFS2_I(inode)->ip_clusters << c_to_s_bits);
}

+/* Validate that a bh contains a valid inode */
+int ocfs2_validate_inode_block(struct super_block *sb,
+ struct buffer_head *bh);
+/*
+ * Read an inode block into *bh. If *bh is NULL, a bh will be allocated.
+ * This is a cached read. The inode will be validated with
+ * ocfs2_validate_inode_block().
+ */
+int ocfs2_read_inode_block(struct inode *inode, struct buffer_head **bh);
+/* The same, but can be passed OCFS2_BH_* flags */
+int ocfs2_read_inode_block_full(struct inode *inode, struct buffer_head **bh,
+ int flags);
#endif /* OCFS2_INODE_H */
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 99fe9d5..877aaa0 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1135,8 +1135,7 @@ static int ocfs2_read_journal_inode(struct ocfs2_super *osb,
}
SET_INODE_JOURNAL(inode);

- status = ocfs2_read_blocks(inode, OCFS2_I(inode)->ip_blkno, 1, bh,
- OCFS2_BH_IGNORE_CACHE);
+ status = ocfs2_read_inode_block_full(inode, bh, OCFS2_BH_IGNORE_CACHE);
if (status < 0) {
mlog_errno(status);
goto bail;
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 687b287..19cfb1b 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -248,8 +248,8 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb)
goto bail;
}

- status = ocfs2_read_blocks(inode, OCFS2_I(inode)->ip_blkno, 1,
- &alloc_bh, OCFS2_BH_IGNORE_CACHE);
+ status = ocfs2_read_inode_block_full(inode, &alloc_bh,
+ OCFS2_BH_IGNORE_CACHE);
if (status < 0) {
mlog_errno(status);
goto bail;
@@ -459,8 +459,8 @@ int ocfs2_begin_local_alloc_recovery(struct ocfs2_super *osb,

mutex_lock(&inode->i_mutex);

- status = ocfs2_read_blocks(inode, OCFS2_I(inode)->ip_blkno, 1,
- &alloc_bh, OCFS2_BH_IGNORE_CACHE);
+ status = ocfs2_read_inode_block_full(inode, &alloc_bh,
+ OCFS2_BH_IGNORE_CACHE);
if (status < 0) {
mlog_errno(status);
goto bail;
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index ed7ed74..98fd325 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -531,15 +531,7 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
goto leave;
}

- if (ocfs2_populate_inode(inode, fe, 1) < 0) {
- mlog(ML_ERROR, "populate inode failed! bh->b_blocknr=%llu, "
- "i_blkno=%llu, i_ino=%lu\n",
- (unsigned long long)(*new_fe_bh)->b_blocknr,
- (unsigned long long)le64_to_cpu(fe->i_blkno),
- inode->i_ino);
- BUG();
- }
-
+ ocfs2_populate_inode(inode, fe, 1);
ocfs2_inode_set_new(osb, inode);
if (!ocfs2_mount_local(osb)) {
status = ocfs2_create_new_inode_locks(inode);
@@ -1864,9 +1856,7 @@ static int ocfs2_orphan_add(struct ocfs2_super *osb,

mlog_entry("(inode->i_ino = %lu)\n", inode->i_ino);

- status = ocfs2_read_block(orphan_dir_inode,
- OCFS2_I(orphan_dir_inode)->ip_blkno,
- &orphan_dir_bh);
+ status = ocfs2_read_inode_block(orphan_dir_inode, &orphan_dir_bh);
if (status < 0) {
mlog_errno(status);
goto leave;
diff --git a/fs/ocfs2/symlink.c b/fs/ocfs2/symlink.c
index cbd03df..ed0a0cf 100644
--- a/fs/ocfs2/symlink.c
+++ b/fs/ocfs2/symlink.c
@@ -84,7 +84,7 @@ static char *ocfs2_fast_symlink_getlink(struct inode *inode,

mlog_entry_void();

- status = ocfs2_read_block(inode, OCFS2_I(inode)->ip_blkno, bh);
+ status = ocfs2_read_inode_block(inode, bh);
if (status < 0) {
mlog_errno(status);
link = ERR_PTR(status);
--
1.5.6

2008-12-19 22:05:45

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 33/45] ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.

From: Joel Becker <[email protected]>

Random places in the code would check a dinode bh to see if it was
valid. Not only did they do different levels of validation, they
handled errors in different ways.

The previous commit unified inode block reads, validating all block
reads in the same place. Thus, these haphazard checks are no longer
necessary. Rather than eliminate them, however, we change them to
BUG_ON() checks. This ensures the assumptions remain true. All of the
code paths to these checks have been audited to ensure they come from a
validated inode read.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 50 +++++++++++++++++++++-----------------------------
fs/ocfs2/journal.c | 17 +++++------------
fs/ocfs2/ocfs2.h | 8 --------
fs/ocfs2/resize.c | 10 ++++------
fs/ocfs2/suballoc.c | 36 ++++++++++++++++--------------------
5 files changed, 46 insertions(+), 75 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 9c598ad..320545b 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -187,20 +187,12 @@ static int ocfs2_dinode_insert_check(struct inode *inode,
static int ocfs2_dinode_sanity_check(struct inode *inode,
struct ocfs2_extent_tree *et)
{
- int ret = 0;
- struct ocfs2_dinode *di;
+ struct ocfs2_dinode *di = et->et_object;

BUG_ON(et->et_ops != &ocfs2_dinode_et_ops);
+ BUG_ON(!OCFS2_IS_VALID_DINODE(di));

- di = et->et_object;
- if (!OCFS2_IS_VALID_DINODE(di)) {
- ret = -EIO;
- ocfs2_error(inode->i_sb,
- "Inode %llu has invalid path root",
- (unsigned long long)OCFS2_I(inode)->ip_blkno);
- }
-
- return ret;
+ return 0;
}

static void ocfs2_dinode_fill_root_el(struct ocfs2_extent_tree *et)
@@ -5380,13 +5372,13 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
start_cluster = ocfs2_blocks_to_clusters(osb->sb, start_blk);

di = (struct ocfs2_dinode *) tl_bh->b_data;
- tl = &di->id2.i_dealloc;
- if (!OCFS2_IS_VALID_DINODE(di)) {
- OCFS2_RO_ON_INVALID_DINODE(osb->sb, di);
- status = -EIO;
- goto bail;
- }

+ /* tl_bh is loaded from ocfs2_truncate_log_init(). It's validated
+ * by the underlying call to ocfs2_read_inode_block(), so any
+ * corruption is a code bug */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(di));
+
+ tl = &di->id2.i_dealloc;
tl_count = le16_to_cpu(tl->tl_count);
mlog_bug_on_msg(tl_count > ocfs2_truncate_recs_per_inode(osb->sb) ||
tl_count == 0,
@@ -5536,13 +5528,13 @@ int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
BUG_ON(mutex_trylock(&tl_inode->i_mutex));

di = (struct ocfs2_dinode *) tl_bh->b_data;
- tl = &di->id2.i_dealloc;
- if (!OCFS2_IS_VALID_DINODE(di)) {
- OCFS2_RO_ON_INVALID_DINODE(osb->sb, di);
- status = -EIO;
- goto out;
- }

+ /* tl_bh is loaded from ocfs2_truncate_log_init(). It's validated
+ * by the underlying call to ocfs2_read_inode_block(), so any
+ * corruption is a code bug */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(di));
+
+ tl = &di->id2.i_dealloc;
num_to_flush = le16_to_cpu(tl->tl_used);
mlog(0, "Flush %u records from truncate log #%llu\n",
num_to_flush, (unsigned long long)OCFS2_I(tl_inode)->ip_blkno);
@@ -5697,13 +5689,13 @@ int ocfs2_begin_truncate_log_recovery(struct ocfs2_super *osb,
}

di = (struct ocfs2_dinode *) tl_bh->b_data;
- tl = &di->id2.i_dealloc;
- if (!OCFS2_IS_VALID_DINODE(di)) {
- OCFS2_RO_ON_INVALID_DINODE(tl_inode->i_sb, di);
- status = -EIO;
- goto bail;
- }

+ /* tl_bh is loaded from ocfs2_get_truncate_log_info(). It's
+ * validated by the underlying call to ocfs2_read_inode_block(),
+ * so any corruption is a code bug */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(di));
+
+ tl = &di->id2.i_dealloc;
if (le16_to_cpu(tl->tl_used)) {
mlog(0, "We'll have %u logs to recover\n",
le16_to_cpu(tl->tl_used));
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 877aaa0..9223bfc 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -587,17 +587,11 @@ static int ocfs2_journal_toggle_dirty(struct ocfs2_super *osb,
mlog_entry_void();

fe = (struct ocfs2_dinode *)bh->b_data;
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- /* This is called from startup/shutdown which will
- * handle the errors in a specific manner, so no need
- * to call ocfs2_error() here. */
- mlog(ML_ERROR, "Journal dinode %llu has invalid "
- "signature: %.*s",
- (unsigned long long)le64_to_cpu(fe->i_blkno), 7,
- fe->i_signature);
- status = -EIO;
- goto out;
- }
+
+ /* The journal bh on the osb always comes from ocfs2_journal_init()
+ * and was validated there inside ocfs2_inode_lock_full(). It's a
+ * code bug if we mess it up. */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(fe));

flags = le32_to_cpu(fe->id1.journal1.ij_flags);
if (dirty)
@@ -613,7 +607,6 @@ static int ocfs2_journal_toggle_dirty(struct ocfs2_super *osb,
if (status < 0)
mlog_errno(status);

-out:
mlog_exit(status);
return status;
}
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 25d07ff..467bdb6 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -444,14 +444,6 @@ static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb)
#define OCFS2_IS_VALID_DINODE(ptr) \
(!strcmp((ptr)->i_signature, OCFS2_INODE_SIGNATURE))

-#define OCFS2_RO_ON_INVALID_DINODE(__sb, __di) do { \
- typeof(__di) ____di = (__di); \
- ocfs2_error((__sb), \
- "Dinode # %llu has bad signature %.*s", \
- (unsigned long long)le64_to_cpu((____di)->i_blkno), 7, \
- (____di)->i_signature); \
-} while (0)
-
#define OCFS2_IS_VALID_EXTENT_BLOCK(ptr) \
(!strcmp((ptr)->h_signature, OCFS2_EXTENT_BLOCK_SIGNATURE))

diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c
index ffd48db..739d452 100644
--- a/fs/ocfs2/resize.c
+++ b/fs/ocfs2/resize.c
@@ -314,6 +314,10 @@ int ocfs2_group_extend(struct inode * inode, int new_clusters)

fe = (struct ocfs2_dinode *)main_bm_bh->b_data;

+ /* main_bm_bh is validated by inode read inside ocfs2_inode_lock(),
+ * so any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(fe));
+
if (le16_to_cpu(fe->id2.i_chain.cl_cpg) !=
ocfs2_group_bitmap_size(osb->sb) * 8) {
mlog(ML_ERROR, "The disk is too old and small. "
@@ -322,12 +326,6 @@ int ocfs2_group_extend(struct inode * inode, int new_clusters)
goto out_unlock;
}

- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(main_bm_inode->i_sb, fe);
- ret = -EIO;
- goto out_unlock;
- }
-
first_new_cluster = le32_to_cpu(fe->i_clusters);
lgd_blkno = ocfs2_which_cluster_group(main_bm_inode,
first_new_cluster - 1);
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index c5ff18b..95d432b 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -441,11 +441,11 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
ac->ac_alloc_slot = slot;

fe = (struct ocfs2_dinode *) bh->b_data;
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(alloc_inode->i_sb, fe);
- status = -EIO;
- goto bail;
- }
+
+ /* The bh was validated by the inode read inside
+ * ocfs2_inode_lock(). Any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(fe));
+
if (!(fe->i_flags & cpu_to_le32(OCFS2_CHAIN_FL))) {
ocfs2_error(alloc_inode->i_sb, "Invalid chain allocator %llu",
(unsigned long long)le64_to_cpu(fe->i_blkno));
@@ -931,11 +931,6 @@ static int ocfs2_relink_block_group(handle_t *handle,
struct ocfs2_group_desc *bg = (struct ocfs2_group_desc *) bg_bh->b_data;
struct ocfs2_group_desc *prev_bg = (struct ocfs2_group_desc *) prev_bg_bh->b_data;

- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(alloc_inode->i_sb, fe);
- status = -EIO;
- goto out;
- }
if (!OCFS2_IS_VALID_GROUP_DESC(bg)) {
OCFS2_RO_ON_INVALID_GROUP_DESC(alloc_inode->i_sb, bg);
status = -EIO;
@@ -1392,11 +1387,11 @@ static int ocfs2_claim_suballoc_bits(struct ocfs2_super *osb,
BUG_ON(!ac->ac_bh);

fe = (struct ocfs2_dinode *) ac->ac_bh->b_data;
- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(osb->sb, fe);
- status = -EIO;
- goto bail;
- }
+
+ /* The bh was validated by the inode read during
+ * ocfs2_reserve_suballoc_bits(). Any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(fe));
+
if (le32_to_cpu(fe->id1.bitmap1.i_used) >=
le32_to_cpu(fe->id1.bitmap1.i_total)) {
ocfs2_error(osb->sb, "Chain allocator dinode %llu has %u used "
@@ -1782,11 +1777,12 @@ int ocfs2_free_suballoc_bits(handle_t *handle,

mlog_entry_void();

- if (!OCFS2_IS_VALID_DINODE(fe)) {
- OCFS2_RO_ON_INVALID_DINODE(alloc_inode->i_sb, fe);
- status = -EIO;
- goto bail;
- }
+ /* The alloc_bh comes from ocfs2_free_dinode() or
+ * ocfs2_free_clusters(). The callers have all locked the
+ * allocator and gotten alloc_bh from the lock call. This
+ * validates the dinode buffer. Any corruption that has happended
+ * is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_DINODE(fe));
BUG_ON((count + start_bit) > ocfs2_bits_per_group(cl));

mlog(0, "%llu: freeing %u bits from group %llu, starting at %u\n",
--
1.5.6

2008-12-19 22:06:04

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 34/45] ocfs2: Consolidate validation of group descriptors.

From: Joel Becker <[email protected]>

Currently the validation of group descriptors is directly duplicated so
that one version can error the filesystem and the other (resize) can
just report the problem. Consolidate to one function that takes a
boolean. Wrap that function with the old call for the old users.

This is in preparation for lifting the read+validate step into a
single function.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/resize.c | 40 +++++----------------------
fs/ocfs2/suballoc.c | 74 +++++++++++++++++++++++++++++---------------------
fs/ocfs2/suballoc.h | 20 +++++++++++--
3 files changed, 68 insertions(+), 66 deletions(-)

diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c
index 739d452..a2de32a 100644
--- a/fs/ocfs2/resize.c
+++ b/fs/ocfs2/resize.c
@@ -396,41 +396,16 @@ static int ocfs2_check_new_group(struct inode *inode,
struct buffer_head *group_bh)
{
int ret;
- struct ocfs2_group_desc *gd;
+ struct ocfs2_group_desc *gd =
+ (struct ocfs2_group_desc *)group_bh->b_data;
u16 cl_bpc = le16_to_cpu(di->id2.i_chain.cl_bpc);
- unsigned int max_bits = le16_to_cpu(di->id2.i_chain.cl_cpg) *
- le16_to_cpu(di->id2.i_chain.cl_bpc);

+ ret = ocfs2_validate_group_descriptor(inode->i_sb, di, gd, 1);
+ if (ret)
+ goto out;

- gd = (struct ocfs2_group_desc *)group_bh->b_data;
-
- ret = -EIO;
- if (!OCFS2_IS_VALID_GROUP_DESC(gd))
- mlog(ML_ERROR, "Group descriptor # %llu isn't valid.\n",
- (unsigned long long)le64_to_cpu(gd->bg_blkno));
- else if (di->i_blkno != gd->bg_parent_dinode)
- mlog(ML_ERROR, "Group descriptor # %llu has bad parent "
- "pointer (%llu, expected %llu)\n",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- (unsigned long long)le64_to_cpu(gd->bg_parent_dinode),
- (unsigned long long)le64_to_cpu(di->i_blkno));
- else if (le16_to_cpu(gd->bg_bits) > max_bits)
- mlog(ML_ERROR, "Group descriptor # %llu has bit count of %u\n",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- le16_to_cpu(gd->bg_bits));
- else if (le16_to_cpu(gd->bg_free_bits_count) > le16_to_cpu(gd->bg_bits))
- mlog(ML_ERROR, "Group descriptor # %llu has bit count %u but "
- "claims that %u are free\n",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- le16_to_cpu(gd->bg_bits),
- le16_to_cpu(gd->bg_free_bits_count));
- else if (le16_to_cpu(gd->bg_bits) > (8 * le16_to_cpu(gd->bg_size)))
- mlog(ML_ERROR, "Group descriptor # %llu has bit count %u but "
- "max bitmap bits of %u\n",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- le16_to_cpu(gd->bg_bits),
- 8 * le16_to_cpu(gd->bg_size));
- else if (le16_to_cpu(gd->bg_chain) != input->chain)
+ ret = -EINVAL;
+ if (le16_to_cpu(gd->bg_chain) != input->chain)
mlog(ML_ERROR, "Group descriptor # %llu has bad chain %u "
"while input has %u set.\n",
(unsigned long long)le64_to_cpu(gd->bg_blkno),
@@ -449,6 +424,7 @@ static int ocfs2_check_new_group(struct inode *inode,
else
ret = 0;

+out:
return ret;
}

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 95d432b..ddba97d 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -146,59 +146,71 @@ static u32 ocfs2_bits_per_group(struct ocfs2_chain_list *cl)
}

/* somewhat more expensive than our other checks, so use sparingly. */
-int ocfs2_check_group_descriptor(struct super_block *sb,
- struct ocfs2_dinode *di,
- struct ocfs2_group_desc *gd)
+int ocfs2_validate_group_descriptor(struct super_block *sb,
+ struct ocfs2_dinode *di,
+ struct ocfs2_group_desc *gd,
+ int clean_error)
{
unsigned int max_bits;

+#define do_error(fmt, ...) \
+ do{ \
+ if (clean_error) \
+ mlog(ML_ERROR, fmt "\n", ##__VA_ARGS__); \
+ else \
+ ocfs2_error(sb, fmt, ##__VA_ARGS__); \
+ } while (0)
+
if (!OCFS2_IS_VALID_GROUP_DESC(gd)) {
- OCFS2_RO_ON_INVALID_GROUP_DESC(sb, gd);
- return -EIO;
+ do_error("Group Descriptor #%llu has bad signature %.*s",
+ (unsigned long long)le64_to_cpu(gd->bg_blkno), 7,
+ gd->bg_signature);
+ return -EINVAL;
}

if (di->i_blkno != gd->bg_parent_dinode) {
- ocfs2_error(sb, "Group descriptor # %llu has bad parent "
- "pointer (%llu, expected %llu)",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- (unsigned long long)le64_to_cpu(gd->bg_parent_dinode),
- (unsigned long long)le64_to_cpu(di->i_blkno));
- return -EIO;
+ do_error("Group descriptor # %llu has bad parent "
+ "pointer (%llu, expected %llu)",
+ (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ (unsigned long long)le64_to_cpu(gd->bg_parent_dinode),
+ (unsigned long long)le64_to_cpu(di->i_blkno));
+ return -EINVAL;
}

max_bits = le16_to_cpu(di->id2.i_chain.cl_cpg) * le16_to_cpu(di->id2.i_chain.cl_bpc);
if (le16_to_cpu(gd->bg_bits) > max_bits) {
- ocfs2_error(sb, "Group descriptor # %llu has bit count of %u",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- le16_to_cpu(gd->bg_bits));
- return -EIO;
+ do_error("Group descriptor # %llu has bit count of %u",
+ (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ le16_to_cpu(gd->bg_bits));
+ return -EINVAL;
}

if (le16_to_cpu(gd->bg_chain) >=
le16_to_cpu(di->id2.i_chain.cl_next_free_rec)) {
- ocfs2_error(sb, "Group descriptor # %llu has bad chain %u",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- le16_to_cpu(gd->bg_chain));
- return -EIO;
+ do_error("Group descriptor # %llu has bad chain %u",
+ (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ le16_to_cpu(gd->bg_chain));
+ return -EINVAL;
}

if (le16_to_cpu(gd->bg_free_bits_count) > le16_to_cpu(gd->bg_bits)) {
- ocfs2_error(sb, "Group descriptor # %llu has bit count %u but "
- "claims that %u are free",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- le16_to_cpu(gd->bg_bits),
- le16_to_cpu(gd->bg_free_bits_count));
- return -EIO;
+ do_error("Group descriptor # %llu has bit count %u but "
+ "claims that %u are free",
+ (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ le16_to_cpu(gd->bg_bits),
+ le16_to_cpu(gd->bg_free_bits_count));
+ return -EINVAL;
}

if (le16_to_cpu(gd->bg_bits) > (8 * le16_to_cpu(gd->bg_size))) {
- ocfs2_error(sb, "Group descriptor # %llu has bit count %u but "
- "max bitmap bits of %u",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
- le16_to_cpu(gd->bg_bits),
- 8 * le16_to_cpu(gd->bg_size));
- return -EIO;
+ do_error("Group descriptor # %llu has bit count %u but "
+ "max bitmap bits of %u",
+ (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ le16_to_cpu(gd->bg_bits),
+ 8 * le16_to_cpu(gd->bg_size));
+ return -EINVAL;
}
+#undef do_error

return 0;
}
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index 4df159d..7adfcc4 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -165,9 +165,23 @@ void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac);
u64 ocfs2_which_cluster_group(struct inode *inode, u32 cluster);

/* somewhat more expensive than our other checks, so use sparingly. */
-int ocfs2_check_group_descriptor(struct super_block *sb,
- struct ocfs2_dinode *di,
- struct ocfs2_group_desc *gd);
+/*
+ * By default, ocfs2_validate_group_descriptor() calls ocfs2_error() when it
+ * finds a problem. A caller that wants to check a group descriptor
+ * without going readonly passes a nonzero clean_error. This is only
+ * resize, really.
+ */
+int ocfs2_validate_group_descriptor(struct super_block *sb,
+ struct ocfs2_dinode *di,
+ struct ocfs2_group_desc *gd,
+ int clean_error);
+static inline int ocfs2_check_group_descriptor(struct super_block *sb,
+ struct ocfs2_dinode *di,
+ struct ocfs2_group_desc *gd)
+{
+ return ocfs2_validate_group_descriptor(sb, di, gd, 0);
+}
+
int ocfs2_lock_allocators(struct inode *inode, struct ocfs2_extent_tree *et,
u32 clusters_to_add, u32 extents_to_split,
struct ocfs2_alloc_context **data_ac,
--
1.5.6

2008-12-19 22:06:33

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 35/45] ocfs2: Wrap group descriptor reads in a dedicated function.

From: Joel Becker <[email protected]>

We have a clean call for validating group descriptors, but every place
that wants the always does a read_block()+validate() call pair. Create
a toplevel ocfs2_read_group_descriptor() that does the right
thing. This allows us to leverage the single call point later for
fancier handling. We also add validation of gd->bg_generation against
the superblock and gd->bg_blkno against the block we thought we read.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/resize.c | 12 +----
fs/ocfs2/suballoc.c | 108 ++++++++++++++++++++++++++++++--------------------
fs/ocfs2/suballoc.h | 19 +++++----
3 files changed, 78 insertions(+), 61 deletions(-)

diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c
index a2de32a..252baff 100644
--- a/fs/ocfs2/resize.c
+++ b/fs/ocfs2/resize.c
@@ -330,20 +330,14 @@ int ocfs2_group_extend(struct inode * inode, int new_clusters)
lgd_blkno = ocfs2_which_cluster_group(main_bm_inode,
first_new_cluster - 1);

- ret = ocfs2_read_block(main_bm_inode, lgd_blkno, &group_bh);
+ ret = ocfs2_read_group_descriptor(main_bm_inode, fe, lgd_blkno,
+ &group_bh);
if (ret < 0) {
mlog_errno(ret);
goto out_unlock;
}
-
group = (struct ocfs2_group_desc *)group_bh->b_data;

- ret = ocfs2_check_group_descriptor(inode->i_sb, fe, group);
- if (ret) {
- mlog_errno(ret);
- goto out_unlock;
- }
-
cl_bpc = le16_to_cpu(fe->id2.i_chain.cl_bpc);
if (le16_to_cpu(group->bg_bits) / cl_bpc + new_clusters >
le16_to_cpu(fe->id2.i_chain.cl_cpg)) {
@@ -400,7 +394,7 @@ static int ocfs2_check_new_group(struct inode *inode,
(struct ocfs2_group_desc *)group_bh->b_data;
u16 cl_bpc = le16_to_cpu(di->id2.i_chain.cl_bpc);

- ret = ocfs2_validate_group_descriptor(inode->i_sb, di, gd, 1);
+ ret = ocfs2_validate_group_descriptor(inode->i_sb, di, group_bh, 1);
if (ret)
goto out;

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index ddba97d..797f509 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -145,13 +145,13 @@ static u32 ocfs2_bits_per_group(struct ocfs2_chain_list *cl)
return (u32)le16_to_cpu(cl->cl_cpg) * (u32)le16_to_cpu(cl->cl_bpc);
}

-/* somewhat more expensive than our other checks, so use sparingly. */
int ocfs2_validate_group_descriptor(struct super_block *sb,
struct ocfs2_dinode *di,
- struct ocfs2_group_desc *gd,
+ struct buffer_head *bh,
int clean_error)
{
unsigned int max_bits;
+ struct ocfs2_group_desc *gd = (struct ocfs2_group_desc *)bh->b_data;

#define do_error(fmt, ...) \
do{ \
@@ -162,16 +162,32 @@ int ocfs2_validate_group_descriptor(struct super_block *sb,
} while (0)

if (!OCFS2_IS_VALID_GROUP_DESC(gd)) {
- do_error("Group Descriptor #%llu has bad signature %.*s",
- (unsigned long long)le64_to_cpu(gd->bg_blkno), 7,
+ do_error("Group descriptor #%llu has bad signature %.*s",
+ (unsigned long long)bh->b_blocknr, 7,
gd->bg_signature);
return -EINVAL;
}

+ if (le64_to_cpu(gd->bg_blkno) != bh->b_blocknr) {
+ do_error("Group descriptor #%llu has an invalid bg_blkno "
+ "of %llu",
+ (unsigned long long)bh->b_blocknr,
+ (unsigned long long)le64_to_cpu(gd->bg_blkno));
+ return -EINVAL;
+ }
+
+ if (le32_to_cpu(gd->bg_generation) != OCFS2_SB(sb)->fs_generation) {
+ do_error("Group descriptor #%llu has an invalid "
+ "fs_generation of #%u",
+ (unsigned long long)bh->b_blocknr,
+ le32_to_cpu(gd->bg_generation));
+ return -EINVAL;
+ }
+
if (di->i_blkno != gd->bg_parent_dinode) {
- do_error("Group descriptor # %llu has bad parent "
+ do_error("Group descriptor #%llu has bad parent "
"pointer (%llu, expected %llu)",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ (unsigned long long)bh->b_blocknr,
(unsigned long long)le64_to_cpu(gd->bg_parent_dinode),
(unsigned long long)le64_to_cpu(di->i_blkno));
return -EINVAL;
@@ -179,33 +195,33 @@ int ocfs2_validate_group_descriptor(struct super_block *sb,

max_bits = le16_to_cpu(di->id2.i_chain.cl_cpg) * le16_to_cpu(di->id2.i_chain.cl_bpc);
if (le16_to_cpu(gd->bg_bits) > max_bits) {
- do_error("Group descriptor # %llu has bit count of %u",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ do_error("Group descriptor #%llu has bit count of %u",
+ (unsigned long long)bh->b_blocknr,
le16_to_cpu(gd->bg_bits));
return -EINVAL;
}

if (le16_to_cpu(gd->bg_chain) >=
le16_to_cpu(di->id2.i_chain.cl_next_free_rec)) {
- do_error("Group descriptor # %llu has bad chain %u",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ do_error("Group descriptor #%llu has bad chain %u",
+ (unsigned long long)bh->b_blocknr,
le16_to_cpu(gd->bg_chain));
return -EINVAL;
}

if (le16_to_cpu(gd->bg_free_bits_count) > le16_to_cpu(gd->bg_bits)) {
- do_error("Group descriptor # %llu has bit count %u but "
+ do_error("Group descriptor #%llu has bit count %u but "
"claims that %u are free",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ (unsigned long long)bh->b_blocknr,
le16_to_cpu(gd->bg_bits),
le16_to_cpu(gd->bg_free_bits_count));
return -EINVAL;
}

if (le16_to_cpu(gd->bg_bits) > (8 * le16_to_cpu(gd->bg_size))) {
- do_error("Group descriptor # %llu has bit count %u but "
+ do_error("Group descriptor #%llu has bit count %u but "
"max bitmap bits of %u",
- (unsigned long long)le64_to_cpu(gd->bg_blkno),
+ (unsigned long long)bh->b_blocknr,
le16_to_cpu(gd->bg_bits),
8 * le16_to_cpu(gd->bg_size));
return -EINVAL;
@@ -215,6 +231,30 @@ int ocfs2_validate_group_descriptor(struct super_block *sb,
return 0;
}

+int ocfs2_read_group_descriptor(struct inode *inode, struct ocfs2_dinode *di,
+ u64 gd_blkno, struct buffer_head **bh)
+{
+ int rc;
+ struct buffer_head *tmp = *bh;
+
+ rc = ocfs2_read_block(inode, gd_blkno, &tmp);
+ if (rc)
+ goto out;
+
+ rc = ocfs2_validate_group_descriptor(inode->i_sb, di, tmp, 0);
+ if (rc) {
+ brelse(tmp);
+ goto out;
+ }
+
+ /* If ocfs2_read_block() got us a new bh, pass it up. */
+ if (!*bh)
+ *bh = tmp;
+
+out:
+ return rc;
+}
+
static int ocfs2_block_group_fill(handle_t *handle,
struct inode *alloc_inode,
struct buffer_head *bg_bh,
@@ -1177,21 +1217,17 @@ static int ocfs2_search_one_group(struct ocfs2_alloc_context *ac,
u16 found;
struct buffer_head *group_bh = NULL;
struct ocfs2_group_desc *gd;
+ struct ocfs2_dinode *di = (struct ocfs2_dinode *)ac->ac_bh->b_data;
struct inode *alloc_inode = ac->ac_inode;

- ret = ocfs2_read_block(alloc_inode, gd_blkno, &group_bh);
+ ret = ocfs2_read_group_descriptor(alloc_inode, di, gd_blkno,
+ &group_bh);
if (ret < 0) {
mlog_errno(ret);
return ret;
}

gd = (struct ocfs2_group_desc *) group_bh->b_data;
- if (!OCFS2_IS_VALID_GROUP_DESC(gd)) {
- OCFS2_RO_ON_INVALID_GROUP_DESC(alloc_inode->i_sb, gd);
- ret = -EIO;
- goto out;
- }
-
ret = ac->ac_group_search(alloc_inode, group_bh, bits_wanted, min_bits,
ac->ac_max_block, bit_off, &found);
if (ret < 0) {
@@ -1248,19 +1284,14 @@ static int ocfs2_search_chain(struct ocfs2_alloc_context *ac,
bits_wanted, chain,
(unsigned long long)OCFS2_I(alloc_inode)->ip_blkno);

- status = ocfs2_read_block(alloc_inode,
- le64_to_cpu(cl->cl_recs[chain].c_blkno),
- &group_bh);
+ status = ocfs2_read_group_descriptor(alloc_inode, fe,
+ le64_to_cpu(cl->cl_recs[chain].c_blkno),
+ &group_bh);
if (status < 0) {
mlog_errno(status);
goto bail;
}
bg = (struct ocfs2_group_desc *) group_bh->b_data;
- status = ocfs2_check_group_descriptor(alloc_inode->i_sb, fe, bg);
- if (status) {
- mlog_errno(status);
- goto bail;
- }

status = -ENOSPC;
/* for now, the chain search is a bit simplistic. We just use
@@ -1278,18 +1309,13 @@ static int ocfs2_search_chain(struct ocfs2_alloc_context *ac,
next_group = le64_to_cpu(bg->bg_next_group);
prev_group_bh = group_bh;
group_bh = NULL;
- status = ocfs2_read_block(alloc_inode,
- next_group, &group_bh);
+ status = ocfs2_read_group_descriptor(alloc_inode, fe,
+ next_group, &group_bh);
if (status < 0) {
mlog_errno(status);
goto bail;
}
bg = (struct ocfs2_group_desc *) group_bh->b_data;
- status = ocfs2_check_group_descriptor(alloc_inode->i_sb, fe, bg);
- if (status) {
- mlog_errno(status);
- goto bail;
- }
}
if (status < 0) {
if (status != -ENOSPC)
@@ -1801,18 +1827,14 @@ int ocfs2_free_suballoc_bits(handle_t *handle,
(unsigned long long)OCFS2_I(alloc_inode)->ip_blkno, count,
(unsigned long long)bg_blkno, start_bit);

- status = ocfs2_read_block(alloc_inode, bg_blkno, &group_bh);
+ status = ocfs2_read_group_descriptor(alloc_inode, fe, bg_blkno,
+ &group_bh);
if (status < 0) {
mlog_errno(status);
goto bail;
}
-
group = (struct ocfs2_group_desc *) group_bh->b_data;
- status = ocfs2_check_group_descriptor(alloc_inode->i_sb, fe, group);
- if (status) {
- mlog_errno(status);
- goto bail;
- }
+
BUG_ON((count + start_bit) > le16_to_cpu(group->bg_bits));

status = ocfs2_block_group_clear_bits(handle, alloc_inode,
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index 7adfcc4..43de4fd 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -164,23 +164,24 @@ void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac);
* and return that block offset. */
u64 ocfs2_which_cluster_group(struct inode *inode, u32 cluster);

-/* somewhat more expensive than our other checks, so use sparingly. */
/*
* By default, ocfs2_validate_group_descriptor() calls ocfs2_error() when it
* finds a problem. A caller that wants to check a group descriptor
* without going readonly passes a nonzero clean_error. This is only
- * resize, really.
+ * resize, really. Everyone else should be using
+ * ocfs2_read_group_descriptor().
*/
int ocfs2_validate_group_descriptor(struct super_block *sb,
struct ocfs2_dinode *di,
- struct ocfs2_group_desc *gd,
+ struct buffer_head *bh,
int clean_error);
-static inline int ocfs2_check_group_descriptor(struct super_block *sb,
- struct ocfs2_dinode *di,
- struct ocfs2_group_desc *gd)
-{
- return ocfs2_validate_group_descriptor(sb, di, gd, 0);
-}
+/*
+ * Read a group descriptor block into *bh. If *bh is NULL, a bh will be
+ * allocated. This is a cached read. The descriptor will be validated with
+ * ocfs2_validate_group_descriptor().
+ */
+int ocfs2_read_group_descriptor(struct inode *inode, struct ocfs2_dinode *di,
+ u64 gd_blkno, struct buffer_head **bh);

int ocfs2_lock_allocators(struct inode *inode, struct ocfs2_extent_tree *et,
u32 clusters_to_add, u32 extents_to_split,
--
1.5.6

2008-12-19 22:07:12

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 37/45] ocfs2: Wrap extent block reads in a dedicated function.

From: Joel Becker <[email protected]>

We weren't consistently checking extent blocks after we read them.
Most places checked the signature, but none checked h_blkno or
h_fs_signature. Create a toplevel ocfs2_read_extent_block() that does
the read and the validation.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 151 ++++++++++++++++++++++++++++++++-----------------
fs/ocfs2/alloc.h | 8 +++
fs/ocfs2/extent_map.c | 23 ++------
fs/ocfs2/ocfs2.h | 8 ---
4 files changed, 111 insertions(+), 79 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 320545b..f430cc6 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -678,6 +678,66 @@ struct ocfs2_merge_ctxt {
int c_split_covers_rec;
};

+static int ocfs2_validate_extent_block(struct super_block *sb,
+ struct buffer_head *bh)
+{
+ struct ocfs2_extent_block *eb =
+ (struct ocfs2_extent_block *)bh->b_data;
+
+ if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
+ ocfs2_error(sb,
+ "Extent block #%llu has bad signature %.*s",
+ (unsigned long long)bh->b_blocknr, 7,
+ eb->h_signature);
+ return -EINVAL;
+ }
+
+ if (le64_to_cpu(eb->h_blkno) != bh->b_blocknr) {
+ ocfs2_error(sb,
+ "Extent block #%llu has an invalid h_blkno "
+ "of %llu",
+ (unsigned long long)bh->b_blocknr,
+ (unsigned long long)le64_to_cpu(eb->h_blkno));
+ return -EINVAL;
+ }
+
+ if (le32_to_cpu(eb->h_fs_generation) != OCFS2_SB(sb)->fs_generation) {
+ ocfs2_error(sb,
+ "Extent block #%llu has an invalid "
+ "h_fs_generation of #%u",
+ (unsigned long long)bh->b_blocknr,
+ le32_to_cpu(eb->h_fs_generation));
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int ocfs2_read_extent_block(struct inode *inode, u64 eb_blkno,
+ struct buffer_head **bh)
+{
+ int rc;
+ struct buffer_head *tmp = *bh;
+
+ rc = ocfs2_read_block(inode, eb_blkno, &tmp);
+ if (rc)
+ goto out;
+
+ rc = ocfs2_validate_extent_block(inode->i_sb, tmp);
+ if (rc) {
+ brelse(tmp);
+ goto out;
+ }
+
+ /* If ocfs2_read_block() got us a new bh, pass it up. */
+ if (!*bh)
+ *bh = tmp;
+
+out:
+ return rc;
+}
+
+
/*
* How many free extents have we got before we need more meta data?
*/
@@ -697,8 +757,7 @@ int ocfs2_num_free_extents(struct ocfs2_super *osb,
last_eb_blk = ocfs2_et_get_last_eb_blk(et);

if (last_eb_blk) {
- retval = ocfs2_read_block(inode, last_eb_blk,
- &eb_bh);
+ retval = ocfs2_read_extent_block(inode, last_eb_blk, &eb_bh);
if (retval < 0) {
mlog_errno(retval);
goto bail;
@@ -900,11 +959,8 @@ static int ocfs2_add_branch(struct ocfs2_super *osb,
for(i = 0; i < new_blocks; i++) {
bh = new_eb_bhs[i];
eb = (struct ocfs2_extent_block *) bh->b_data;
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
- status = -EIO;
- goto bail;
- }
+ /* ocfs2_create_new_meta_bhs() should create it right! */
+ BUG_ON(!OCFS2_IS_VALID_EXTENT_BLOCK(eb));
eb_el = &eb->h_list;

status = ocfs2_journal_access(handle, inode, bh,
@@ -1044,11 +1100,8 @@ static int ocfs2_shift_tree_depth(struct ocfs2_super *osb,
}

eb = (struct ocfs2_extent_block *) new_eb_bh->b_data;
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
- status = -EIO;
- goto bail;
- }
+ /* ocfs2_create_new_meta_bhs() should create it right! */
+ BUG_ON(!OCFS2_IS_VALID_EXTENT_BLOCK(eb));

eb_el = &eb->h_list;
root_el = et->et_root_el;
@@ -1168,18 +1221,13 @@ static int ocfs2_find_branch_target(struct ocfs2_super *osb,
brelse(bh);
bh = NULL;

- status = ocfs2_read_block(inode, blkno, &bh);
+ status = ocfs2_read_extent_block(inode, blkno, &bh);
if (status < 0) {
mlog_errno(status);
goto bail;
}

eb = (struct ocfs2_extent_block *) bh->b_data;
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
- status = -EIO;
- goto bail;
- }
el = &eb->h_list;

if (le16_to_cpu(el->l_next_free_rec) <
@@ -1532,7 +1580,7 @@ static int __ocfs2_find_path(struct inode *inode,

brelse(bh);
bh = NULL;
- ret = ocfs2_read_block(inode, blkno, &bh);
+ ret = ocfs2_read_extent_block(inode, blkno, &bh);
if (ret) {
mlog_errno(ret);
goto out;
@@ -1540,11 +1588,6 @@ static int __ocfs2_find_path(struct inode *inode,

eb = (struct ocfs2_extent_block *) bh->b_data;
el = &eb->h_list;
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
- ret = -EIO;
- goto out;
- }

if (le16_to_cpu(el->l_next_free_rec) >
le16_to_cpu(el->l_count)) {
@@ -4089,8 +4132,15 @@ ocfs2_figure_merge_contig_type(struct inode *inode, struct ocfs2_path *path,
le16_to_cpu(new_el->l_count)) {
bh = path_leaf_bh(left_path);
eb = (struct ocfs2_extent_block *)bh->b_data;
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb,
- eb);
+ ocfs2_error(inode->i_sb,
+ "Extent block #%llu has an "
+ "invalid l_next_free_rec of "
+ "%d. It should have "
+ "matched the l_count of %d",
+ (unsigned long long)le64_to_cpu(eb->h_blkno),
+ le16_to_cpu(new_el->l_next_free_rec),
+ le16_to_cpu(new_el->l_count));
+ status = -EINVAL;
goto out;
}
rec = &new_el->l_recs[
@@ -4139,8 +4189,12 @@ ocfs2_figure_merge_contig_type(struct inode *inode, struct ocfs2_path *path,
if (le16_to_cpu(new_el->l_next_free_rec) <= 1) {
bh = path_leaf_bh(right_path);
eb = (struct ocfs2_extent_block *)bh->b_data;
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb,
- eb);
+ ocfs2_error(inode->i_sb,
+ "Extent block #%llu has an "
+ "invalid l_next_free_rec of %d",
+ (unsigned long long)le64_to_cpu(eb->h_blkno),
+ le16_to_cpu(new_el->l_next_free_rec));
+ status = -EINVAL;
goto out;
}
rec = &new_el->l_recs[1];
@@ -4286,7 +4340,9 @@ static int ocfs2_figure_insert_type(struct inode *inode,
* ocfs2_figure_insert_type() and ocfs2_add_branch()
* may want it later.
*/
- ret = ocfs2_read_block(inode, ocfs2_et_get_last_eb_blk(et), &bh);
+ ret = ocfs2_read_extent_block(inode,
+ ocfs2_et_get_last_eb_blk(et),
+ &bh);
if (ret) {
mlog_exit(ret);
goto out;
@@ -4752,20 +4808,15 @@ static int __ocfs2_mark_extent_written(struct inode *inode,
if (path->p_tree_depth) {
struct ocfs2_extent_block *eb;

- ret = ocfs2_read_block(inode, ocfs2_et_get_last_eb_blk(et),
- &last_eb_bh);
+ ret = ocfs2_read_extent_block(inode,
+ ocfs2_et_get_last_eb_blk(et),
+ &last_eb_bh);
if (ret) {
mlog_exit(ret);
goto out;
}

eb = (struct ocfs2_extent_block *) last_eb_bh->b_data;
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
- ret = -EROFS;
- goto out;
- }
-
rightmost_el = &eb->h_list;
} else
rightmost_el = path_root_el(path);
@@ -4910,8 +4961,9 @@ static int ocfs2_split_tree(struct inode *inode, struct ocfs2_extent_tree *et,

depth = path->p_tree_depth;
if (depth > 0) {
- ret = ocfs2_read_block(inode, ocfs2_et_get_last_eb_blk(et),
- &last_eb_bh);
+ ret = ocfs2_read_extent_block(inode,
+ ocfs2_et_get_last_eb_blk(et),
+ &last_eb_bh);
if (ret < 0) {
mlog_errno(ret);
goto out;
@@ -6231,11 +6283,10 @@ static int ocfs2_find_new_last_ext_blk(struct inode *inode,

eb = (struct ocfs2_extent_block *) bh->b_data;
el = &eb->h_list;
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
- ret = -EROFS;
- goto out;
- }
+
+ /* ocfs2_find_leaf() gets the eb from ocfs2_read_extent_block().
+ * Any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_EXTENT_BLOCK(eb));

*new_last_eb = bh;
get_bh(*new_last_eb);
@@ -7140,20 +7191,14 @@ int ocfs2_prepare_truncate(struct ocfs2_super *osb,
ocfs2_init_dealloc_ctxt(&(*tc)->tc_dealloc);

if (fe->id2.i_list.l_tree_depth) {
- status = ocfs2_read_block(inode, le64_to_cpu(fe->i_last_eb_blk),
- &last_eb_bh);
+ status = ocfs2_read_extent_block(inode,
+ le64_to_cpu(fe->i_last_eb_blk),
+ &last_eb_bh);
if (status < 0) {
mlog_errno(status);
goto bail;
}
eb = (struct ocfs2_extent_block *) last_eb_bh->b_data;
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
-
- brelse(last_eb_bh);
- status = -EIO;
- goto bail;
- }
}

(*tc)->tc_last_eb_bh = last_eb_bh;
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 0fbf8fc..59d37d1 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -73,6 +73,14 @@ void ocfs2_init_xattr_value_extent_tree(struct ocfs2_extent_tree *et,
struct buffer_head *bh,
struct ocfs2_xattr_value_root *xv);

+/*
+ * Read an extent block into *bh. If *bh is NULL, a bh will be
+ * allocated. This is a cached read. The extent block will be validated
+ * with ocfs2_validate_extent_block().
+ */
+int ocfs2_read_extent_block(struct inode *inode, u64 eb_blkno,
+ struct buffer_head **bh);
+
struct ocfs2_alloc_context;
int ocfs2_insert_extent(struct ocfs2_super *osb,
handle_t *handle,
diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index b686b31..0bd9d96 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -293,7 +293,7 @@ static int ocfs2_last_eb_is_empty(struct inode *inode,
struct ocfs2_extent_block *eb;
struct ocfs2_extent_list *el;

- ret = ocfs2_read_block(inode, last_eb_blk, &eb_bh);
+ ret = ocfs2_read_extent_block(inode, last_eb_blk, &eb_bh);
if (ret) {
mlog_errno(ret);
goto out;
@@ -302,12 +302,6 @@ static int ocfs2_last_eb_is_empty(struct inode *inode,
eb = (struct ocfs2_extent_block *) eb_bh->b_data;
el = &eb->h_list;

- if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
- ret = -EROFS;
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, eb);
- goto out;
- }
-
if (el->l_tree_depth) {
ocfs2_error(inode->i_sb,
"Inode %lu has non zero tree depth in "
@@ -381,23 +375,16 @@ static int ocfs2_figure_hole_clusters(struct inode *inode,
if (le64_to_cpu(eb->h_next_leaf_blk) == 0ULL)
goto no_more_extents;

- ret = ocfs2_read_block(inode,
- le64_to_cpu(eb->h_next_leaf_blk),
- &next_eb_bh);
+ ret = ocfs2_read_extent_block(inode,
+ le64_to_cpu(eb->h_next_leaf_blk),
+ &next_eb_bh);
if (ret) {
mlog_errno(ret);
goto out;
}
- next_eb = (struct ocfs2_extent_block *)next_eb_bh->b_data;
-
- if (!OCFS2_IS_VALID_EXTENT_BLOCK(next_eb)) {
- ret = -EROFS;
- OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb, next_eb);
- goto out;
- }

+ next_eb = (struct ocfs2_extent_block *)next_eb_bh->b_data;
el = &next_eb->h_list;
-
i = ocfs2_search_for_hole_index(el, v_cluster);
}

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 82ba887..f04b229 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -447,14 +447,6 @@ static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb)
#define OCFS2_IS_VALID_EXTENT_BLOCK(ptr) \
(!strcmp((ptr)->h_signature, OCFS2_EXTENT_BLOCK_SIGNATURE))

-#define OCFS2_RO_ON_INVALID_EXTENT_BLOCK(__sb, __eb) do { \
- typeof(__eb) ____eb = (__eb); \
- ocfs2_error((__sb), \
- "Extent Block # %llu has bad signature %.*s", \
- (unsigned long long)le64_to_cpu((____eb)->h_blkno), 7, \
- (____eb)->h_signature); \
-} while (0)
-
#define OCFS2_IS_VALID_GROUP_DESC(ptr) \
(!strcmp((ptr)->bg_signature, OCFS2_GROUP_DESC_SIGNATURE))

--
1.5.6

2008-12-19 22:06:48

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 36/45] ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks.

From: Joel Becker <[email protected]>

Random places in the code would check a group descriptor bh to see if it
was valid. The previous commit unified descriptor block reads,
validating all block reads in the same place. Thus, these checks are no
longer necessary. Rather than eliminate them, however, we change them
to BUG_ON() checks. This ensures the assumptions remain true. All of
the code paths to these checks have been audited to ensure they come
from a validated descriptor read.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/ocfs2.h | 7 -------
fs/ocfs2/suballoc.c | 39 ++++++++++++++-------------------------
2 files changed, 14 insertions(+), 32 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 467bdb6..82ba887 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -458,13 +458,6 @@ static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb)
#define OCFS2_IS_VALID_GROUP_DESC(ptr) \
(!strcmp((ptr)->bg_signature, OCFS2_GROUP_DESC_SIGNATURE))

-#define OCFS2_RO_ON_INVALID_GROUP_DESC(__sb, __gd) do { \
- typeof(__gd) ____gd = (__gd); \
- ocfs2_error((__sb), \
- "Group Descriptor # %llu has bad signature %.*s", \
- (unsigned long long)le64_to_cpu((____gd)->bg_blkno), 7, \
- (____gd)->bg_signature); \
-} while (0)

#define OCFS2_IS_VALID_XATTR_BLOCK(ptr) \
(!strcmp((ptr)->xb_signature, OCFS2_XATTR_BLOCK_SIGNATURE))
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 797f509..766a00b 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -842,10 +842,9 @@ static int ocfs2_block_group_find_clear_bits(struct ocfs2_super *osb,
int offset, start, found, status = 0;
struct ocfs2_group_desc *bg = (struct ocfs2_group_desc *) bg_bh->b_data;

- if (!OCFS2_IS_VALID_GROUP_DESC(bg)) {
- OCFS2_RO_ON_INVALID_GROUP_DESC(osb->sb, bg);
- return -EIO;
- }
+ /* Callers got this descriptor from
+ * ocfs2_read_group_descriptor(). Any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_GROUP_DESC(bg));

found = start = best_offset = best_size = 0;
bitmap = bg->bg_bitmap;
@@ -910,11 +909,9 @@ static inline int ocfs2_block_group_set_bits(handle_t *handle,

mlog_entry_void();

- if (!OCFS2_IS_VALID_GROUP_DESC(bg)) {
- OCFS2_RO_ON_INVALID_GROUP_DESC(alloc_inode->i_sb, bg);
- status = -EIO;
- goto bail;
- }
+ /* All callers get the descriptor via
+ * ocfs2_read_group_descriptor(). Any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_GROUP_DESC(bg));
BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);

mlog(0, "block_group_set_bits: off = %u, num = %u\n", bit_off,
@@ -983,16 +980,10 @@ static int ocfs2_relink_block_group(handle_t *handle,
struct ocfs2_group_desc *bg = (struct ocfs2_group_desc *) bg_bh->b_data;
struct ocfs2_group_desc *prev_bg = (struct ocfs2_group_desc *) prev_bg_bh->b_data;

- if (!OCFS2_IS_VALID_GROUP_DESC(bg)) {
- OCFS2_RO_ON_INVALID_GROUP_DESC(alloc_inode->i_sb, bg);
- status = -EIO;
- goto out;
- }
- if (!OCFS2_IS_VALID_GROUP_DESC(prev_bg)) {
- OCFS2_RO_ON_INVALID_GROUP_DESC(alloc_inode->i_sb, prev_bg);
- status = -EIO;
- goto out;
- }
+ /* The caller got these descriptors from
+ * ocfs2_read_group_descriptor(). Any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_GROUP_DESC(bg));
+ BUG_ON(!OCFS2_IS_VALID_GROUP_DESC(prev_bg));

mlog(0, "Suballoc %llu, chain %u, move group %llu to top, prev = %llu\n",
(unsigned long long)le64_to_cpu(fe->i_blkno), chain,
@@ -1055,7 +1046,7 @@ out_rollback:
bg->bg_next_group = cpu_to_le64(bg_ptr);
prev_bg->bg_next_group = cpu_to_le64(prev_bg_ptr);
}
-out:
+
mlog_exit(status);
return status;
}
@@ -1758,11 +1749,9 @@ static inline int ocfs2_block_group_clear_bits(handle_t *handle,

mlog_entry_void();

- if (!OCFS2_IS_VALID_GROUP_DESC(bg)) {
- OCFS2_RO_ON_INVALID_GROUP_DESC(alloc_inode->i_sb, bg);
- status = -EIO;
- goto bail;
- }
+ /* The caller got this descriptor from
+ * ocfs2_read_group_descriptor(). Any corruption is a code bug. */
+ BUG_ON(!OCFS2_IS_VALID_GROUP_DESC(bg));

mlog(0, "off = %u, num = %u\n", bit_off, num_bits);

--
1.5.6

2008-12-19 22:07:43

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 38/45] ocfs2: Wrap dirblock reads in a dedicated function.

From: Joel Becker <[email protected]>

We have ocfs2_bread() as a vestige of the original ext-based dir code.
It's only used by directories, though. Turn it into
ocfs2_read_dir_block(), with a prototype matching the other metadata
read functions. It's set up to validate dirblocks when the time comes.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/dir.c | 150 +++++++++++++++++++++++++++++++++-----------------------
1 files changed, 88 insertions(+), 62 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index 5777045..c2f3fd9 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -82,49 +82,6 @@ static int ocfs2_do_extend_dir(struct super_block *sb,
struct ocfs2_alloc_context *meta_ac,
struct buffer_head **new_bh);

-static struct buffer_head *ocfs2_bread(struct inode *inode,
- int block, int *err, int reada)
-{
- struct buffer_head *bh = NULL;
- int tmperr;
- u64 p_blkno;
- int readflags = 0;
-
- if (reada)
- readflags |= OCFS2_BH_READAHEAD;
-
- if (((u64)block << inode->i_sb->s_blocksize_bits) >=
- i_size_read(inode)) {
- BUG_ON(!reada);
- return NULL;
- }
-
- down_read(&OCFS2_I(inode)->ip_alloc_sem);
- tmperr = ocfs2_extent_map_get_blocks(inode, block, &p_blkno, NULL,
- NULL);
- up_read(&OCFS2_I(inode)->ip_alloc_sem);
- if (tmperr < 0) {
- mlog_errno(tmperr);
- goto fail;
- }
-
- tmperr = ocfs2_read_blocks(inode, p_blkno, 1, &bh, readflags);
- if (tmperr < 0)
- goto fail;
-
- tmperr = 0;
-
- *err = 0;
- return bh;
-
-fail:
- brelse(bh);
- bh = NULL;
-
- *err = -EIO;
- return NULL;
-}
-
/*
* bh passed here can be an inode block or a dir data block, depending
* on the inode inline data flag.
@@ -250,6 +207,76 @@ out:
return NULL;
}

+static int ocfs2_validate_dir_block(struct super_block *sb,
+ struct buffer_head *bh)
+{
+ /*
+ * Nothing yet. We don't validate dirents here, that's handled
+ * in-place when the code walks them.
+ */
+
+ return 0;
+}
+
+/*
+ * This function forces all errors to -EIO for consistency with its
+ * predecessor, ocfs2_bread(). We haven't audited what returning the
+ * real error codes would do to callers. We log the real codes with
+ * mlog_errno() before we squash them.
+ */
+static int ocfs2_read_dir_block(struct inode *inode, u64 v_block,
+ struct buffer_head **bh, int flags)
+{
+ int rc = 0;
+ struct buffer_head *tmp = *bh;
+ u64 p_blkno;
+
+ if (((u64)v_block << inode->i_sb->s_blocksize_bits) >=
+ i_size_read(inode)) {
+ BUG_ON(!(flags & OCFS2_BH_READAHEAD));
+ goto out;
+ }
+
+ down_read(&OCFS2_I(inode)->ip_alloc_sem);
+ rc = ocfs2_extent_map_get_blocks(inode, v_block, &p_blkno, NULL,
+ NULL);
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (rc) {
+ mlog_errno(rc);
+ goto out;
+ }
+
+ if (!p_blkno) {
+ rc = -EIO;
+ mlog(ML_ERROR,
+ "Directory #%llu contains a hole at offset %llu\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno,
+ (unsigned long long)v_block << inode->i_sb->s_blocksize_bits);
+ goto out;
+ }
+
+ rc = ocfs2_read_blocks(inode, p_blkno, 1, &tmp, flags);
+ if (rc) {
+ mlog_errno(rc);
+ goto out;
+ }
+
+ if (!(flags & OCFS2_BH_READAHEAD)) {
+ rc = ocfs2_validate_dir_block(inode->i_sb, tmp);
+ if (rc) {
+ brelse(tmp);
+ goto out;
+ }
+ }
+
+ /* If ocfs2_read_blocks() got us a new bh, pass it up. */
+ if (!*bh)
+ *bh = tmp;
+
+out:
+ return rc ? -EIO : 0;
+}
+
static struct buffer_head *ocfs2_find_entry_el(const char *name, int namelen,
struct inode *dir,
struct ocfs2_dir_entry **res_dir)
@@ -296,15 +323,17 @@ restart:
}
num++;

- bh = ocfs2_bread(dir, b++, &err, 1);
+ bh = NULL;
+ err = ocfs2_read_dir_block(dir, b++, &bh,
+ OCFS2_BH_READAHEAD);
bh_use[ra_max] = bh;
}
}
if ((bh = bh_use[ra_ptr++]) == NULL)
goto next;
- if (ocfs2_read_block(dir, block, &bh)) {
+ if (ocfs2_read_dir_block(dir, block, &bh, 0)) {
/* read error, skip block & hope for the best.
- * ocfs2_read_block() has released the bh. */
+ * ocfs2_read_dir_block() has released the bh. */
ocfs2_error(dir->i_sb, "reading directory %llu, "
"offset %lu\n",
(unsigned long long)OCFS2_I(dir)->ip_blkno,
@@ -724,7 +753,6 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
int i, stored;
struct buffer_head * bh, * tmp;
struct ocfs2_dir_entry * de;
- int err;
struct super_block * sb = inode->i_sb;
unsigned int ra_sectors = 16;

@@ -735,12 +763,8 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,

while (!error && !stored && *f_pos < i_size_read(inode)) {
blk = (*f_pos) >> sb->s_blocksize_bits;
- bh = ocfs2_bread(inode, blk, &err, 0);
- if (!bh) {
- mlog(ML_ERROR,
- "directory #%llu contains a hole at offset %lld\n",
- (unsigned long long)OCFS2_I(inode)->ip_blkno,
- *f_pos);
+ if (ocfs2_read_dir_block(inode, blk, &bh, 0)) {
+ /* Skip the corrupt dirblock and keep trying */
*f_pos += sb->s_blocksize - offset;
continue;
}
@@ -754,8 +778,10 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
|| (((last_ra_blk - blk) << 9) <= (ra_sectors / 2))) {
for (i = ra_sectors >> (sb->s_blocksize_bits - 9);
i > 0; i--) {
- tmp = ocfs2_bread(inode, ++blk, &err, 1);
- brelse(tmp);
+ tmp = NULL;
+ if (!ocfs2_read_dir_block(inode, ++blk, &tmp,
+ OCFS2_BH_READAHEAD))
+ brelse(tmp);
}
last_ra_blk = blk;
ra_sectors = 8;
@@ -828,6 +854,7 @@ revalidate:
}
offset = 0;
brelse(bh);
+ bh = NULL;
}

stored = 0;
@@ -1680,8 +1707,8 @@ static int ocfs2_find_dir_space_el(struct inode *dir, const char *name,
struct super_block *sb = dir->i_sb;
int status;

- bh = ocfs2_bread(dir, 0, &status, 0);
- if (!bh) {
+ status = ocfs2_read_dir_block(dir, 0, &bh, 0);
+ if (status) {
mlog_errno(status);
goto bail;
}
@@ -1702,11 +1729,10 @@ static int ocfs2_find_dir_space_el(struct inode *dir, const char *name,
status = -ENOSPC;
goto bail;
}
- bh = ocfs2_bread(dir,
- offset >> sb->s_blocksize_bits,
- &status,
- 0);
- if (!bh) {
+ status = ocfs2_read_dir_block(dir,
+ offset >> sb->s_blocksize_bits,
+ &bh, 0);
+ if (status) {
mlog_errno(status);
goto bail;
}
--
1.5.6

2008-12-19 22:08:04

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 39/45] ocfs2: Wrap xattr block reads in a dedicated function

From: Joel Becker <[email protected]>

We weren't consistently checking xattr blocks after we read them.
Most places checked the signature, but none checked xb_blkno or
xb_fs_signature. Create a toplevel ocfs2_read_xattr_block() that does
the read and the validation.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 94 ++++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 3cc8385..ef4aa54 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -314,6 +314,65 @@ static void ocfs2_xattr_bucket_copy_data(struct ocfs2_xattr_bucket *dest,
}
}

+static int ocfs2_validate_xattr_block(struct super_block *sb,
+ struct buffer_head *bh)
+{
+ struct ocfs2_xattr_block *xb =
+ (struct ocfs2_xattr_block *)bh->b_data;
+
+ mlog(0, "Validating xattr block %llu\n",
+ (unsigned long long)bh->b_blocknr);
+
+ if (!OCFS2_IS_VALID_XATTR_BLOCK(xb)) {
+ ocfs2_error(sb,
+ "Extended attribute block #%llu has bad "
+ "signature %.*s",
+ (unsigned long long)bh->b_blocknr, 7,
+ xb->xb_signature);
+ return -EINVAL;
+ }
+
+ if (le64_to_cpu(xb->xb_blkno) != bh->b_blocknr) {
+ ocfs2_error(sb,
+ "Extended attribute block #%llu has an "
+ "invalid xb_blkno of %llu",
+ (unsigned long long)bh->b_blocknr,
+ (unsigned long long)le64_to_cpu(xb->xb_blkno));
+ return -EINVAL;
+ }
+
+ if (le32_to_cpu(xb->xb_fs_generation) != OCFS2_SB(sb)->fs_generation) {
+ ocfs2_error(sb,
+ "Extended attribute block #%llu has an invalid "
+ "xb_fs_generation of #%u",
+ (unsigned long long)bh->b_blocknr,
+ le32_to_cpu(xb->xb_fs_generation));
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ocfs2_read_xattr_block(struct inode *inode, u64 xb_blkno,
+ struct buffer_head **bh)
+{
+ int rc;
+ struct buffer_head *tmp = *bh;
+
+ rc = ocfs2_read_block(inode, xb_blkno, &tmp);
+ if (!rc) {
+ rc = ocfs2_validate_xattr_block(inode->i_sb, tmp);
+ if (rc)
+ brelse(tmp);
+ }
+
+ /* If ocfs2_read_block() got us a new bh, pass it up. */
+ if (!rc && !*bh)
+ *bh = tmp;
+
+ return rc;
+}
+
static inline const char *ocfs2_xattr_prefix(int name_index)
{
struct xattr_handler *handler = NULL;
@@ -739,18 +798,14 @@ static int ocfs2_xattr_block_list(struct inode *inode,
if (!di->i_xattr_loc)
return ret;

- ret = ocfs2_read_block(inode, le64_to_cpu(di->i_xattr_loc), &blk_bh);
+ ret = ocfs2_read_xattr_block(inode, le64_to_cpu(di->i_xattr_loc),
+ &blk_bh);
if (ret < 0) {
mlog_errno(ret);
return ret;
}

xb = (struct ocfs2_xattr_block *)blk_bh->b_data;
- if (!OCFS2_IS_VALID_XATTR_BLOCK(xb)) {
- ret = -EIO;
- goto cleanup;
- }
-
if (!(le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED)) {
struct ocfs2_xattr_header *header = &xb->xb_attrs.xb_header;
ret = ocfs2_xattr_list_entries(inode, header,
@@ -760,7 +815,7 @@ static int ocfs2_xattr_block_list(struct inode *inode,
ret = ocfs2_xattr_tree_list_index_block(inode, xt,
buffer, buffer_size);
}
-cleanup:
+
brelse(blk_bh);

return ret;
@@ -1693,24 +1748,19 @@ static int ocfs2_xattr_free_block(struct inode *inode,
u64 blk, bg_blkno;
u16 bit;

- ret = ocfs2_read_block(inode, block, &blk_bh);
+ ret = ocfs2_read_xattr_block(inode, block, &blk_bh);
if (ret < 0) {
mlog_errno(ret);
goto out;
}

- xb = (struct ocfs2_xattr_block *)blk_bh->b_data;
- if (!OCFS2_IS_VALID_XATTR_BLOCK(xb)) {
- ret = -EIO;
- goto out;
- }
-
ret = ocfs2_xattr_block_remove(inode, blk_bh);
if (ret < 0) {
mlog_errno(ret);
goto out;
}

+ xb = (struct ocfs2_xattr_block *)blk_bh->b_data;
blk = le64_to_cpu(xb->xb_blkno);
bit = le16_to_cpu(xb->xb_suballoc_bit);
bg_blkno = ocfs2_which_suballoc_group(blk, bit);
@@ -1950,19 +2000,15 @@ static int ocfs2_xattr_block_find(struct inode *inode,
if (!di->i_xattr_loc)
return ret;

- ret = ocfs2_read_block(inode, le64_to_cpu(di->i_xattr_loc), &blk_bh);
+ ret = ocfs2_read_xattr_block(inode, le64_to_cpu(di->i_xattr_loc),
+ &blk_bh);
if (ret < 0) {
mlog_errno(ret);
return ret;
}

- xb = (struct ocfs2_xattr_block *)blk_bh->b_data;
- if (!OCFS2_IS_VALID_XATTR_BLOCK(xb)) {
- ret = -EIO;
- goto cleanup;
- }
-
xs->xattr_bh = blk_bh;
+ xb = (struct ocfs2_xattr_block *)blk_bh->b_data;

if (!(le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED)) {
xs->header = &xb->xb_attrs.xb_header;
@@ -2259,9 +2305,9 @@ meta_guess:
/* calculate metadata allocation. */
if (di->i_xattr_loc) {
if (!xbs->xattr_bh) {
- ret = ocfs2_read_block(inode,
- le64_to_cpu(di->i_xattr_loc),
- &bh);
+ ret = ocfs2_read_xattr_block(inode,
+ le64_to_cpu(di->i_xattr_loc),
+ &bh);
if (ret) {
mlog_errno(ret);
goto out;
--
1.5.6

2008-12-19 22:08:29

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 40/45] ocfs2: Validate metadata only when it's read from disk.

From: Joel Becker <[email protected]>

Add an optional validation hook to ocfs2_read_blocks(). Now the
validation function is only called when a block was actually read off of
disk. It is not called when the buffer was in cache.

We add a buffer state bit BH_NeedsValidate to flag these buffers. It
must always be one higher than the last JBD2 buffer state bit.

The dinode, dirblock, extent_block, and xattr_block validators are
lifted to this scheme directly. The group_descriptor validator needs to
be split into two pieces. The first part only needs the gd buffer and
is passed to ocfs2_read_block(). The second part requires the dinode as
well, and is called every time. It's only 3 compares, so it's tiny.
This also allows us to clean up the non-fatal gd check used by resize.c.
It now has no magic argument.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/alloc.c | 17 +++-----
fs/ocfs2/buffer_head_io.c | 33 ++++++++++++++++-
fs/ocfs2/buffer_head_io.h | 27 ++++++++-----
fs/ocfs2/dir.c | 13 ++----
fs/ocfs2/inode.c | 18 ++------
fs/ocfs2/resize.c | 2 +-
fs/ocfs2/slot_map.c | 4 +-
fs/ocfs2/suballoc.c | 91 +++++++++++++++++++++++++++++++-------------
fs/ocfs2/suballoc.h | 15 +++----
fs/ocfs2/xattr.c | 26 ++++++-------
10 files changed, 149 insertions(+), 97 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index f430cc6..e823a27 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -684,6 +684,9 @@ static int ocfs2_validate_extent_block(struct super_block *sb,
struct ocfs2_extent_block *eb =
(struct ocfs2_extent_block *)bh->b_data;

+ mlog(0, "Validating extent block %llu\n",
+ (unsigned long long)bh->b_blocknr);
+
if (!OCFS2_IS_VALID_EXTENT_BLOCK(eb)) {
ocfs2_error(sb,
"Extent block #%llu has bad signature %.*s",
@@ -719,21 +722,13 @@ int ocfs2_read_extent_block(struct inode *inode, u64 eb_blkno,
int rc;
struct buffer_head *tmp = *bh;

- rc = ocfs2_read_block(inode, eb_blkno, &tmp);
- if (rc)
- goto out;
-
- rc = ocfs2_validate_extent_block(inode->i_sb, tmp);
- if (rc) {
- brelse(tmp);
- goto out;
- }
+ rc = ocfs2_read_block(inode, eb_blkno, &tmp,
+ ocfs2_validate_extent_block);

/* If ocfs2_read_block() got us a new bh, pass it up. */
- if (!*bh)
+ if (!rc && !*bh)
*bh = tmp;

-out:
return rc;
}

diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
index 3a178ec..0e9eed0 100644
--- a/fs/ocfs2/buffer_head_io.c
+++ b/fs/ocfs2/buffer_head_io.c
@@ -39,6 +39,19 @@

#include "buffer_head_io.h"

+/*
+ * Bits on bh->b_state used by ocfs2.
+ *
+ * These MUST be after the JBD2 bits. Currently BH_Unshadow is the last
+ * JBD2 bit.
+ */
+enum ocfs2_state_bits {
+ BH_NeedsValidate = BH_Unshadow + 1,
+};
+
+/* Expand the magic b_state functions */
+BUFFER_FNS(NeedsValidate, needs_validate);
+
int ocfs2_write_block(struct ocfs2_super *osb, struct buffer_head *bh,
struct inode *inode)
{
@@ -166,7 +179,9 @@ bail:
}

int ocfs2_read_blocks(struct inode *inode, u64 block, int nr,
- struct buffer_head *bhs[], int flags)
+ struct buffer_head *bhs[], int flags,
+ int (*validate)(struct super_block *sb,
+ struct buffer_head *bh))
{
int status = 0;
int i, ignore_cache = 0;
@@ -298,6 +313,8 @@ int ocfs2_read_blocks(struct inode *inode, u64 block, int nr,

clear_buffer_uptodate(bh);
get_bh(bh); /* for end_buffer_read_sync() */
+ if (validate)
+ set_buffer_needs_validate(bh);
bh->b_end_io = end_buffer_read_sync;
submit_bh(READ, bh);
continue;
@@ -328,6 +345,20 @@ int ocfs2_read_blocks(struct inode *inode, u64 block, int nr,
bhs[i] = NULL;
continue;
}
+
+ if (buffer_needs_validate(bh)) {
+ /* We never set NeedsValidate if the
+ * buffer was held by the journal, so
+ * that better not have changed */
+ BUG_ON(buffer_jbd(bh));
+ clear_buffer_needs_validate(bh);
+ status = validate(inode->i_sb, bh);
+ if (status) {
+ put_bh(bh);
+ bhs[i] = NULL;
+ continue;
+ }
+ }
}

/* Always set the buffer in the cache, even if it was
diff --git a/fs/ocfs2/buffer_head_io.h b/fs/ocfs2/buffer_head_io.h
index 75e1dcb..c75d682 100644
--- a/fs/ocfs2/buffer_head_io.h
+++ b/fs/ocfs2/buffer_head_io.h
@@ -31,21 +31,24 @@
void ocfs2_end_buffer_io_sync(struct buffer_head *bh,
int uptodate);

-static inline int ocfs2_read_block(struct inode *inode,
- u64 off,
- struct buffer_head **bh);
-
int ocfs2_write_block(struct ocfs2_super *osb,
struct buffer_head *bh,
struct inode *inode);
-int ocfs2_read_blocks(struct inode *inode,
- u64 block,
- int nr,
- struct buffer_head *bhs[],
- int flags);
int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
unsigned int nr, struct buffer_head *bhs[]);

+/*
+ * If not NULL, validate() will be called on a buffer that is freshly
+ * read from disk. It will not be called if the buffer was in cache.
+ * Note that if validate() is being used for this buffer, it needs to
+ * be set even for a READAHEAD call, as it marks the buffer for later
+ * validation.
+ */
+int ocfs2_read_blocks(struct inode *inode, u64 block, int nr,
+ struct buffer_head *bhs[], int flags,
+ int (*validate)(struct super_block *sb,
+ struct buffer_head *bh));
+
int ocfs2_write_super_or_backup(struct ocfs2_super *osb,
struct buffer_head *bh);

@@ -53,7 +56,9 @@ int ocfs2_write_super_or_backup(struct ocfs2_super *osb,
#define OCFS2_BH_READAHEAD 8

static inline int ocfs2_read_block(struct inode *inode, u64 off,
- struct buffer_head **bh)
+ struct buffer_head **bh,
+ int (*validate)(struct super_block *sb,
+ struct buffer_head *bh))
{
int status = 0;

@@ -63,7 +68,7 @@ static inline int ocfs2_read_block(struct inode *inode, u64 off,
goto bail;
}

- status = ocfs2_read_blocks(inode, off, 1, bh, 0);
+ status = ocfs2_read_blocks(inode, off, 1, bh, 0, validate);

bail:
return status;
diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index c2f3fd9..7e863d4 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -214,6 +214,8 @@ static int ocfs2_validate_dir_block(struct super_block *sb,
* Nothing yet. We don't validate dirents here, that's handled
* in-place when the code walks them.
*/
+ mlog(0, "Validating dirblock %llu\n",
+ (unsigned long long)bh->b_blocknr);

return 0;
}
@@ -255,20 +257,13 @@ static int ocfs2_read_dir_block(struct inode *inode, u64 v_block,
goto out;
}

- rc = ocfs2_read_blocks(inode, p_blkno, 1, &tmp, flags);
+ rc = ocfs2_read_blocks(inode, p_blkno, 1, &tmp, flags,
+ ocfs2_validate_dir_block);
if (rc) {
mlog_errno(rc);
goto out;
}

- if (!(flags & OCFS2_BH_READAHEAD)) {
- rc = ocfs2_validate_dir_block(inode->i_sb, tmp);
- if (rc) {
- brelse(tmp);
- goto out;
- }
- }
-
/* If ocfs2_read_blocks() got us a new bh, pass it up. */
if (!*bh)
*bh = tmp;
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 9eb701b..ec3497b 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1255,6 +1255,9 @@ int ocfs2_validate_inode_block(struct super_block *sb,
int rc = -EINVAL;
struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data;

+ mlog(0, "Validating dinode %llu\n",
+ (unsigned long long)bh->b_blocknr);
+
BUG_ON(!buffer_uptodate(bh));

if (!OCFS2_IS_VALID_DINODE(di)) {
@@ -1300,23 +1303,12 @@ int ocfs2_read_inode_block_full(struct inode *inode, struct buffer_head **bh,
struct buffer_head *tmp = *bh;

rc = ocfs2_read_blocks(inode, OCFS2_I(inode)->ip_blkno, 1, &tmp,
- flags);
- if (rc)
- goto out;
-
- if (!(flags & OCFS2_BH_READAHEAD)) {
- rc = ocfs2_validate_inode_block(inode->i_sb, tmp);
- if (rc) {
- brelse(tmp);
- goto out;
- }
- }
+ flags, ocfs2_validate_inode_block);

/* If ocfs2_read_blocks() got us a new bh, pass it up. */
- if (!*bh)
+ if (!rc && !*bh)
*bh = tmp;

-out:
return rc;
}

diff --git a/fs/ocfs2/resize.c b/fs/ocfs2/resize.c
index 252baff..867de3e 100644
--- a/fs/ocfs2/resize.c
+++ b/fs/ocfs2/resize.c
@@ -394,7 +394,7 @@ static int ocfs2_check_new_group(struct inode *inode,
(struct ocfs2_group_desc *)group_bh->b_data;
u16 cl_bpc = le16_to_cpu(di->id2.i_chain.cl_bpc);

- ret = ocfs2_validate_group_descriptor(inode->i_sb, di, group_bh, 1);
+ ret = ocfs2_check_group_descriptor(inode->i_sb, di, group_bh);
if (ret)
goto out;

diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index bdda2d8..40661e7 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -151,7 +151,7 @@ int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
* this is not true, the read of -1 (UINT64_MAX) will fail.
*/
ret = ocfs2_read_blocks(si->si_inode, -1, si->si_blocks, si->si_bh,
- OCFS2_BH_IGNORE_CACHE);
+ OCFS2_BH_IGNORE_CACHE, NULL);
if (ret == 0) {
spin_lock(&osb->osb_lock);
ocfs2_update_slot_info(si);
@@ -405,7 +405,7 @@ static int ocfs2_map_slot_buffers(struct ocfs2_super *osb,

bh = NULL; /* Acquire a fresh bh */
status = ocfs2_read_blocks(si->si_inode, blkno, 1, &bh,
- OCFS2_BH_IGNORE_CACHE);
+ OCFS2_BH_IGNORE_CACHE, NULL);
if (status < 0) {
mlog_errno(status);
goto bail;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 766a00b..226fe21 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -145,14 +145,6 @@ static u32 ocfs2_bits_per_group(struct ocfs2_chain_list *cl)
return (u32)le16_to_cpu(cl->cl_cpg) * (u32)le16_to_cpu(cl->cl_bpc);
}

-int ocfs2_validate_group_descriptor(struct super_block *sb,
- struct ocfs2_dinode *di,
- struct buffer_head *bh,
- int clean_error)
-{
- unsigned int max_bits;
- struct ocfs2_group_desc *gd = (struct ocfs2_group_desc *)bh->b_data;
-
#define do_error(fmt, ...) \
do{ \
if (clean_error) \
@@ -161,6 +153,12 @@ int ocfs2_validate_group_descriptor(struct super_block *sb,
ocfs2_error(sb, fmt, ##__VA_ARGS__); \
} while (0)

+static int ocfs2_validate_gd_self(struct super_block *sb,
+ struct buffer_head *bh,
+ int clean_error)
+{
+ struct ocfs2_group_desc *gd = (struct ocfs2_group_desc *)bh->b_data;
+
if (!OCFS2_IS_VALID_GROUP_DESC(gd)) {
do_error("Group descriptor #%llu has bad signature %.*s",
(unsigned long long)bh->b_blocknr, 7,
@@ -184,6 +182,35 @@ int ocfs2_validate_group_descriptor(struct super_block *sb,
return -EINVAL;
}

+ if (le16_to_cpu(gd->bg_free_bits_count) > le16_to_cpu(gd->bg_bits)) {
+ do_error("Group descriptor #%llu has bit count %u but "
+ "claims that %u are free",
+ (unsigned long long)bh->b_blocknr,
+ le16_to_cpu(gd->bg_bits),
+ le16_to_cpu(gd->bg_free_bits_count));
+ return -EINVAL;
+ }
+
+ if (le16_to_cpu(gd->bg_bits) > (8 * le16_to_cpu(gd->bg_size))) {
+ do_error("Group descriptor #%llu has bit count %u but "
+ "max bitmap bits of %u",
+ (unsigned long long)bh->b_blocknr,
+ le16_to_cpu(gd->bg_bits),
+ 8 * le16_to_cpu(gd->bg_size));
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ocfs2_validate_gd_parent(struct super_block *sb,
+ struct ocfs2_dinode *di,
+ struct buffer_head *bh,
+ int clean_error)
+{
+ unsigned int max_bits;
+ struct ocfs2_group_desc *gd = (struct ocfs2_group_desc *)bh->b_data;
+
if (di->i_blkno != gd->bg_parent_dinode) {
do_error("Group descriptor #%llu has bad parent "
"pointer (%llu, expected %llu)",
@@ -209,26 +236,35 @@ int ocfs2_validate_group_descriptor(struct super_block *sb,
return -EINVAL;
}

- if (le16_to_cpu(gd->bg_free_bits_count) > le16_to_cpu(gd->bg_bits)) {
- do_error("Group descriptor #%llu has bit count %u but "
- "claims that %u are free",
- (unsigned long long)bh->b_blocknr,
- le16_to_cpu(gd->bg_bits),
- le16_to_cpu(gd->bg_free_bits_count));
- return -EINVAL;
- }
+ return 0;
+}

- if (le16_to_cpu(gd->bg_bits) > (8 * le16_to_cpu(gd->bg_size))) {
- do_error("Group descriptor #%llu has bit count %u but "
- "max bitmap bits of %u",
- (unsigned long long)bh->b_blocknr,
- le16_to_cpu(gd->bg_bits),
- 8 * le16_to_cpu(gd->bg_size));
- return -EINVAL;
- }
#undef do_error

- return 0;
+/*
+ * This version only prints errors. It does not fail the filesystem, and
+ * exists only for resize.
+ */
+int ocfs2_check_group_descriptor(struct super_block *sb,
+ struct ocfs2_dinode *di,
+ struct buffer_head *bh)
+{
+ int rc;
+
+ rc = ocfs2_validate_gd_self(sb, bh, 1);
+ if (!rc)
+ rc = ocfs2_validate_gd_parent(sb, di, bh, 1);
+
+ return rc;
+}
+
+static int ocfs2_validate_group_descriptor(struct super_block *sb,
+ struct buffer_head *bh)
+{
+ mlog(0, "Validating group descriptor %llu\n",
+ (unsigned long long)bh->b_blocknr);
+
+ return ocfs2_validate_gd_self(sb, bh, 0);
}

int ocfs2_read_group_descriptor(struct inode *inode, struct ocfs2_dinode *di,
@@ -237,11 +273,12 @@ int ocfs2_read_group_descriptor(struct inode *inode, struct ocfs2_dinode *di,
int rc;
struct buffer_head *tmp = *bh;

- rc = ocfs2_read_block(inode, gd_blkno, &tmp);
+ rc = ocfs2_read_block(inode, gd_blkno, &tmp,
+ ocfs2_validate_group_descriptor);
if (rc)
goto out;

- rc = ocfs2_validate_group_descriptor(inode->i_sb, di, tmp, 0);
+ rc = ocfs2_validate_gd_parent(inode->i_sb, di, tmp, 0);
if (rc) {
brelse(tmp);
goto out;
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index 43de4fd..e3c13c7 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -165,16 +165,15 @@ void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac);
u64 ocfs2_which_cluster_group(struct inode *inode, u32 cluster);

/*
- * By default, ocfs2_validate_group_descriptor() calls ocfs2_error() when it
+ * By default, ocfs2_read_group_descriptor() calls ocfs2_error() when it
* finds a problem. A caller that wants to check a group descriptor
- * without going readonly passes a nonzero clean_error. This is only
- * resize, really. Everyone else should be using
- * ocfs2_read_group_descriptor().
+ * without going readonly should read the block with ocfs2_read_block[s]()
+ * and then checking it with this function. This is only resize, really.
+ * Everyone else should be using ocfs2_read_group_descriptor().
*/
-int ocfs2_validate_group_descriptor(struct super_block *sb,
- struct ocfs2_dinode *di,
- struct buffer_head *bh,
- int clean_error);
+int ocfs2_check_group_descriptor(struct super_block *sb,
+ struct ocfs2_dinode *di,
+ struct buffer_head *bh);
/*
* Read a group descriptor block into *bh. If *bh is NULL, a bh will be
* allocated. This is a cached read. The descriptor will be validated with
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index ef4aa54..8af29b3 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -266,7 +266,8 @@ static int ocfs2_read_xattr_bucket(struct ocfs2_xattr_bucket *bucket,
int rc;

rc = ocfs2_read_blocks(bucket->bu_inode, xb_blkno,
- bucket->bu_blocks, bucket->bu_bhs, 0);
+ bucket->bu_blocks, bucket->bu_bhs, 0,
+ NULL);
if (rc)
ocfs2_xattr_bucket_relse(bucket);
return rc;
@@ -359,12 +360,8 @@ static int ocfs2_read_xattr_block(struct inode *inode, u64 xb_blkno,
int rc;
struct buffer_head *tmp = *bh;

- rc = ocfs2_read_block(inode, xb_blkno, &tmp);
- if (!rc) {
- rc = ocfs2_validate_xattr_block(inode->i_sb, tmp);
- if (rc)
- brelse(tmp);
- }
+ rc = ocfs2_read_block(inode, xb_blkno, &tmp,
+ ocfs2_validate_xattr_block);

/* If ocfs2_read_block() got us a new bh, pass it up. */
if (!rc && !*bh)
@@ -925,7 +922,7 @@ static int ocfs2_xattr_get_value_outside(struct inode *inode,
blkno = ocfs2_clusters_to_blocks(inode->i_sb, p_cluster);
/* Copy ocfs2_xattr_value */
for (i = 0; i < num_clusters * bpc; i++, blkno++) {
- ret = ocfs2_read_block(inode, blkno, &bh);
+ ret = ocfs2_read_block(inode, blkno, &bh, NULL);
if (ret) {
mlog_errno(ret);
goto out;
@@ -1174,7 +1171,7 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
blkno = ocfs2_clusters_to_blocks(inode->i_sb, p_cluster);

for (i = 0; i < num_clusters * bpc; i++, blkno++) {
- ret = ocfs2_read_block(inode, blkno, &bh);
+ ret = ocfs2_read_block(inode, blkno, &bh, NULL);
if (ret) {
mlog_errno(ret);
goto out;
@@ -2206,7 +2203,7 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
base = xis->base;
credits += OCFS2_INODE_UPDATE_CREDITS;
} else {
- int i, block_off;
+ int i, block_off = 0;
xb = (struct ocfs2_xattr_block *)xbs->xattr_bh->b_data;
xe = xbs->here;
name_offset = le16_to_cpu(xe->xe_name_offset);
@@ -2840,6 +2837,7 @@ static int ocfs2_find_xe_in_bucket(struct inode *inode,
break;
}

+
xe_name = bucket_block(bucket, block_off) + new_offset;
if (!memcmp(name, xe_name, name_len)) {
*xe_index = i;
@@ -3598,7 +3596,7 @@ static int ocfs2_mv_xattr_bucket_cross_cluster(struct inode *inode,
goto out;
}

- ret = ocfs2_read_block(inode, prev_blkno, &old_bh);
+ ret = ocfs2_read_block(inode, prev_blkno, &old_bh, NULL);
if (ret < 0) {
mlog_errno(ret);
brelse(new_bh);
@@ -3990,7 +3988,7 @@ static int ocfs2_cp_xattr_cluster(struct inode *inode,
ocfs2_journal_dirty(handle, first_bh);

/* update the new bucket header. */
- ret = ocfs2_read_block(inode, to_blk_start, &bh);
+ ret = ocfs2_read_block(inode, to_blk_start, &bh, NULL);
if (ret < 0) {
mlog_errno(ret);
goto out;
@@ -4337,7 +4335,7 @@ static int ocfs2_add_new_xattr_bucket(struct inode *inode,
goto out;
}

- ret = ocfs2_read_block(inode, p_blkno, &first_bh);
+ ret = ocfs2_read_block(inode, p_blkno, &first_bh, NULL);
if (ret) {
mlog_errno(ret);
goto out;
@@ -4635,7 +4633,7 @@ static int ocfs2_xattr_bucket_value_truncate(struct inode *inode,
BUG_ON(value_blk != (offset + OCFS2_XATTR_ROOT_SIZE - 1) / blocksize);
value_blk += header_bh->b_blocknr;

- ret = ocfs2_read_block(inode, value_blk, &value_bh);
+ ret = ocfs2_read_block(inode, value_blk, &value_bh, NULL);
if (ret) {
mlog_errno(ret);
goto out;
--
1.5.6

2008-12-19 22:09:00

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 42/45] ocfs2: Convert ocfs2_read_dir_block() to ocfs2_read_virt_blocks()

From: Joel Becker <[email protected]>

Now that we've centralized the ocfs2_read_virt_blocks() code, let's use
it in ocfs2_read_dir_block().

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/dir.c | 38 +++++---------------------------------
1 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index 7e863d4..d83cff9 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -231,44 +231,16 @@ static int ocfs2_read_dir_block(struct inode *inode, u64 v_block,
{
int rc = 0;
struct buffer_head *tmp = *bh;
- u64 p_blkno;

- if (((u64)v_block << inode->i_sb->s_blocksize_bits) >=
- i_size_read(inode)) {
- BUG_ON(!(flags & OCFS2_BH_READAHEAD));
- goto out;
- }
-
- down_read(&OCFS2_I(inode)->ip_alloc_sem);
- rc = ocfs2_extent_map_get_blocks(inode, v_block, &p_blkno, NULL,
- NULL);
- up_read(&OCFS2_I(inode)->ip_alloc_sem);
- if (rc) {
+ rc = ocfs2_read_virt_blocks(inode, v_block, 1, &tmp, flags,
+ ocfs2_validate_dir_block);
+ if (rc)
mlog_errno(rc);
- goto out;
- }

- if (!p_blkno) {
- rc = -EIO;
- mlog(ML_ERROR,
- "Directory #%llu contains a hole at offset %llu\n",
- (unsigned long long)OCFS2_I(inode)->ip_blkno,
- (unsigned long long)v_block << inode->i_sb->s_blocksize_bits);
- goto out;
- }
-
- rc = ocfs2_read_blocks(inode, p_blkno, 1, &tmp, flags,
- ocfs2_validate_dir_block);
- if (rc) {
- mlog_errno(rc);
- goto out;
- }
-
- /* If ocfs2_read_blocks() got us a new bh, pass it up. */
- if (!*bh)
+ /* If ocfs2_read_virt_blocks() got us a new bh, pass it up. */
+ if (!rc && !*bh)
*bh = tmp;

-out:
return rc ? -EIO : 0;
}

--
1.5.6

2008-12-19 22:08:43

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 41/45] ocfs2: Wrap virtual block reads in ocfs2_read_virt_blocks()

From: Joel Becker <[email protected]>

The ocfs2_read_dir_block() function really maps an inode's virtual
blocks to physical ones before calling ocfs2_read_blocks(). Let's
extract that to common code, because other places might want to do that.

Other than the block number being virtual, ocfs2_read_virt_blocks()
takes the same arguments as ocfs2_read_blocks(). It converts those
virtual block numbers to physical before calling ocfs2_read_blocks()
directly. If the blocks asked for are discontiguous, this can mean
multiple calls to ocfs2_read_blocks(), but this is mostly hidden from
the caller.

Like ocfs2_read_blocks(), the caller can pass in an existing
buffer_head. This is usually done to pick up some readahead I/O.
ocfs2_read_virt_blocks() checks the buffer_head's block number
against the extent map - it must match.

Signed-off-by: Joel Becker <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/extent_map.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/extent_map.h | 24 ++++++++++++++++
2 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index 0bd9d96..f2bb1a0 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -806,3 +806,74 @@ out:

return ret;
}
+
+int ocfs2_read_virt_blocks(struct inode *inode, u64 v_block, int nr,
+ struct buffer_head *bhs[], int flags,
+ int (*validate)(struct super_block *sb,
+ struct buffer_head *bh))
+{
+ int rc = 0;
+ u64 p_block, p_count;
+ int i, count, done = 0;
+
+ mlog_entry("(inode = %p, v_block = %llu, nr = %d, bhs = %p, "
+ "flags = %x, validate = %p)\n",
+ inode, (unsigned long long)v_block, nr, bhs, flags,
+ validate);
+
+ if (((v_block + nr - 1) << inode->i_sb->s_blocksize_bits) >=
+ i_size_read(inode)) {
+ BUG_ON(!(flags & OCFS2_BH_READAHEAD));
+ goto out;
+ }
+
+ while (done < nr) {
+ down_read(&OCFS2_I(inode)->ip_alloc_sem);
+ rc = ocfs2_extent_map_get_blocks(inode, v_block + done,
+ &p_block, &p_count, NULL);
+ up_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (rc) {
+ mlog_errno(rc);
+ break;
+ }
+
+ if (!p_block) {
+ rc = -EIO;
+ mlog(ML_ERROR,
+ "Inode #%llu contains a hole at offset %llu\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno,
+ (unsigned long long)(v_block + done) <<
+ inode->i_sb->s_blocksize_bits);
+ break;
+ }
+
+ count = nr - done;
+ if (p_count < count)
+ count = p_count;
+
+ /*
+ * If the caller passed us bhs, they should have come
+ * from a previous readahead call to this function. Thus,
+ * they should have the right b_blocknr.
+ */
+ for (i = 0; i < count; i++) {
+ if (!bhs[done + i])
+ continue;
+ BUG_ON(bhs[done + i]->b_blocknr != (p_block + i));
+ }
+
+ rc = ocfs2_read_blocks(inode, p_block, count, bhs + done,
+ flags, validate);
+ if (rc) {
+ mlog_errno(rc);
+ break;
+ }
+ done += count;
+ }
+
+out:
+ mlog_exit(rc);
+ return rc;
+}
+
+
diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
index 1c4aa8b..b7dd973 100644
--- a/fs/ocfs2/extent_map.h
+++ b/fs/ocfs2/extent_map.h
@@ -57,4 +57,28 @@ int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
u32 *p_cluster, u32 *num_clusters,
struct ocfs2_extent_list *el);

+int ocfs2_read_virt_blocks(struct inode *inode, u64 v_block, int nr,
+ struct buffer_head *bhs[], int flags,
+ int (*validate)(struct super_block *sb,
+ struct buffer_head *bh));
+static inline int ocfs2_read_virt_block(struct inode *inode, u64 v_block,
+ struct buffer_head **bh,
+ int (*validate)(struct super_block *sb,
+ struct buffer_head *bh))
+{
+ int status = 0;
+
+ if (bh == NULL) {
+ printk("ocfs2: bh == NULL\n");
+ status = -EINVAL;
+ goto bail;
+ }
+
+ status = ocfs2_read_virt_blocks(inode, v_block, 1, bh, 0, validate);
+
+bail:
+ return status;
+}
+
+
#endif /* _EXTENT_MAP_H */
--
1.5.6

2008-12-19 22:09:25

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 43/45] ocfs2: Remove JBD compatibility layer

JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.

Signed-off-by: Mark Fasheh <[email protected]>
---
fs/Kconfig | 10 -----
fs/ocfs2/alloc.c | 5 ---
fs/ocfs2/aops.c | 24 +-----------
fs/ocfs2/journal.c | 14 -------
fs/ocfs2/journal.h | 11 +-----
fs/ocfs2/ocfs2_jbd_compat.h | 82 -------------------------------------------
6 files changed, 3 insertions(+), 143 deletions(-)
delete mode 100644 fs/ocfs2/ocfs2_jbd_compat.h

diff --git a/fs/Kconfig b/fs/Kconfig
index 3af6024..55dc974 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -258,16 +258,6 @@ config OCFS2_DEBUG_FS
this option for debugging only as it is likely to decrease
performance of the filesystem.

-config OCFS2_COMPAT_JBD
- bool "Use JBD for compatibility"
- depends on OCFS2_FS
- default n
- select JBD
- help
- The ocfs2 filesystem now uses JBD2 for its journalling. JBD2
- is backwards compatible with JBD. It is safe to say N here.
- However, if you really want to use the original JBD, say Y here.
-
config OCFS2_FS_POSIX_ACL
bool "OCFS2 POSIX Access Control Lists"
depends on OCFS2_FS
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index e823a27..69d67ab 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6638,11 +6638,6 @@ static void ocfs2_map_and_dirty_page(struct inode *inode, handle_t *handle,
mlog_errno(ret);
else if (ocfs2_should_order_data(inode)) {
ret = ocfs2_jbd2_file_inode(handle, inode);
-#ifdef CONFIG_OCFS2_COMPAT_JBD
- ret = walk_page_buffers(handle, page_buffers(page),
- from, to, &partial,
- ocfs2_journal_dirty_data);
-#endif
if (ret < 0)
mlog_errno(ret);
}
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index e219f8b..6af79ad 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -474,12 +474,6 @@ handle_t *ocfs2_start_walk_page_trans(struct inode *inode,

if (ocfs2_should_order_data(inode)) {
ret = ocfs2_jbd2_file_inode(handle, inode);
-#ifdef CONFIG_OCFS2_COMPAT_JBD
- ret = walk_page_buffers(handle,
- page_buffers(page),
- from, to, NULL,
- ocfs2_journal_dirty_data);
-#endif
if (ret < 0)
mlog_errno(ret);
}
@@ -1065,15 +1059,8 @@ static void ocfs2_write_failure(struct inode *inode,
tmppage = wc->w_pages[i];

if (page_has_buffers(tmppage)) {
- if (ocfs2_should_order_data(inode)) {
+ if (ocfs2_should_order_data(inode))
ocfs2_jbd2_file_inode(wc->w_handle, inode);
-#ifdef CONFIG_OCFS2_COMPAT_JBD
- walk_page_buffers(wc->w_handle,
- page_buffers(tmppage),
- from, to, NULL,
- ocfs2_journal_dirty_data);
-#endif
- }

block_commit_write(tmppage, from, to);
}
@@ -1912,15 +1899,8 @@ int ocfs2_write_end_nolock(struct address_space *mapping,
}

if (page_has_buffers(tmppage)) {
- if (ocfs2_should_order_data(inode)) {
+ if (ocfs2_should_order_data(inode))
ocfs2_jbd2_file_inode(wc->w_handle, inode);
-#ifdef CONFIG_OCFS2_COMPAT_JBD
- walk_page_buffers(wc->w_handle,
- page_buffers(tmppage),
- from, to, NULL,
- ocfs2_journal_dirty_data);
-#endif
- }
block_commit_write(tmppage, from, to);
}
}
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 9223bfc..12b62a3 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -434,20 +434,6 @@ int ocfs2_journal_dirty(handle_t *handle,
return status;
}

-#ifdef CONFIG_OCFS2_COMPAT_JBD
-int ocfs2_journal_dirty_data(handle_t *handle,
- struct buffer_head *bh)
-{
- int err = journal_dirty_data(handle, bh);
- if (err)
- mlog_errno(err);
- /* TODO: When we can handle it, abort the handle and go RO on
- * error here. */
-
- return err;
-}
-#endif
-
#define OCFS2_DEFAULT_COMMIT_INTERVAL (HZ * JBD2_DEFAULT_MAX_COMMIT_AGE)

void ocfs2_set_journal_params(struct ocfs2_super *osb)
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index d4d14e9..8203980 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -27,12 +27,7 @@
#define OCFS2_JOURNAL_H

#include <linux/fs.h>
-#ifndef CONFIG_OCFS2_COMPAT_JBD
-# include <linux/jbd2.h>
-#else
-# include <linux/jbd.h>
-# include "ocfs2_jbd_compat.h"
-#endif
+#include <linux/jbd2.h>

enum ocfs2_journal_state {
OCFS2_JOURNAL_FREE = 0,
@@ -273,10 +268,6 @@ int ocfs2_journal_access(handle_t *handle,
*/
int ocfs2_journal_dirty(handle_t *handle,
struct buffer_head *bh);
-#ifdef CONFIG_OCFS2_COMPAT_JBD
-int ocfs2_journal_dirty_data(handle_t *handle,
- struct buffer_head *bh);
-#endif

/*
* Credit Macros:
diff --git a/fs/ocfs2/ocfs2_jbd_compat.h b/fs/ocfs2/ocfs2_jbd_compat.h
deleted file mode 100644
index b91c78f..0000000
--- a/fs/ocfs2/ocfs2_jbd_compat.h
+++ /dev/null
@@ -1,82 +0,0 @@
-/* -*- mode: c; c-basic-offset: 8; -*-
- * vim: noexpandtab sw=8 ts=8 sts=0:
- *
- * ocfs2_jbd_compat.h
- *
- * Compatibility defines for JBD.
- *
- * Copyright (C) 2008 Oracle. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- */
-
-#ifndef OCFS2_JBD_COMPAT_H
-#define OCFS2_JBD_COMPAT_H
-
-#ifndef CONFIG_OCFS2_COMPAT_JBD
-# error Should not have been included
-#endif
-
-struct jbd2_inode {
- unsigned int dummy;
-};
-
-#define JBD2_BARRIER JFS_BARRIER
-#define JBD2_DEFAULT_MAX_COMMIT_AGE JBD_DEFAULT_MAX_COMMIT_AGE
-
-#define jbd2_journal_ack_err journal_ack_err
-#define jbd2_journal_clear_err journal_clear_err
-#define jbd2_journal_destroy journal_destroy
-#define jbd2_journal_dirty_metadata journal_dirty_metadata
-#define jbd2_journal_errno journal_errno
-#define jbd2_journal_extend journal_extend
-#define jbd2_journal_flush journal_flush
-#define jbd2_journal_force_commit journal_force_commit
-#define jbd2_journal_get_write_access journal_get_write_access
-#define jbd2_journal_get_undo_access journal_get_undo_access
-#define jbd2_journal_init_inode journal_init_inode
-#define jbd2_journal_invalidatepage journal_invalidatepage
-#define jbd2_journal_load journal_load
-#define jbd2_journal_lock_updates journal_lock_updates
-#define jbd2_journal_restart journal_restart
-#define jbd2_journal_start journal_start
-#define jbd2_journal_start_commit journal_start_commit
-#define jbd2_journal_stop journal_stop
-#define jbd2_journal_try_to_free_buffers journal_try_to_free_buffers
-#define jbd2_journal_unlock_updates journal_unlock_updates
-#define jbd2_journal_wipe journal_wipe
-#define jbd2_log_wait_commit log_wait_commit
-
-static inline int jbd2_journal_file_inode(handle_t *handle,
- struct jbd2_inode *inode)
-{
- return 0;
-}
-
-static inline int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode,
- loff_t new_size)
-{
- return 0;
-}
-
-static inline void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode,
- struct inode *inode)
-{
- return;
-}
-
-static inline void jbd2_journal_release_jbd_inode(journal_t *journal,
- struct jbd2_inode *jinode)
-{
- return;
-}
-
-
-#endif /* OCFS2_JBD_COMPAT_H */
--
1.5.6

2008-12-19 22:09:57

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 45/45] ocfs2/xattr: Restore not_found in xis

From: Tao Ma <[email protected]>

During an xattr set, when we move a xattr which was stored in inode to the
outside bucket, we have to delete it and it will use the old value of
xis->not_found. xis->not_found is removed by ocfs2_calc_xattr_set_need
though, so we must restore it.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index d0b94ed..9cb71e1 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2414,7 +2414,7 @@ static int __ocfs2_xattr_set_handle(struct inode *inode,
struct ocfs2_xattr_search *xbs,
struct ocfs2_xattr_set_ctxt *ctxt)
{
- int ret = 0, credits;
+ int ret = 0, credits, old_found;

if (!xi->value) {
/* Remove existing extended attribute */
@@ -2433,6 +2433,7 @@ static int __ocfs2_xattr_set_handle(struct inode *inode,
xi->value = NULL;
xi->value_len = 0;

+ old_found = xis->not_found;
xis->not_found = -ENODATA;
ret = ocfs2_calc_xattr_set_need(inode,
di,
@@ -2442,6 +2443,7 @@ static int __ocfs2_xattr_set_handle(struct inode *inode,
NULL,
NULL,
&credits);
+ xis->not_found = old_found;
if (ret) {
mlog_errno(ret);
goto out;
@@ -2462,6 +2464,7 @@ static int __ocfs2_xattr_set_handle(struct inode *inode,
if (ret)
goto out;

+ old_found = xis->not_found;
xis->not_found = -ENODATA;
ret = ocfs2_calc_xattr_set_need(inode,
di,
@@ -2471,6 +2474,7 @@ static int __ocfs2_xattr_set_handle(struct inode *inode,
NULL,
NULL,
&credits);
+ xis->not_found = old_found;
if (ret) {
mlog_errno(ret);
goto out;
--
1.5.6

2008-12-19 22:09:41

by Mark Fasheh

[permalink] [raw]
Subject: [PATCH 44/45] ocfs2/xattr: Fix a bug in xattr allocation estimation

From: Tao Ma <[email protected]>

When we extend one xattr's value to a large size, the old value size might
be smaller than the size of a value root. In those cases, we still need to
guess the metadata allocation.

Reported-by: Tiger Yang <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/xattr.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 8af29b3..d0b94ed 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -2270,6 +2270,7 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
value_size);
xv = (struct ocfs2_xattr_value_root *)
(base + name_offset + name_len);
+ value_size = OCFS2_XATTR_ROOT_SIZE;
} else
xv = &def_xv.xv;

@@ -2283,7 +2284,8 @@ static int ocfs2_calc_xattr_set_need(struct inode *inode,
&xv->xr_list,
new_clusters -
old_clusters);
- goto out;
+ if (value_size >= OCFS2_XATTR_ROOT_SIZE)
+ goto out;
}
} else {
/*
--
1.5.6