2011-01-05 01:01:14

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 0/6] Shrinking the size of ext4_inode_info

The following set of patches shrink the size of ext4_inode_info, which
is present in memory for every single ext4 inode in the inode cache.
For example, on one of my machines I currently have 172261 ext4 inodes
in my cache, which represents 161 megabytes of memory. On an x86_64
machine, these patches allow me to shrink the ext4 in-core inode size
from 952 bytes to 872 bytes. This represents a 8.4% decrease. Using
the SLUB allocator, this means we can now fit 18 inodes into an order 4
(16384 bytes) slab, instead of the previous 17 inodes. This results in
an effective decrease of 5.6% of memory consumption by the ext4 inode
cache.

It would be possible to further slim down the ext4_inode_cache by
another 100 bytes or so, by breaking the ext4_inode_info into the
portion of the inode required a file is opened for writing, and
everything else. That would be a fairly disruptive change, though, so
I'll save that for another time.

- Ted

Theodore Ts'o (6):
ext4: replace i_delalloc_reserved_flag with
EXT4_STATE_DELALLOC_RESERVED
ext4: Use ext4_lblk_t instead of sector_t for logical blocks
ext4: Drop ec_type from the ext4_ext_cache structure
ext4: reorder ext4_inode_info structure elements to remove unneeded
padding
ext4: Drop i_state_flags on architectures with 64-bit longs
ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as
necessary

fs/ext4/balloc.c | 3 ++-
fs/ext4/ext4.h | 36 +++++++++++++++++++++++-------------
fs/ext4/ext4_extents.h | 8 ++------
fs/ext4/ext4_jbd2.h | 2 +-
fs/ext4/extents.c | 39 ++++++++++++++++-----------------------
fs/ext4/file.c | 19 +++++++++++++++++++
fs/ext4/ialloc.c | 2 +-
fs/ext4/inode.c | 29 ++++++++++++++++++-----------
fs/ext4/mballoc.c | 5 +++--
fs/ext4/super.c | 17 +++++++----------
fs/jbd2/journal.c | 20 +++++++++++++-------
include/linux/jbd2.h | 20 ++++++++++++++++++--
12 files changed, 123 insertions(+), 77 deletions(-)

--
1.7.3.1



2011-01-05 01:01:14

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 1/6] ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED

Remove the short element i_delalloc_reserved_flag from the
ext4_inode_info structure and replace it a new bit in i_state_flags.
Since we have an ext4_inode_info for every ext4 inode cached in the
inode cache, any savings we can produce here is a very good thing from
a memory utilization perspective.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/balloc.c | 3 ++-
fs/ext4/ext4.h | 2 +-
fs/ext4/inode.c | 6 +++---
fs/ext4/mballoc.c | 5 +++--
fs/ext4/super.c | 1 -
5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 14c3af2..adf96b8 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -592,7 +592,8 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
* Account for the allocated meta blocks. We will never
* fail EDQUOT for metdata, but we do account for it.
*/
- if (!(*errp) && EXT4_I(inode)->i_delalloc_reserved_flag) {
+ if (!(*errp) &&
+ ext4_test_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED)) {
spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
EXT4_I(inode)->i_allocated_meta_blocks += ar.len;
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2a73925..b7ee66f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -828,7 +828,6 @@ struct ext4_inode_info {
unsigned int i_reserved_data_blocks;
unsigned int i_reserved_meta_blocks;
unsigned int i_allocated_meta_blocks;
- unsigned short i_delalloc_reserved_flag;
sector_t i_da_metadata_calc_last_lblock;
int i_da_metadata_calc_len;

@@ -1235,6 +1234,7 @@ enum {
EXT4_STATE_EXT_MIGRATE, /* Inode is migrating */
EXT4_STATE_DIO_UNWRITTEN, /* need convert on dio done*/
EXT4_STATE_NEWENTRY, /* File just added to dir */
+ EXT4_STATE_DELALLOC_RESERVED, /* blks already reserved for delalloc */
};

#define EXT4_INODE_BIT_FNS(name, field) \
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c0fe426..ac08460 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1330,7 +1330,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
* avoid double accounting
*/
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
- EXT4_I(inode)->i_delalloc_reserved_flag = 1;
+ ext4_set_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);
/*
* We need to check for EXT4 here because migrate
* could have changed the inode type in between
@@ -1360,7 +1360,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
ext4_da_update_reserve_space(inode, retval, 1);
}
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
- EXT4_I(inode)->i_delalloc_reserved_flag = 0;
+ ext4_clear_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);

up_write((&EXT4_I(inode)->i_data_sem));
if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
@@ -2249,7 +2249,7 @@ static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
* affects functions in many different parts of the allocation
* call path. This flag exists primarily because we don't
* want to change *many* call functions, so ext4_map_blocks()
- * will set the magic i_delalloc_reserved_flag once the
+ * will set the EXT4_STATE_DELALLOC_RESERVED flag once the
* inode's allocation semaphore is taken.
*
* If the blocks in questions were delalloc blocks, set
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1b10681..32d1d40 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4283,7 +4283,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
* EDQUOT check, as blocks and quotas have been already
* reserved when data being copied into pagecache.
*/
- if (EXT4_I(ar->inode)->i_delalloc_reserved_flag)
+ if (ext4_test_inode_state(ar->inode, EXT4_STATE_DELALLOC_RESERVED))
ar->flags |= EXT4_MB_DELALLOC_RESERVED;
else {
/* Without delayed allocation we need to verify
@@ -4380,7 +4380,8 @@ out:
if (inquota && ar->len < inquota)
dquot_free_block(ar->inode, inquota - ar->len);
if (!ar->len) {
- if (!EXT4_I(ar->inode)->i_delalloc_reserved_flag)
+ if (!ext4_test_inode_state(ar->inode,
+ EXT4_STATE_DELALLOC_RESERVED))
/* release all the reserved blocks if non delalloc */
percpu_counter_sub(&sbi->s_dirtyblocks_counter,
reserv_blks);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7728a4c..f5960d6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -828,7 +828,6 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
ei->i_reserved_meta_blocks = 0;
ei->i_allocated_meta_blocks = 0;
ei->i_da_metadata_calc_len = 0;
- ei->i_delalloc_reserved_flag = 0;
spin_lock_init(&(ei->i_block_reservation_lock));
#ifdef CONFIG_QUOTA
ei->i_reserved_quota = 0;
--
1.7.3.1


2011-01-05 01:01:14

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 2/6] ext4: Use ext4_lblk_t instead of sector_t for logical blocks

This fixes a number of places where we used sector_t instead of
ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data
types. No point wasting space in the ext4_inode_info structure, and
requiring 64-bit arithmetic on 32-bit systems, when it isn't
necessary.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/ext4_extents.h | 2 +-
fs/ext4/extents.c | 2 +-
fs/ext4/inode.c | 4 ++--
4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b7ee66f..746a598 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -828,7 +828,7 @@ struct ext4_inode_info {
unsigned int i_reserved_data_blocks;
unsigned int i_reserved_meta_blocks;
unsigned int i_allocated_meta_blocks;
- sector_t i_da_metadata_calc_last_lblock;
+ ext4_lblk_t i_da_metadata_calc_last_lblock;
int i_da_metadata_calc_len;

/* on-disk additional length */
diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index 28ce70f..dfdda17 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -278,7 +278,7 @@ static inline void ext4_idx_store_pblock(struct ext4_extent_idx *ix,
}

extern int ext4_ext_calc_metadata_amount(struct inode *inode,
- sector_t lblocks);
+ ext4_lblk_t lblocks);
extern int ext4_extent_tree_init(handle_t *, struct inode *);
extern int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
int num,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ed622e2..a16fe91 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -266,7 +266,7 @@ static inline int ext4_ext_space_root_idx(struct inode *inode, int check)
* to allocate @blocks
* Worse case is one block per extent
*/
-int ext4_ext_calc_metadata_amount(struct inode *inode, sector_t lblock)
+int ext4_ext_calc_metadata_amount(struct inode *inode, ext4_lblk_t lblock)
{
struct ext4_inode_info *ei = EXT4_I(inode);
int idxs, num = 0;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ac08460..3ae8313 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1091,7 +1091,7 @@ static int ext4_indirect_calc_metadata_amount(struct inode *inode,
* Calculate the number of metadata blocks need to reserve
* to allocate a block located at @lblock
*/
-static int ext4_calc_metadata_amount(struct inode *inode, sector_t lblock)
+static int ext4_calc_metadata_amount(struct inode *inode, ext4_lblk_t lblock)
{
if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
return ext4_ext_calc_metadata_amount(inode, lblock);
@@ -1888,7 +1888,7 @@ static int ext4_journalled_write_end(struct file *file,
/*
* Reserve a single block located at lblock
*/
-static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
+static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
{
int retries = 0;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
--
1.7.3.1


2011-01-05 01:01:16

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 4/6] ext4: reorder ext4_inode_info structure elements to remove unneeded padding

By reordering the elements in the ext4_inode_info structure, we can
reduce the padding needed on an x86_64 system by 16 bytes.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index de937fc..50e3d24 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -763,10 +763,10 @@ struct ext4_inode_info {
* near to their parent directory's inode.
*/
ext4_group_t i_block_group;
+ ext4_lblk_t i_dir_start_lookup;
unsigned long i_state_flags; /* Dynamic state flags */
unsigned long i_flags;

- ext4_lblk_t i_dir_start_lookup;
#ifdef CONFIG_EXT4_FS_XATTR
/*
* Extended attributes can be read independently of the main file
@@ -835,7 +835,6 @@ struct ext4_inode_info {
/* on-disk additional length */
__u16 i_extra_isize;

- spinlock_t i_block_reservation_lock;
#ifdef CONFIG_QUOTA
/* quota space reservation, managed internally by quota code */
qsize_t i_reserved_quota;
@@ -844,9 +843,11 @@ struct ext4_inode_info {
/* completed IOs that might need unwritten extents handling */
struct list_head i_completed_io_list;
spinlock_t i_completed_io_lock;
+ atomic_t i_ioend_count; /* Number of outstanding io_end structs */
/* current io_end structure for async DIO write*/
ext4_io_end_t *cur_aio_dio;
- atomic_t i_ioend_count; /* Number of outstanding io_end structs */
+
+ spinlock_t i_block_reservation_lock;

/*
* Transactions that contain inode's metadata needed to complete
--
1.7.3.1


2011-01-05 01:01:17

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 5/6] ext4: Drop i_state_flags on architectures with 64-bit longs

We can store the dynamic inode state flags in the high bits of
EXT4_I(inode)->i_flags, and eliminate i_state_flags. This saves 8
bytes from the size of ext4_inode_info structure, which when
multiplied by the number of the number of in the inode cache, can save
a lot of memory.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 20 ++++++++++++++------
fs/ext4/ialloc.c | 2 +-
fs/ext4/inode.c | 4 ++--
3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 50e3d24..8a2603d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -764,7 +764,9 @@ struct ext4_inode_info {
*/
ext4_group_t i_block_group;
ext4_lblk_t i_dir_start_lookup;
+#if (BITS_PER_LONG < 64)
unsigned long i_state_flags; /* Dynamic state flags */
+#endif
unsigned long i_flags;

#ifdef CONFIG_EXT4_FS_XATTR
@@ -1239,22 +1241,28 @@ enum {
EXT4_STATE_DELALLOC_RESERVED, /* blks already reserved for delalloc */
};

-#define EXT4_INODE_BIT_FNS(name, field) \
+#define EXT4_INODE_BIT_FNS(name, field, offset) \
static inline int ext4_test_inode_##name(struct inode *inode, int bit) \
{ \
- return test_bit(bit, &EXT4_I(inode)->i_##field); \
+ return test_bit(bit + (offset), &EXT4_I(inode)->i_##field); \
} \
static inline void ext4_set_inode_##name(struct inode *inode, int bit) \
{ \
- set_bit(bit, &EXT4_I(inode)->i_##field); \
+ set_bit(bit + (offset), &EXT4_I(inode)->i_##field); \
} \
static inline void ext4_clear_inode_##name(struct inode *inode, int bit) \
{ \
- clear_bit(bit, &EXT4_I(inode)->i_##field); \
+ clear_bit(bit + (offset), &EXT4_I(inode)->i_##field); \
}

-EXT4_INODE_BIT_FNS(flag, flags)
-EXT4_INODE_BIT_FNS(state, state_flags)
+EXT4_INODE_BIT_FNS(flag, flags, 0)
+#if (BITS_PER_LONG < 64)
+EXT4_INODE_BIT_FNS(state, state_flags, 0)
+#define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_state_flags = 0
+#else
+EXT4_INODE_BIT_FNS(state, flags, 32)
+#define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_flags = 0
+#endif
#else
/* Assume that user mode programs are passing in an ext4fs superblock, not
* a kernel struct super_block. This will allow us to call the feature-test
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1ce240a..67b6779 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1009,6 +1009,7 @@ got:
* extent flag on newly created directory and file only if -o extent
* mount option is specified
*/
+ EXT4_CLEAR_STATE_FLAGS(ei);
ei->i_flags =
ext4_mask_flags(mode, EXT4_I(dir)->i_flags & EXT4_FL_INHERITED);
ei->i_file_acl = 0;
@@ -1027,7 +1028,6 @@ got:
inode->i_generation = sbi->s_next_generation++;
spin_unlock(&sbi->s_next_gen_lock);

- ei->i_state_flags = 0;
ext4_set_inode_state(inode, EXT4_STATE_NEW);

ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3ae8313..2d60028 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4868,7 +4868,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
}
inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);

- ei->i_state_flags = 0;
+ EXT4_CLEAR_STATE_FLAGS(ei);
ei->i_dir_start_lookup = 0;
ei->i_dtime = le32_to_cpu(raw_inode->i_dtime);
/* We now have enough fields to check if the inode was active or not.
@@ -5127,7 +5127,7 @@ static int ext4_do_update_inode(handle_t *handle,
if (ext4_inode_blocks_set(handle, raw_inode, ei))
goto out_brelse;
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
- raw_inode->i_flags = cpu_to_le32(ei->i_flags);
+ raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
cpu_to_le32(EXT4_OS_HURD))
raw_inode->i_file_acl_high =
--
1.7.3.1


2011-01-05 01:01:15

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 3/6] ext4: Drop ec_type from the ext4_ext_cache structure

We can encode the ec_type information by using ee_len == 0 to denote
EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if
neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT.
This allows us to reduce the size of ext4_ext_inode by another 8
bytes. (ec_type is 4 bytes, plus another 4 bytes of padding)

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 3 ++-
fs/ext4/ext4_extents.h | 6 +-----
fs/ext4/extents.c | 37 +++++++++++++++----------------------
3 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 746a598..de937fc 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -738,12 +738,13 @@ do { \

/*
* storage for cached extent
+ * If ec_len == 0, then the cache is invalid.
+ * If ec_start == 0, then the cache represents a gap (null mapping)
*/
struct ext4_ext_cache {
ext4_fsblk_t ec_start;
ext4_lblk_t ec_block;
__u32 ec_len; /* must be 32bit to return holes */
- __u32 ec_type;
};

/*
diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index dfdda17..2e29abb 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -119,10 +119,6 @@ struct ext4_ext_path {
* structure for external API
*/

-#define EXT4_EXT_CACHE_NO 0
-#define EXT4_EXT_CACHE_GAP 1
-#define EXT4_EXT_CACHE_EXTENT 2
-
/*
* to be called by ext4_ext_walk_space()
* negative retcode - error
@@ -197,7 +193,7 @@ static inline unsigned short ext_depth(struct inode *inode)
static inline void
ext4_ext_invalidate_cache(struct inode *inode)
{
- EXT4_I(inode)->i_cached_extent.ec_type = EXT4_EXT_CACHE_NO;
+ EXT4_I(inode)->i_cached_extent.ec_len = 0;
}

static inline void ext4_ext_mark_uninitialized(struct ext4_extent *ext)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a16fe91..a643b50 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1894,12 +1894,10 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
cbex.ec_block = start;
cbex.ec_len = end - start;
cbex.ec_start = 0;
- cbex.ec_type = EXT4_EXT_CACHE_GAP;
} else {
cbex.ec_block = le32_to_cpu(ex->ee_block);
cbex.ec_len = ext4_ext_get_actual_len(ex);
cbex.ec_start = ext4_ext_pblock(ex);
- cbex.ec_type = EXT4_EXT_CACHE_EXTENT;
}

if (unlikely(cbex.ec_len == 0)) {
@@ -1939,13 +1937,12 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,

static void
ext4_ext_put_in_cache(struct inode *inode, ext4_lblk_t block,
- __u32 len, ext4_fsblk_t start, int type)
+ __u32 len, ext4_fsblk_t start)
{
struct ext4_ext_cache *cex;
BUG_ON(len == 0);
spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
cex = &EXT4_I(inode)->i_cached_extent;
- cex->ec_type = type;
cex->ec_block = block;
cex->ec_len = len;
cex->ec_start = start;
@@ -1998,15 +1995,18 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
}

ext_debug(" -> %u:%lu\n", lblock, len);
- ext4_ext_put_in_cache(inode, lblock, len, 0, EXT4_EXT_CACHE_GAP);
+ ext4_ext_put_in_cache(inode, lblock, len, 0);
}

+/*
+ * Return 0 if cache is invalid; 1 if the cache is valid
+ */
static int
ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block,
struct ext4_extent *ex)
{
struct ext4_ext_cache *cex;
- int ret = EXT4_EXT_CACHE_NO;
+ int ret = 0;

/*
* We borrow i_block_reservation_lock to protect i_cached_extent
@@ -2015,11 +2015,9 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block,
cex = &EXT4_I(inode)->i_cached_extent;

/* has cache valid data? */
- if (cex->ec_type == EXT4_EXT_CACHE_NO)
+ if (cex->ec_len == 0)
goto errout;

- BUG_ON(cex->ec_type != EXT4_EXT_CACHE_GAP &&
- cex->ec_type != EXT4_EXT_CACHE_EXTENT);
if (in_range(block, cex->ec_block, cex->ec_len)) {
ex->ee_block = cpu_to_le32(cex->ec_block);
ext4_ext_store_pblock(ex, cex->ec_start);
@@ -2027,7 +2025,7 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block,
ext_debug("%u cached by %u:%u:%llu\n",
block,
cex->ec_block, cex->ec_len, cex->ec_start);
- ret = cex->ec_type;
+ ret = 1;
}
errout:
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
@@ -3298,7 +3296,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
struct ext4_extent_header *eh;
struct ext4_extent newex, *ex;
ext4_fsblk_t newblock;
- int err = 0, depth, ret, cache_type;
+ int err = 0, depth, ret;
unsigned int allocated = 0;
struct ext4_allocation_request ar;
ext4_io_end_t *io = EXT4_I(inode)->cur_aio_dio;
@@ -3307,9 +3305,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
map->m_lblk, map->m_len, inode->i_ino);

/* check in cache */
- cache_type = ext4_ext_in_cache(inode, map->m_lblk, &newex);
- if (cache_type) {
- if (cache_type == EXT4_EXT_CACHE_GAP) {
+ if (ext4_ext_in_cache(inode, map->m_lblk, &newex)) {
+ if (!newex.ee_start_lo && !newex.ee_start_hi) {
if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
/*
* block isn't allocated yet and
@@ -3318,7 +3315,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
goto out2;
}
/* we should allocate requested block */
- } else if (cache_type == EXT4_EXT_CACHE_EXTENT) {
+ } else {
/* block is already allocated */
newblock = map->m_lblk
- le32_to_cpu(newex.ee_block)
@@ -3327,8 +3324,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
allocated = ext4_ext_get_actual_len(&newex) -
(map->m_lblk - le32_to_cpu(newex.ee_block));
goto out;
- } else {
- BUG();
}
}

@@ -3379,8 +3374,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
/* Do not put uninitialized extent in the cache */
if (!ext4_ext_is_uninitialized(ex)) {
ext4_ext_put_in_cache(inode, ee_block,
- ee_len, ee_start,
- EXT4_EXT_CACHE_EXTENT);
+ ee_len, ee_start);
goto out;
}
ret = ext4_ext_handle_uninitialized_extents(handle,
@@ -3512,8 +3506,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
* when it is _not_ an uninitialized extent.
*/
if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) {
- ext4_ext_put_in_cache(inode, map->m_lblk, allocated, newblock,
- EXT4_EXT_CACHE_EXTENT);
+ ext4_ext_put_in_cache(inode, map->m_lblk, allocated, newblock);
ext4_update_inode_fsync_trans(handle, inode, 1);
} else
ext4_update_inode_fsync_trans(handle, inode, 0);
@@ -3789,7 +3782,7 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,

logical = (__u64)newex->ec_block << blksize_bits;

- if (newex->ec_type == EXT4_EXT_CACHE_GAP) {
+ if (newex->ec_start == 0) {
pgoff_t offset;
struct page *page;
struct buffer_head *bh = NULL;
--
1.7.3.1


2011-01-05 01:01:18

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 6/6] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing. This allows us to further slim down the ext4_inode_info
structure.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/ext4_jbd2.h | 2 +-
fs/ext4/file.c | 19 +++++++++++++++++++
fs/ext4/inode.c | 15 +++++++++++----
fs/ext4/super.c | 16 +++++++---------
fs/jbd2/journal.c | 20 +++++++++++++-------
include/linux/jbd2.h | 20 ++++++++++++++++++--
7 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8a2603d..50f35e8 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -811,7 +811,7 @@ struct ext4_inode_info {
*/
struct rw_semaphore i_data_sem;
struct inode vfs_inode;
- struct jbd2_inode jinode;
+ struct jbd2_inode *jinode;

struct ext4_ext_cache i_cached_extent;
/*
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index b0bd792..d8b992e 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -253,7 +253,7 @@ static inline int ext4_journal_force_commit(journal_t *journal)
static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
{
if (ext4_handle_valid(handle))
- return jbd2_journal_file_inode(handle, &EXT4_I(inode)->jinode);
+ return jbd2_journal_file_inode(handle, EXT4_I(inode)->jinode);
return 0;
}

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 5a5c55d..45a656f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -104,6 +104,7 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
{
struct super_block *sb = inode->i_sb;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ struct ext4_inode_info *ei = EXT4_I(inode);
struct vfsmount *mnt = filp->f_path.mnt;
struct path path;
char buf[64], *cp;
@@ -127,6 +128,24 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
ext4_mark_super_dirty(sb);
}
}
+ /*
+ * Set up the jbd2_inode if we are opening the inode for
+ * writing and the journal is present
+ */
+ if (sbi->s_journal && !ei->jinode && (filp->f_mode & FMODE_WRITE)) {
+ struct jbd2_inode *jinode = jbd2_alloc_inode(GFP_KERNEL);
+
+ spin_lock(&inode->i_lock);
+ if (!ei->jinode) {
+ if (!jinode) {
+ spin_unlock(&inode->i_lock);
+ return -ENOMEM;
+ }
+ ei->jinode = jinode;
+ jbd2_journal_init_jbd_inode(ei->jinode, inode);
+ }
+ spin_unlock(&inode->i_lock);
+ }
return dquot_file_open(inode, filp);
}

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2d60028..de3f2e3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -55,10 +55,17 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
loff_t new_size)
{
trace_ext4_begin_ordered_truncate(inode, new_size);
- return jbd2_journal_begin_ordered_truncate(
- EXT4_SB(inode->i_sb)->s_journal,
- &EXT4_I(inode)->jinode,
- new_size);
+ /*
+ * If jinode is zero, then we never opened the file for
+ * writing, so there's no need to call
+ * jbd2_journal_begin_ordered_truncate() since there's no
+ * outstanding writes we need to flush.
+ */
+ if (!EXT4_I(inode)->jinode)
+ return 0;
+ return jbd2_journal_begin_ordered_truncate(EXT4_JOURNAL(inode),
+ EXT4_I(inode)->jinode,
+ new_size);
}

static void ext4_invalidatepage(struct page *page, unsigned long offset);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f5960d6..7f99ac7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -818,12 +818,6 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
INIT_LIST_HEAD(&ei->i_prealloc_list);
spin_lock_init(&ei->i_prealloc_lock);
- /*
- * Note: We can be called before EXT4_SB(sb)->s_journal is set,
- * therefore it can be null here. Don't check it, just initialize
- * jinode.
- */
- jbd2_journal_init_jbd_inode(&ei->jinode, &ei->vfs_inode);
ei->i_reserved_data_blocks = 0;
ei->i_reserved_meta_blocks = 0;
ei->i_allocated_meta_blocks = 0;
@@ -832,6 +826,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
#ifdef CONFIG_QUOTA
ei->i_reserved_quota = 0;
#endif
+ ei->jinode = 0;
INIT_LIST_HEAD(&ei->i_completed_io_list);
spin_lock_init(&ei->i_completed_io_lock);
ei->cur_aio_dio = NULL;
@@ -900,9 +895,12 @@ void ext4_clear_inode(struct inode *inode)
end_writeback(inode);
dquot_drop(inode);
ext4_discard_preallocations(inode);
- if (EXT4_JOURNAL(inode))
- jbd2_journal_release_jbd_inode(EXT4_SB(inode->i_sb)->s_journal,
- &EXT4_I(inode)->jinode);
+ if (EXT4_JOURNAL(inode) && EXT4_I(inode)->jinode) {
+ jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
+ EXT4_I(inode)->jinode);
+ jbd2_free_inode(EXT4_I(inode)->jinode);
+ EXT4_I(inode)->jinode = 0;
+ }
}

static inline void ext4_show_quota_options(struct seq_file *seq,
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 2447bd8..9e46869 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -94,6 +94,7 @@ EXPORT_SYMBOL(jbd2_journal_file_inode);
EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
EXPORT_SYMBOL(jbd2_journal_release_jbd_inode);
EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);
+EXPORT_SYMBOL(jbd2_inode_cache);

static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
static void __journal_abort_soft (journal_t *journal, int errno);
@@ -2286,17 +2287,19 @@ static void __exit jbd2_remove_jbd_stats_proc_entry(void)

#endif

-struct kmem_cache *jbd2_handle_cache;
+struct kmem_cache *jbd2_handle_cache, *jbd2_inode_cache;

static int __init journal_init_handle_cache(void)
{
- jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
- sizeof(handle_t),
- 0, /* offset */
- SLAB_TEMPORARY, /* flags */
- NULL); /* ctor */
+ jbd2_handle_cache = KMEM_CACHE(jbd2_journal_handle, SLAB_TEMPORARY);
if (jbd2_handle_cache == NULL) {
- printk(KERN_EMERG "JBD: failed to create handle cache\n");
+ printk(KERN_EMERG "JBD2: failed to create handle cache\n");
+ return -ENOMEM;
+ }
+ jbd2_inode_cache = KMEM_CACHE(jbd2_inode, 0);
+ if (jbd2_inode_cache == NULL) {
+ printk(KERN_EMERG "JBD2: failed to create inode cache\n");
+ kmem_cache_destroy(jbd2_handle_cache);
return -ENOMEM;
}
return 0;
@@ -2306,6 +2309,9 @@ static void jbd2_journal_destroy_handle_cache(void)
{
if (jbd2_handle_cache)
kmem_cache_destroy(jbd2_handle_cache);
+ if (jbd2_inode_cache)
+ kmem_cache_destroy(jbd2_inode_cache);
+
}

/*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2ae86aa..27e79c2 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -94,7 +94,7 @@ extern void jbd2_free(void *ptr, size_t size);
*
* This is an opaque datatype.
**/
-typedef struct handle_s handle_t; /* Atomic operation type */
+typedef struct jbd2_journal_handle handle_t; /* Atomic operation type */


/**
@@ -416,7 +416,7 @@ struct jbd2_revoke_table_s;
* in so it can be fixed later.
*/

-struct handle_s
+struct jbd2_journal_handle
{
/* Which compound transaction is this update a part of? */
transaction_t *h_transaction;
@@ -1158,6 +1158,22 @@ static inline void jbd2_free_handle(handle_t *handle)
kmem_cache_free(jbd2_handle_cache, handle);
}

+/*
+ * jbd2_inode management (optional, for those file systems that want to use
+ * dynamically allocated jbd2_inode structures)
+ */
+extern struct kmem_cache *jbd2_inode_cache;
+
+static inline struct jbd2_inode *jbd2_alloc_inode(gfp_t gfp_flags)
+{
+ return kmem_cache_alloc(jbd2_inode_cache, gfp_flags);
+}
+
+static inline void jbd2_free_inode(struct jbd2_inode *jinode)
+{
+ kmem_cache_free(jbd2_inode_cache, jinode);
+}
+
/* Primary revoke support */
#define JOURNAL_REVOKE_DEFAULT_HASH 256
extern int jbd2_journal_init_revoke(journal_t *, int);
--
1.7.3.1


2011-01-05 09:26:07

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 6/6] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On Wed, Jan 5, 2011 at 3:01 AM, Theodore Ts'o <[email protected]> wrote:
> Replace the jbd2_inode structure (which is 48 bytes) with a pointer
> and only allocate the jbd2_inode when it is needed --- that is, when
> the file system has a journal present and the inode has been opened
> for writing. ?This allows us to further slim down the ext4_inode_info
> structure.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> ?fs/ext4/ext4.h ? ? ? | ? ?2 +-
> ?fs/ext4/ext4_jbd2.h ?| ? ?2 +-
> ?fs/ext4/file.c ? ? ? | ? 19 +++++++++++++++++++
> ?fs/ext4/inode.c ? ? ?| ? 15 +++++++++++----
> ?fs/ext4/super.c ? ? ?| ? 16 +++++++---------
> ?fs/jbd2/journal.c ? ?| ? 20 +++++++++++++-------
> ?include/linux/jbd2.h | ? 20 ++++++++++++++++++--
> ?7 files changed, 70 insertions(+), 24 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 8a2603d..50f35e8 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -811,7 +811,7 @@ struct ext4_inode_info {
> ? ? ? ? */
> ? ? ? ?struct rw_semaphore i_data_sem;
> ? ? ? ?struct inode vfs_inode;
> - ? ? ? struct jbd2_inode jinode;
> + ? ? ? struct jbd2_inode *jinode;
>
> ? ? ? ?struct ext4_ext_cache i_cached_extent;
> ? ? ? ?/*
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index b0bd792..d8b992e 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -253,7 +253,7 @@ static inline int ext4_journal_force_commit(journal_t *journal)
> ?static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
> ?{
> ? ? ? ?if (ext4_handle_valid(handle))
> - ? ? ? ? ? ? ? return jbd2_journal_file_inode(handle, &EXT4_I(inode)->jinode);
> + ? ? ? ? ? ? ? return jbd2_journal_file_inode(handle, EXT4_I(inode)->jinode);
> ? ? ? ?return 0;
> ?}
>
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 5a5c55d..45a656f 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -104,6 +104,7 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> ?{
> ? ? ? ?struct super_block *sb = inode->i_sb;
> ? ? ? ?struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + ? ? ? struct ext4_inode_info *ei = EXT4_I(inode);
> ? ? ? ?struct vfsmount *mnt = filp->f_path.mnt;
> ? ? ? ?struct path path;
> ? ? ? ?char buf[64], *cp;
> @@ -127,6 +128,24 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> ? ? ? ? ? ? ? ? ? ? ? ?ext4_mark_super_dirty(sb);
> ? ? ? ? ? ? ? ?}
> ? ? ? ?}
> + ? ? ? /*
> + ? ? ? ?* Set up the jbd2_inode if we are opening the inode for
> + ? ? ? ?* writing and the journal is present
> + ? ? ? ?*/
> + ? ? ? if (sbi->s_journal && !ei->jinode && (filp->f_mode & FMODE_WRITE)) {
> + ? ? ? ? ? ? ? struct jbd2_inode *jinode = jbd2_alloc_inode(GFP_KERNEL);
> +
> + ? ? ? ? ? ? ? spin_lock(&inode->i_lock);
> + ? ? ? ? ? ? ? if (!ei->jinode) {
> + ? ? ? ? ? ? ? ? ? ? ? if (!jinode) {
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&inode->i_lock);
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return -ENOMEM;
> + ? ? ? ? ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? ? ? ? ? ei->jinode = jinode;
> + ? ? ? ? ? ? ? ? ? ? ? jbd2_journal_init_jbd_inode(ei->jinode, inode);

jinode = NULL;

> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? spin_unlock(&inode->i_lock);

if (jinode)
// ei->jinode was allocated after first test - can it happen?
// if not, then 2nd test under spinlock is not needed
jbd2_free_inode(jinode);

> + ? ? ? }
> ? ? ? ?return dquot_file_open(inode, filp);
> ?}
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2d60028..de3f2e3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -55,10 +55,17 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?loff_t new_size)
> ?{
> ? ? ? ?trace_ext4_begin_ordered_truncate(inode, new_size);
> - ? ? ? return jbd2_journal_begin_ordered_truncate(
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? EXT4_SB(inode->i_sb)->s_journal,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? &EXT4_I(inode)->jinode,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? new_size);
> + ? ? ? /*
> + ? ? ? ?* If jinode is zero, then we never opened the file for
> + ? ? ? ?* writing, so there's no need to call
> + ? ? ? ?* jbd2_journal_begin_ordered_truncate() since there's no
> + ? ? ? ?* outstanding writes we need to flush.
> + ? ? ? ?*/
> + ? ? ? if (!EXT4_I(inode)->jinode)
> + ? ? ? ? ? ? ? return 0;
> + ? ? ? return jbd2_journal_begin_ordered_truncate(EXT4_JOURNAL(inode),
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_I(inode)->jinode,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?new_size);
> ?}
>
> ?static void ext4_invalidatepage(struct page *page, unsigned long offset);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index f5960d6..7f99ac7 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -818,12 +818,6 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> ? ? ? ?memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
> ? ? ? ?INIT_LIST_HEAD(&ei->i_prealloc_list);
> ? ? ? ?spin_lock_init(&ei->i_prealloc_lock);
> - ? ? ? /*
> - ? ? ? ?* Note: ?We can be called before EXT4_SB(sb)->s_journal is set,
> - ? ? ? ?* therefore it can be null here. ?Don't check it, just initialize
> - ? ? ? ?* jinode.
> - ? ? ? ?*/
> - ? ? ? jbd2_journal_init_jbd_inode(&ei->jinode, &ei->vfs_inode);
> ? ? ? ?ei->i_reserved_data_blocks = 0;
> ? ? ? ?ei->i_reserved_meta_blocks = 0;
> ? ? ? ?ei->i_allocated_meta_blocks = 0;
> @@ -832,6 +826,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> ?#ifdef CONFIG_QUOTA
> ? ? ? ?ei->i_reserved_quota = 0;
> ?#endif
> + ? ? ? ei->jinode = 0;
> ? ? ? ?INIT_LIST_HEAD(&ei->i_completed_io_list);
> ? ? ? ?spin_lock_init(&ei->i_completed_io_lock);
> ? ? ? ?ei->cur_aio_dio = NULL;
> @@ -900,9 +895,12 @@ void ext4_clear_inode(struct inode *inode)
> ? ? ? ?end_writeback(inode);
> ? ? ? ?dquot_drop(inode);
> ? ? ? ?ext4_discard_preallocations(inode);
> - ? ? ? if (EXT4_JOURNAL(inode))
> - ? ? ? ? ? ? ? jbd2_journal_release_jbd_inode(EXT4_SB(inode->i_sb)->s_journal,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?&EXT4_I(inode)->jinode);
> + ? ? ? if (EXT4_JOURNAL(inode) && EXT4_I(inode)->jinode) {
> + ? ? ? ? ? ? ? jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_I(inode)->jinode);
> + ? ? ? ? ? ? ? jbd2_free_inode(EXT4_I(inode)->jinode);
> + ? ? ? ? ? ? ? EXT4_I(inode)->jinode = 0;
> + ? ? ? }
> ?}
>
> ?static inline void ext4_show_quota_options(struct seq_file *seq,
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 2447bd8..9e46869 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -94,6 +94,7 @@ EXPORT_SYMBOL(jbd2_journal_file_inode);
> ?EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
> ?EXPORT_SYMBOL(jbd2_journal_release_jbd_inode);
> ?EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);
> +EXPORT_SYMBOL(jbd2_inode_cache);
>
> ?static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
> ?static void __journal_abort_soft (journal_t *journal, int errno);
> @@ -2286,17 +2287,19 @@ static void __exit jbd2_remove_jbd_stats_proc_entry(void)
>
> ?#endif
>
> -struct kmem_cache *jbd2_handle_cache;
> +struct kmem_cache *jbd2_handle_cache, *jbd2_inode_cache;
>
> ?static int __init journal_init_handle_cache(void)
> ?{
> - ? ? ? jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? sizeof(handle_t),
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0, ? ? ? ? ? ? ?/* offset */
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? SLAB_TEMPORARY, /* flags */
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? NULL); ? ? ? ? ?/* ctor */
> + ? ? ? jbd2_handle_cache = KMEM_CACHE(jbd2_journal_handle, SLAB_TEMPORARY);
> ? ? ? ?if (jbd2_handle_cache == NULL) {
> - ? ? ? ? ? ? ? printk(KERN_EMERG "JBD: failed to create handle cache\n");
> + ? ? ? ? ? ? ? printk(KERN_EMERG "JBD2: failed to create handle cache\n");
> + ? ? ? ? ? ? ? return -ENOMEM;
> + ? ? ? }
> + ? ? ? jbd2_inode_cache = KMEM_CACHE(jbd2_inode, 0);
> + ? ? ? if (jbd2_inode_cache == NULL) {
> + ? ? ? ? ? ? ? printk(KERN_EMERG "JBD2: failed to create inode cache\n");
> + ? ? ? ? ? ? ? kmem_cache_destroy(jbd2_handle_cache);
> ? ? ? ? ? ? ? ?return -ENOMEM;
> ? ? ? ?}
> ? ? ? ?return 0;
> @@ -2306,6 +2309,9 @@ static void jbd2_journal_destroy_handle_cache(void)
> ?{
> ? ? ? ?if (jbd2_handle_cache)
> ? ? ? ? ? ? ? ?kmem_cache_destroy(jbd2_handle_cache);
> + ? ? ? if (jbd2_inode_cache)
> + ? ? ? ? ? ? ? kmem_cache_destroy(jbd2_inode_cache);
> +
> ?}
>
> ?/*
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 2ae86aa..27e79c2 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -94,7 +94,7 @@ extern void jbd2_free(void *ptr, size_t size);
> ?*
> ?* This is an opaque datatype.
> ?**/
> -typedef struct handle_s ? ? ? ? ? ? ? ?handle_t; ? ? ? /* Atomic operation type */
> +typedef struct jbd2_journal_handle handle_t; ? /* Atomic operation type */
>
>
> ?/**
> @@ -416,7 +416,7 @@ struct jbd2_revoke_table_s;
> ?* in so it can be fixed later.
> ?*/
>
> -struct handle_s
> +struct jbd2_journal_handle
> ?{
> ? ? ? ?/* Which compound transaction is this update a part of? */
> ? ? ? ?transaction_t ? ? ? ? ? *h_transaction;
> @@ -1158,6 +1158,22 @@ static inline void jbd2_free_handle(handle_t *handle)
> ? ? ? ?kmem_cache_free(jbd2_handle_cache, handle);
> ?}
>
> +/*
> + * jbd2_inode management (optional, for those file systems that want to use
> + * dynamically allocated jbd2_inode structures)
> + */
> +extern struct kmem_cache *jbd2_inode_cache;
> +
> +static inline struct jbd2_inode *jbd2_alloc_inode(gfp_t gfp_flags)
> +{
> + ? ? ? return kmem_cache_alloc(jbd2_inode_cache, gfp_flags);
> +}
> +
> +static inline void jbd2_free_inode(struct jbd2_inode *jinode)
> +{
> + ? ? ? kmem_cache_free(jbd2_inode_cache, jinode);
> +}
> +
> ?/* Primary revoke support */
> ?#define JOURNAL_REVOKE_DEFAULT_HASH 256
> ?extern int ? ? ? ?jbd2_journal_init_revoke(journal_t *, int);
> --
> 1.7.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2011-01-05 18:43:10

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 5/6] ext4: Drop i_state_flags on architectures with 64-bit longs

On 2011-01-04, at 18:01, Theodore Ts'o wrote:
> +#if (BITS_PER_LONG < 64)
> +EXT4_INODE_BIT_FNS(state, state_flags, 0)
> +#define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_state_flags = 0
> +#else
> +EXT4_INODE_BIT_FNS(state, flags, 32)
> +#define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_flags = 0

This looks like it will clear all of the i_flags values, instead of just the
state flags. It should probably be something like:

#define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_flags &= 0xffffffffULL;

> @@ -1009,6 +1009,7 @@ got:
> * extent flag on newly created directory and file only if -o extent
> * mount option is specified
> */
> + EXT4_CLEAR_STATE_FLAGS(ei);
> ei->i_flags =
> ext4_mask_flags(mode, EXT4_I(dir)->i_flags & EXT4_FL_INHERITED);
> ei->i_file_acl = 0;
> @@ -1027,7 +1028,6 @@ got:
> inode->i_generation = sbi->s_next_generation++;
> spin_unlock(&sbi->s_next_gen_lock);
>
> - ei->i_state_flags = 0;

It looks like you have compensated for the above by changing the
code here, but I think it is risky/confusing if clearing the state
flags has a side-effect on 64-bit arches, that doesn't exist on
32-bit arches. It looks like a bug waiting to happen...

Cheers, Andreas






2011-01-05 19:26:34

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 6/6] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On 2011-01-04, at 18:01, Theodore Ts'o wrote:
> Replace the jbd2_inode structure (which is 48 bytes) with a pointer
> and only allocate the jbd2_inode when it is needed --- that is, when
> the file system has a journal present and the inode has been opened
> for writing. This allows us to further slim down the ext4_inode_info
> structure.

How does this change impact the majority of users that are running with a journal? It is clearly a win for a small percentage of users with no-journal mode, but it may be a net increase in memory usage for the majority of the users (with journal). There will now be two allocations for every inode, and the extra packing these allocations into slabs will increase memory usage for an inode, and would definitely result in more allocation/freeing overhead.

The main question is how many files are ever opened for write? It isn't just the number of currently-open files for write, because the jinfo isn't released until the inode is cleared from memory. While I suspect that most inodes in cache are never opened for write, it would be worthwhile to compare the ext4_inode_cache object count against the jbd2_inode object count, and see how the total memory compares to a before-patch system running different workloads (with journal).


> @@ -55,10 +55,17 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
> loff_t new_size)
> {
> trace_ext4_begin_ordered_truncate(inode, new_size);
> + /*
> + * If jinode is zero, then we never opened the file for
> + * writing, so there's no need to call
> + * jbd2_journal_begin_ordered_truncate() since there's no
> + * outstanding writes we need to flush.
> + */
> + if (!EXT4_I(inode)->jinode)
> + return 0;
> + return jbd2_journal_begin_ordered_truncate(EXT4_JOURNAL(inode),
> + EXT4_I(inode)->jinode,
> + new_size);
> }

In fact, the only callsites of this function are protected with:

if (ext4_should_order_data(inode))
ext4_begin_ordered_truncate(inode, size)

which will skip the call to ext4_begin_ordered_truncate() if the filesystem is running in no-journal mode (EXT4_JOURNAL(inode) == NULL)). That means the only reason this function could be called with jinode == NULL is due to memory corruption, and it makes sense to replace this with:

BUG_ON(EXT4_I(inode)->jinode == NULL);

> @@ -832,6 +826,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> #ifdef CONFIG_QUOTA
> ei->i_reserved_quota = 0;
> #endif
> + ei->jinode = 0;

This should be "ei->jinode = NULL".

Cheers, Andreas






Cheers, Andreas






2011-01-05 20:21:07

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 6/6] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On Wed, Jan 05, 2011 at 12:26:33PM -0700, Andreas Dilger wrote:
>
> How does this change impact the majority of users that are running
> with a journal? It is clearly a win for a small percentage of users
> with no-journal mode, but it may be a net increase in memory usage
> for the majority of the users (with journal). There will now be two
> allocations for every inode, and the extra packing these allocations
> into slabs will increase memory usage for an inode, and would
> definitely result in more allocation/freeing overhead.
>
> The main question is how many files are ever opened for write?

Even if we do two allocations for every inode (not just inodes opened
for write), it's a win simply because moving the jinode out the
ext4_inode_info structure shrinks it sufficiently that we can now pack
18 inodes in a 16k slab on x86_64. It turns out that the slab
allocator is pretty inefficient at large data structures, and smaller
data structures (such as the jbd2_inode structure) it handles much
more efficiently, in terms of wasted memory.

> It
> isn't just the number of currently-open files for write, because the
> jinfo isn't released until the inode is cleared from memory. While
> I suspect that most inodes in cache are never opened for write, it
> would be worthwhile to compare the ext4_inode_cache object count
> against the jbd2_inode object count, and see how the total memory
> compares to a before-patch system running different workloads (with
> journal).

Sure. It should be possible to release jinfo when the file is
completely closed, in ext4_release_file. That would reduce the memory
footprint significantly. I hadn't bothered with it too badly because
the jbd2_inode structure is only 48 bytes, and you can fit 85 of them
on a 4k page with only 16 bytes getting wasted. But it's fair that we
release jinode once the inode is no longer used by any file descriptors.


I'll make the the other changes you suggested; thanks!!

- Ted

2011-01-05 20:21:51

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 6/6] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On Wed, Jan 05, 2011 at 11:26:05AM +0200, Amir Goldstein wrote:
>
> if (jinode)
> // ei->jinode was allocated after first test - can it happen?
> // if not, then 2nd test under spinlock is not needed
> jbd2_free_inode(jinode);

Good point; currently if there are two CPU's racing to open the file,
we could end up with a memory leak. Thanks for pointing that out!

- Ted

2011-01-05 20:29:55

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 5/6] ext4: Drop i_state_flags on architectures with 64-bit longs

On Wed, Jan 05, 2011 at 11:43:08AM -0700, Andreas Dilger wrote:
>
> It looks like you have compensated for the above by changing the
> code here, but I think it is risky/confusing if clearing the state
> flags has a side-effect on 64-bit arches, that doesn't exist on
> 32-bit arches. It looks like a bug waiting to happen...

Yeah, I did think of this, but it seemed like extra/needless work that
I was trying to optimize away. It's still not safe to do:

#define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_flags &= 0xffffffffULL;

... since we're not atomically updating i_flags. So if anyone tries
using EXT4_CLEAR_STATE_FLAGS() aside from the two allocation contexts,
they need to be careful anyway.

I did think about putting the #ifdef BITS_PER_LONG < 64 inline in the
code, but that's ugly.

Maybe the best thing to do is to clearly document this pitfall, and
then leave things as-is? There aren't a lot of great solutions.

- Ted

2011-01-06 07:23:55

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 5/6] ext4: Drop i_state_flags on architectures with 64-bit longs

On 2011-01-05, at 13:29, Ted Ts'o wrote:
> On Wed, Jan 05, 2011 at 11:43:08AM -0700, Andreas Dilger wrote:
>>
>> It looks like you have compensated for the above by changing the
>> code here, but I think it is risky/confusing if clearing the state
>> flags has a side-effect on 64-bit arches, that doesn't exist on
>> 32-bit arches. It looks like a bug waiting to happen...
>
> Yeah, I did think of this, but it seemed like extra/needless work that
> I was trying to optimize away. It's still not safe to do:
>
> #define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_flags &= 0xffffffffULL;
>
> ... since we're not atomically updating i_flags.

This code would only be used on a 64-bit arch, so it should be updating the whole word at once (unlike a 32-bit arch).

> So if anyone tries using EXT4_CLEAR_STATE_FLAGS() aside from the two
> allocation contexts, they need to be careful anyway.
>
> I did think about putting the #ifdef BITS_PER_LONG < 64 inline in the
> code, but that's ugly.

I'm missing the point of that - isn't the EXT4_CLEAR_STATE_FLAGS() macro
masking already conditional on 64-bit architectures?

The one call in ext4_do_update_inode() that is masking i_flags is redundant,
since the cpu_to_le32() macro is itself either masking the value before
swabbing, and/or it is truncated by the assignment to i_flags.

> Maybe the best thing to do is to clearly document this pitfall, and
> then leave things as-is? There aren't a lot of great solutions.

It doesn't matter so much to me in the end. At least documenting this
anomaly is useful, and I don't think that doing the masking is harmful.

Cheers, Andreas






2011-01-06 17:55:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 5/6] ext4: Drop i_state_flags on architectures with 64-bit longs

On Thu, Jan 06, 2011 at 12:23:53AM -0700, Andreas Dilger wrote:
> > Yeah, I did think of this, but it seemed like extra/needless work that
> > I was trying to optimize away. It's still not safe to do:
> >
> > #define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_flags &= 0xffffffffULL;
> >
> > ... since we're not atomically updating i_flags.
>
> This code would only be used on a 64-bit arch, so it should be
> updating the whole word at once (unlike a 32-bit arch).

Masking off the low 32-bits is a read / mask / write set of operations
on RISCy architectures. Even on an Intel architecture, unless you
specifically use the LOCK prefix on the AND operation, you can still
risk racing with another CPU while you do the mask operation. So no,
it's not atomic.

This is why we did a wholesale replacement of explicit C bit
operations on ei->i_flags and replaced them with
ext4_{set/get/clear}_inode_flags. And why in the jbd/jbd2 layer, we
take an explicit spinlock before modifying j_flags. (An obvious thing
to try doing to reduce spinlock contention on big 48-core machines is
to replace do a similar replacement in the jbd2 layer; on my todo
list.)

> > I did think about putting the #ifdef BITS_PER_LONG < 64 inline in the
> > code, but that's ugly.
>
> I'm missing the point of that - isn't the EXT4_CLEAR_STATE_FLAGS() macro
> masking already conditional on 64-bit architectures?

Sorry, I must not have been clear. I was referring to putting in an
inline #if statement into the C code as being ugly.

> The one call in ext4_do_update_inode() that is masking i_flags is redundant,
> since the cpu_to_le32() macro is itself either masking the value before
> swabbing, and/or it is truncated by the assignment to i_flags.

(Actually, I think the C compiler will likely notice that the earlier
i_flags is redudant, and optimize it out.)

I could have replaced things with this instead:

#if BITS_PER_LONG < 64
ei->i_state_flags = 0
#endif

That's more "correct", but it's ugly. On the other hand, it does make
explicit what is going on, instead of hiding it in a header file.
Whether we think it's better depends on how likely that people will
get confused in the future. If we think it's likely this section of
code will be regularly modified, then we can keep it explicit to
minimize the chances that a future change in the code will move things
around and I don't notice when I'm reviewing the patch.

If we think the section of code is less likely to be changed, then
hiding this detail could make it easier for the casual reader to
understand the code.

To be honoest, it's a minor point either way. I don't have strong
feelings about how this gets done one way or another; I just had to
pick one, and I went with the latter approach. I'm certainly willing
to be pursuaded otherwise that the chances that someone could fall
into the pitfall would be huge.

- Ted

2011-01-06 21:15:49

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] ext4: Drop i_state_flags on architectures with 64-bit longs

We can store the dynamic inode state flags in the high bits of
EXT4_I(inode)->i_flags, and eliminate i_state_flags. This saves 8
bytes from the size of ext4_inode_info structure, which when
multiplied by the number of the number of in the inode cache, can save
a lot of memory.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 20 ++++++++++++++------
fs/ext4/ialloc.c | 2 +-
fs/ext4/inode.c | 4 ++--
3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 50e3d24..2569c23 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -764,7 +764,9 @@ struct ext4_inode_info {
*/
ext4_group_t i_block_group;
ext4_lblk_t i_dir_start_lookup;
+#if (BITS_PER_LONG < 64)
unsigned long i_state_flags; /* Dynamic state flags */
+#endif
unsigned long i_flags;

#ifdef CONFIG_EXT4_FS_XATTR
@@ -1239,22 +1241,28 @@ enum {
EXT4_STATE_DELALLOC_RESERVED, /* blks already reserved for delalloc */
};

-#define EXT4_INODE_BIT_FNS(name, field) \
+#define EXT4_INODE_BIT_FNS(name, field, offset) \
static inline int ext4_test_inode_##name(struct inode *inode, int bit) \
{ \
- return test_bit(bit, &EXT4_I(inode)->i_##field); \
+ return test_bit(bit + (offset), &EXT4_I(inode)->i_##field); \
} \
static inline void ext4_set_inode_##name(struct inode *inode, int bit) \
{ \
- set_bit(bit, &EXT4_I(inode)->i_##field); \
+ set_bit(bit + (offset), &EXT4_I(inode)->i_##field); \
} \
static inline void ext4_clear_inode_##name(struct inode *inode, int bit) \
{ \
- clear_bit(bit, &EXT4_I(inode)->i_##field); \
+ clear_bit(bit + (offset), &EXT4_I(inode)->i_##field); \
}

-EXT4_INODE_BIT_FNS(flag, flags)
-EXT4_INODE_BIT_FNS(state, state_flags)
+EXT4_INODE_BIT_FNS(flag, flags, 0)
+#if (BITS_PER_LONG < 64)
+EXT4_INODE_BIT_FNS(state, state_flags, 0)
+#define EXT4_CLEAR_STATE_FLAGS(ei) (ei)->i_state_flags = 0
+#else
+EXT4_INODE_BIT_FNS(state, flags, 32)
+#define EXT4_CLEAR_STATE_FLAGS(ei) do { } while (0)
+#endif
#else
/* Assume that user mode programs are passing in an ext4fs superblock, not
* a kernel struct super_block. This will allow us to call the feature-test
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1ce240a..9b0f230 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1027,7 +1027,7 @@ got:
inode->i_generation = sbi->s_next_generation++;
spin_unlock(&sbi->s_next_gen_lock);

- ei->i_state_flags = 0;
+ EXT4_CLEAR_STATE_FLAGS(ei); /* Only relevant on 32-bit archs */
ext4_set_inode_state(inode, EXT4_STATE_NEW);

ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3ae8313..d34ddc6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4868,7 +4868,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
}
inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);

- ei->i_state_flags = 0;
+ EXT4_CLEAR_STATE_FLAGS(ei); /* Only relevant on 32-bit archs */
ei->i_dir_start_lookup = 0;
ei->i_dtime = le32_to_cpu(raw_inode->i_dtime);
/* We now have enough fields to check if the inode was active or not.
@@ -5127,7 +5127,7 @@ static int ext4_do_update_inode(handle_t *handle,
if (ext4_inode_blocks_set(handle, raw_inode, ei))
goto out_brelse;
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
- raw_inode->i_flags = cpu_to_le32(ei->i_flags);
+ raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
cpu_to_le32(EXT4_OS_HURD))
raw_inode->i_file_acl_high =
--
1.7.3.1


2011-01-06 22:14:26

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 6/6] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On Wed, Jan 05, 2011 at 12:26:33PM -0700, Andreas Dilger wrote:
> In fact, the only callsites of this function are protected with:
>
> if (ext4_should_order_data(inode))
> ext4_begin_ordered_truncate(inode, size)
>
> which will skip the call to ext4_begin_ordered_truncate() if the
> filesystem is running in no-journal mode (EXT4_JOURNAL(inode) ==
> NULL)). That means the only reason this function could be called
> with jinode == NULL is due to memory corruption, and it makes sense
> to replace this with:

While it's true all of the callers of this function are protected with
ext4_should_order_data(), this function can be called by unlink(), and
if the file hasn't been opened for write, jinode will be NULL. So
returning if jinode is NULL is in fact the right thing to do.

- Ted

2011-01-07 02:36:08

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH -v2] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing. This allows us to further slim down the ext4_inode_info
structure.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---

I looked into release the jinode structure in ext4_release_file(), but
that doesn't work because we need to wait until the delayed allocation
is completed. At some point I'll look into adding something that checks
to see if we have finished doing the background writeout, but I'm going
to save that for another patch.

fs/ext4/ext4.h | 2 +-
fs/ext4/ext4_jbd2.h | 2 +-
fs/ext4/file.c | 22 ++++++++++++++++++++++
fs/ext4/inode.c | 17 ++++++++++++-----
fs/ext4/super.c | 16 +++++++---------
fs/jbd2/journal.c | 20 +++++++++++++-------
include/linux/jbd2.h | 20 ++++++++++++++++++--
7 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2569c23..2d20424 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -811,7 +811,7 @@ struct ext4_inode_info {
*/
struct rw_semaphore i_data_sem;
struct inode vfs_inode;
- struct jbd2_inode jinode;
+ struct jbd2_inode *jinode;

struct ext4_ext_cache i_cached_extent;
/*
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index b0bd792..d8b992e 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -253,7 +253,7 @@ static inline int ext4_journal_force_commit(journal_t *journal)
static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
{
if (ext4_handle_valid(handle))
- return jbd2_journal_file_inode(handle, &EXT4_I(inode)->jinode);
+ return jbd2_journal_file_inode(handle, EXT4_I(inode)->jinode);
return 0;
}

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 5a5c55d..b1b4884 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -104,6 +104,7 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
{
struct super_block *sb = inode->i_sb;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ struct ext4_inode_info *ei = EXT4_I(inode);
struct vfsmount *mnt = filp->f_path.mnt;
struct path path;
char buf[64], *cp;
@@ -127,6 +128,27 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
ext4_mark_super_dirty(sb);
}
}
+ /*
+ * Set up the jbd2_inode if we are opening the inode for
+ * writing and the journal is present
+ */
+ if (sbi->s_journal && !ei->jinode && (filp->f_mode & FMODE_WRITE)) {
+ struct jbd2_inode *jinode = jbd2_alloc_inode(GFP_KERNEL);
+
+ spin_lock(&inode->i_lock);
+ if (!ei->jinode) {
+ if (!jinode) {
+ spin_unlock(&inode->i_lock);
+ return -ENOMEM;
+ }
+ ei->jinode = jinode;
+ jbd2_journal_init_jbd_inode(ei->jinode, inode);
+ jinode = NULL;
+ }
+ if (unlikely(jinode != NULL))
+ jbd2_free_inode(jinode);
+ spin_unlock(&inode->i_lock);
+ }
return dquot_file_open(inode, filp);
}

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d34ddc6..fb5fe73 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -55,10 +55,17 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
loff_t new_size)
{
trace_ext4_begin_ordered_truncate(inode, new_size);
- return jbd2_journal_begin_ordered_truncate(
- EXT4_SB(inode->i_sb)->s_journal,
- &EXT4_I(inode)->jinode,
- new_size);
+ /*
+ * If jinode is zero, then we never opened the file for
+ * writing, so there's no need to call
+ * jbd2_journal_begin_ordered_truncate() since there's no
+ * outstanding writes we need to flush.
+ */
+ if (!EXT4_I(inode)->jinode)
+ return 0;
+ return jbd2_journal_begin_ordered_truncate(EXT4_JOURNAL(inode),
+ EXT4_I(inode)->jinode,
+ new_size);
}

static void ext4_invalidatepage(struct page *page, unsigned long offset);
@@ -4054,7 +4061,7 @@ int ext4_block_truncate_page(handle_t *handle,
if (ext4_should_journal_data(inode)) {
err = ext4_handle_dirty_metadata(handle, inode, bh);
} else {
- if (ext4_should_order_data(inode))
+ if (ext4_should_order_data(inode) && EXT4_I(inode)->jinode)
err = ext4_jbd2_file_inode(handle, inode);
mark_buffer_dirty(bh);
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f5960d6..1cd4326 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -818,12 +818,6 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
INIT_LIST_HEAD(&ei->i_prealloc_list);
spin_lock_init(&ei->i_prealloc_lock);
- /*
- * Note: We can be called before EXT4_SB(sb)->s_journal is set,
- * therefore it can be null here. Don't check it, just initialize
- * jinode.
- */
- jbd2_journal_init_jbd_inode(&ei->jinode, &ei->vfs_inode);
ei->i_reserved_data_blocks = 0;
ei->i_reserved_meta_blocks = 0;
ei->i_allocated_meta_blocks = 0;
@@ -832,6 +826,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
#ifdef CONFIG_QUOTA
ei->i_reserved_quota = 0;
#endif
+ ei->jinode = NULL;
INIT_LIST_HEAD(&ei->i_completed_io_list);
spin_lock_init(&ei->i_completed_io_lock);
ei->cur_aio_dio = NULL;
@@ -900,9 +895,12 @@ void ext4_clear_inode(struct inode *inode)
end_writeback(inode);
dquot_drop(inode);
ext4_discard_preallocations(inode);
- if (EXT4_JOURNAL(inode))
- jbd2_journal_release_jbd_inode(EXT4_SB(inode->i_sb)->s_journal,
- &EXT4_I(inode)->jinode);
+ if (EXT4_I(inode)->jinode) {
+ jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
+ EXT4_I(inode)->jinode);
+ jbd2_free_inode(EXT4_I(inode)->jinode);
+ EXT4_I(inode)->jinode = NULL;
+ }
}

static inline void ext4_show_quota_options(struct seq_file *seq,
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 2447bd8..9e46869 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -94,6 +94,7 @@ EXPORT_SYMBOL(jbd2_journal_file_inode);
EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
EXPORT_SYMBOL(jbd2_journal_release_jbd_inode);
EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);
+EXPORT_SYMBOL(jbd2_inode_cache);

static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
static void __journal_abort_soft (journal_t *journal, int errno);
@@ -2286,17 +2287,19 @@ static void __exit jbd2_remove_jbd_stats_proc_entry(void)

#endif

-struct kmem_cache *jbd2_handle_cache;
+struct kmem_cache *jbd2_handle_cache, *jbd2_inode_cache;

static int __init journal_init_handle_cache(void)
{
- jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
- sizeof(handle_t),
- 0, /* offset */
- SLAB_TEMPORARY, /* flags */
- NULL); /* ctor */
+ jbd2_handle_cache = KMEM_CACHE(jbd2_journal_handle, SLAB_TEMPORARY);
if (jbd2_handle_cache == NULL) {
- printk(KERN_EMERG "JBD: failed to create handle cache\n");
+ printk(KERN_EMERG "JBD2: failed to create handle cache\n");
+ return -ENOMEM;
+ }
+ jbd2_inode_cache = KMEM_CACHE(jbd2_inode, 0);
+ if (jbd2_inode_cache == NULL) {
+ printk(KERN_EMERG "JBD2: failed to create inode cache\n");
+ kmem_cache_destroy(jbd2_handle_cache);
return -ENOMEM;
}
return 0;
@@ -2306,6 +2309,9 @@ static void jbd2_journal_destroy_handle_cache(void)
{
if (jbd2_handle_cache)
kmem_cache_destroy(jbd2_handle_cache);
+ if (jbd2_inode_cache)
+ kmem_cache_destroy(jbd2_inode_cache);
+
}

/*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2ae86aa..27e79c2 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -94,7 +94,7 @@ extern void jbd2_free(void *ptr, size_t size);
*
* This is an opaque datatype.
**/
-typedef struct handle_s handle_t; /* Atomic operation type */
+typedef struct jbd2_journal_handle handle_t; /* Atomic operation type */


/**
@@ -416,7 +416,7 @@ struct jbd2_revoke_table_s;
* in so it can be fixed later.
*/

-struct handle_s
+struct jbd2_journal_handle
{
/* Which compound transaction is this update a part of? */
transaction_t *h_transaction;
@@ -1158,6 +1158,22 @@ static inline void jbd2_free_handle(handle_t *handle)
kmem_cache_free(jbd2_handle_cache, handle);
}

+/*
+ * jbd2_inode management (optional, for those file systems that want to use
+ * dynamically allocated jbd2_inode structures)
+ */
+extern struct kmem_cache *jbd2_inode_cache;
+
+static inline struct jbd2_inode *jbd2_alloc_inode(gfp_t gfp_flags)
+{
+ return kmem_cache_alloc(jbd2_inode_cache, gfp_flags);
+}
+
+static inline void jbd2_free_inode(struct jbd2_inode *jinode)
+{
+ kmem_cache_free(jbd2_inode_cache, jinode);
+}
+
/* Primary revoke support */
#define JOURNAL_REVOKE_DEFAULT_HASH 256
extern int jbd2_journal_init_revoke(journal_t *, int);
--
1.7.3.1


2011-01-07 20:46:30

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH -v2] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On Fri, Jan 7, 2011 at 4:36 AM, Theodore Ts'o <[email protected]> wrote:
> Replace the jbd2_inode structure (which is 48 bytes) with a pointer
> and only allocate the jbd2_inode when it is needed --- that is, when
> the file system has a journal present and the inode has been opened
> for writing. ?This allows us to further slim down the ext4_inode_info
> structure.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
>
> I looked into release the jinode structure in ext4_release_file(), but
> that doesn't work because we need to wait until the delayed allocation
> is completed. ?At some point I'll look into adding something that checks
> to see if we have finished doing the background writeout, but I'm going
> to save that for another patch.
>
> ?fs/ext4/ext4.h ? ? ? | ? ?2 +-
> ?fs/ext4/ext4_jbd2.h ?| ? ?2 +-
> ?fs/ext4/file.c ? ? ? | ? 22 ++++++++++++++++++++++
> ?fs/ext4/inode.c ? ? ?| ? 17 ++++++++++++-----
> ?fs/ext4/super.c ? ? ?| ? 16 +++++++---------
> ?fs/jbd2/journal.c ? ?| ? 20 +++++++++++++-------
> ?include/linux/jbd2.h | ? 20 ++++++++++++++++++--
> ?7 files changed, 74 insertions(+), 25 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 2569c23..2d20424 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -811,7 +811,7 @@ struct ext4_inode_info {
> ? ? ? ? */
> ? ? ? ?struct rw_semaphore i_data_sem;
> ? ? ? ?struct inode vfs_inode;
> - ? ? ? struct jbd2_inode jinode;
> + ? ? ? struct jbd2_inode *jinode;
>
> ? ? ? ?struct ext4_ext_cache i_cached_extent;
> ? ? ? ?/*
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index b0bd792..d8b992e 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -253,7 +253,7 @@ static inline int ext4_journal_force_commit(journal_t *journal)
> ?static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode)
> ?{
> ? ? ? ?if (ext4_handle_valid(handle))
> - ? ? ? ? ? ? ? return jbd2_journal_file_inode(handle, &EXT4_I(inode)->jinode);
> + ? ? ? ? ? ? ? return jbd2_journal_file_inode(handle, EXT4_I(inode)->jinode);
> ? ? ? ?return 0;
> ?}
>
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 5a5c55d..b1b4884 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -104,6 +104,7 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> ?{
> ? ? ? ?struct super_block *sb = inode->i_sb;
> ? ? ? ?struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + ? ? ? struct ext4_inode_info *ei = EXT4_I(inode);
> ? ? ? ?struct vfsmount *mnt = filp->f_path.mnt;
> ? ? ? ?struct path path;
> ? ? ? ?char buf[64], *cp;
> @@ -127,6 +128,27 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> ? ? ? ? ? ? ? ? ? ? ? ?ext4_mark_super_dirty(sb);
> ? ? ? ? ? ? ? ?}
> ? ? ? ?}
> + ? ? ? /*
> + ? ? ? ?* Set up the jbd2_inode if we are opening the inode for
> + ? ? ? ?* writing and the journal is present
> + ? ? ? ?*/
> + ? ? ? if (sbi->s_journal && !ei->jinode && (filp->f_mode & FMODE_WRITE)) {
> + ? ? ? ? ? ? ? struct jbd2_inode *jinode = jbd2_alloc_inode(GFP_KERNEL);
> +
> + ? ? ? ? ? ? ? spin_lock(&inode->i_lock);
> + ? ? ? ? ? ? ? if (!ei->jinode) {
> + ? ? ? ? ? ? ? ? ? ? ? if (!jinode) {
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&inode->i_lock);
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return -ENOMEM;
> + ? ? ? ? ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? ? ? ? ? ei->jinode = jinode;
> + ? ? ? ? ? ? ? ? ? ? ? jbd2_journal_init_jbd_inode(ei->jinode, inode);
> + ? ? ? ? ? ? ? ? ? ? ? jinode = NULL;
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? if (unlikely(jinode != NULL))
> + ? ? ? ? ? ? ? ? ? ? ? jbd2_free_inode(jinode);

actually, I though, jbd2_free_inode would be better off outside the spinlock,
but if you're going to do it here you might just as well change this to:
+ ? ? ? ? ? ? ? else
+ ? ? ? ? ? ? ? ? ? ? ? jbd2_free_inode(jinode);


> + ? ? ? ? ? ? ? spin_unlock(&inode->i_lock);
> + ? ? ? }
> ? ? ? ?return dquot_file_open(inode, filp);
> ?}
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d34ddc6..fb5fe73 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -55,10 +55,17 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?loff_t new_size)
> ?{
> ? ? ? ?trace_ext4_begin_ordered_truncate(inode, new_size);
> - ? ? ? return jbd2_journal_begin_ordered_truncate(
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? EXT4_SB(inode->i_sb)->s_journal,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? &EXT4_I(inode)->jinode,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? new_size);
> + ? ? ? /*
> + ? ? ? ?* If jinode is zero, then we never opened the file for
> + ? ? ? ?* writing, so there's no need to call
> + ? ? ? ?* jbd2_journal_begin_ordered_truncate() since there's no
> + ? ? ? ?* outstanding writes we need to flush.
> + ? ? ? ?*/
> + ? ? ? if (!EXT4_I(inode)->jinode)
> + ? ? ? ? ? ? ? return 0;
> + ? ? ? return jbd2_journal_begin_ordered_truncate(EXT4_JOURNAL(inode),
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_I(inode)->jinode,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?new_size);
> ?}
>
> ?static void ext4_invalidatepage(struct page *page, unsigned long offset);
> @@ -4054,7 +4061,7 @@ int ext4_block_truncate_page(handle_t *handle,
> ? ? ? ?if (ext4_should_journal_data(inode)) {
> ? ? ? ? ? ? ? ?err = ext4_handle_dirty_metadata(handle, inode, bh);
> ? ? ? ?} else {
> - ? ? ? ? ? ? ? if (ext4_should_order_data(inode))
> + ? ? ? ? ? ? ? if (ext4_should_order_data(inode) && EXT4_I(inode)->jinode)
> ? ? ? ? ? ? ? ? ? ? ? ?err = ext4_jbd2_file_inode(handle, inode);
> ? ? ? ? ? ? ? ?mark_buffer_dirty(bh);
> ? ? ? ?}
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index f5960d6..1cd4326 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -818,12 +818,6 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> ? ? ? ?memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
> ? ? ? ?INIT_LIST_HEAD(&ei->i_prealloc_list);
> ? ? ? ?spin_lock_init(&ei->i_prealloc_lock);
> - ? ? ? /*
> - ? ? ? ?* Note: ?We can be called before EXT4_SB(sb)->s_journal is set,
> - ? ? ? ?* therefore it can be null here. ?Don't check it, just initialize
> - ? ? ? ?* jinode.
> - ? ? ? ?*/
> - ? ? ? jbd2_journal_init_jbd_inode(&ei->jinode, &ei->vfs_inode);
> ? ? ? ?ei->i_reserved_data_blocks = 0;
> ? ? ? ?ei->i_reserved_meta_blocks = 0;
> ? ? ? ?ei->i_allocated_meta_blocks = 0;
> @@ -832,6 +826,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> ?#ifdef CONFIG_QUOTA
> ? ? ? ?ei->i_reserved_quota = 0;
> ?#endif
> + ? ? ? ei->jinode = NULL;
> ? ? ? ?INIT_LIST_HEAD(&ei->i_completed_io_list);
> ? ? ? ?spin_lock_init(&ei->i_completed_io_lock);
> ? ? ? ?ei->cur_aio_dio = NULL;
> @@ -900,9 +895,12 @@ void ext4_clear_inode(struct inode *inode)
> ? ? ? ?end_writeback(inode);
> ? ? ? ?dquot_drop(inode);
> ? ? ? ?ext4_discard_preallocations(inode);
> - ? ? ? if (EXT4_JOURNAL(inode))
> - ? ? ? ? ? ? ? jbd2_journal_release_jbd_inode(EXT4_SB(inode->i_sb)->s_journal,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?&EXT4_I(inode)->jinode);
> + ? ? ? if (EXT4_I(inode)->jinode) {
> + ? ? ? ? ? ? ? jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_I(inode)->jinode);
> + ? ? ? ? ? ? ? jbd2_free_inode(EXT4_I(inode)->jinode);
> + ? ? ? ? ? ? ? EXT4_I(inode)->jinode = NULL;
> + ? ? ? }
> ?}
>
> ?static inline void ext4_show_quota_options(struct seq_file *seq,
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 2447bd8..9e46869 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -94,6 +94,7 @@ EXPORT_SYMBOL(jbd2_journal_file_inode);
> ?EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
> ?EXPORT_SYMBOL(jbd2_journal_release_jbd_inode);
> ?EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);
> +EXPORT_SYMBOL(jbd2_inode_cache);
>
> ?static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
> ?static void __journal_abort_soft (journal_t *journal, int errno);
> @@ -2286,17 +2287,19 @@ static void __exit jbd2_remove_jbd_stats_proc_entry(void)
>
> ?#endif
>
> -struct kmem_cache *jbd2_handle_cache;
> +struct kmem_cache *jbd2_handle_cache, *jbd2_inode_cache;
>
> ?static int __init journal_init_handle_cache(void)
> ?{
> - ? ? ? jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? sizeof(handle_t),
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0, ? ? ? ? ? ? ?/* offset */
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? SLAB_TEMPORARY, /* flags */
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? NULL); ? ? ? ? ?/* ctor */
> + ? ? ? jbd2_handle_cache = KMEM_CACHE(jbd2_journal_handle, SLAB_TEMPORARY);
> ? ? ? ?if (jbd2_handle_cache == NULL) {
> - ? ? ? ? ? ? ? printk(KERN_EMERG "JBD: failed to create handle cache\n");
> + ? ? ? ? ? ? ? printk(KERN_EMERG "JBD2: failed to create handle cache\n");
> + ? ? ? ? ? ? ? return -ENOMEM;
> + ? ? ? }
> + ? ? ? jbd2_inode_cache = KMEM_CACHE(jbd2_inode, 0);
> + ? ? ? if (jbd2_inode_cache == NULL) {
> + ? ? ? ? ? ? ? printk(KERN_EMERG "JBD2: failed to create inode cache\n");
> + ? ? ? ? ? ? ? kmem_cache_destroy(jbd2_handle_cache);
> ? ? ? ? ? ? ? ?return -ENOMEM;
> ? ? ? ?}
> ? ? ? ?return 0;
> @@ -2306,6 +2309,9 @@ static void jbd2_journal_destroy_handle_cache(void)
> ?{
> ? ? ? ?if (jbd2_handle_cache)
> ? ? ? ? ? ? ? ?kmem_cache_destroy(jbd2_handle_cache);
> + ? ? ? if (jbd2_inode_cache)
> + ? ? ? ? ? ? ? kmem_cache_destroy(jbd2_inode_cache);
> +
> ?}
>
> ?/*
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 2ae86aa..27e79c2 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -94,7 +94,7 @@ extern void jbd2_free(void *ptr, size_t size);
> ?*
> ?* This is an opaque datatype.
> ?**/
> -typedef struct handle_s ? ? ? ? ? ? ? ?handle_t; ? ? ? /* Atomic operation type */
> +typedef struct jbd2_journal_handle handle_t; ? /* Atomic operation type */
>
>
> ?/**
> @@ -416,7 +416,7 @@ struct jbd2_revoke_table_s;
> ?* in so it can be fixed later.
> ?*/
>
> -struct handle_s
> +struct jbd2_journal_handle
> ?{
> ? ? ? ?/* Which compound transaction is this update a part of? */
> ? ? ? ?transaction_t ? ? ? ? ? *h_transaction;
> @@ -1158,6 +1158,22 @@ static inline void jbd2_free_handle(handle_t *handle)
> ? ? ? ?kmem_cache_free(jbd2_handle_cache, handle);
> ?}
>
> +/*
> + * jbd2_inode management (optional, for those file systems that want to use
> + * dynamically allocated jbd2_inode structures)
> + */
> +extern struct kmem_cache *jbd2_inode_cache;
> +
> +static inline struct jbd2_inode *jbd2_alloc_inode(gfp_t gfp_flags)
> +{
> + ? ? ? return kmem_cache_alloc(jbd2_inode_cache, gfp_flags);
> +}
> +
> +static inline void jbd2_free_inode(struct jbd2_inode *jinode)
> +{
> + ? ? ? kmem_cache_free(jbd2_inode_cache, jinode);
> +}
> +
> ?/* Primary revoke support */
> ?#define JOURNAL_REVOKE_DEFAULT_HASH 256
> ?extern int ? ? ? ?jbd2_journal_init_revoke(journal_t *, int);
> --
> 1.7.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2011-01-07 22:40:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH -v2] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On Fri, Jan 07, 2011 at 10:46:29PM +0200, Amir Goldstein wrote:
>
> actually, I though, jbd2_free_inode would be better off outside the spinlock,

Oops, yes, I agree. Fixed.

- Ted

2011-01-11 21:50:47

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 6/6] ext4: Dynamically allocate the jbd2_inode in ext4_inode_info as necessary

On Wed, Jan 05, 2011 at 12:26:33PM -0700, Andreas Dilger wrote:
>
> How does this change impact the majority of users that are running
> with a journal? It is clearly a win for a small percentage of users
> with no-journal mode, but it may be a net increase in memory usage
> for the majority of the users (with journal)....
>
> The main question is how many files are ever opened for write? It
> isn't just the number of currently-open files for write, because the
> jinfo isn't released until the inode is cleared from memory. While
> I suspect that most inodes in cache are never opened for write, it
> would be worthwhile to compare the ext4_inode_cache object count
> against the jbd2_inode object count, and see how the total memory
> compares to a before-patch system running different workloads (with
> journal).

So here's one unscientific data point, based on your humble kernel
developer who happens to read e-mail using mutt and a Maildir
directory (which has lots of read-only files, one per e-mail) and does
primarily source code editing, kernel builds, web browsing, e-mail,
and irc/IM:

Slabcache: ext4_inode_cache Aliases: 0 Order : 3 Objects: 70820
Slabcache: jbd2_inode Aliases: 5 Order : 0 Objects: 22126

Note that jbd2_inode is aliased, so not all 22,126 objects are
necessarily jbd2_inodes. Some of them might also be ip_fib_alias,
ksm_mm_slot, anon_vma_chain, Acpi-Parse, and nsproxy objects. But
still, the ratio of inodes that I've ever written to inodes that have
only been accessed read-only is over 3:1.

So for at least my laptop/desktop workload, using a separate
jbd2_inode structure is clearly a win, and in fact separating out more
of the ext4_inode_cache structure to create a separate data structure
for fieldsonly needed when the file is opened for writing would likely
be a win. I can probably peel off another 100 bytes or so from the
ext4_inode_info data structure.

- Ted