From: Kalpak Shah Subject: Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode Date: Thu, 12 Jul 2007 17:44:35 +0530 Message-ID: <1184242475.4299.13.camel@garfield> References: <1183275482.4010.133.camel@localhost.localdomain> <20070710163247.5c8bfa3f.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-GiLpj11SNWcqlJ8k5JRf" Cc: cmm@us.ibm.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4 To: Andrew Morton Return-path: In-Reply-To: <20070710163247.5c8bfa3f.akpm@linux-foundation.org> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org --=-GiLpj11SNWcqlJ8k5JRf Content-Type: text/plain Content-Transfer-Encoding: 7bit On Tue, 2007-07-10 at 16:32 -0700, Andrew Morton wrote: > On Sun, 01 Jul 2007 03:38:01 -0400 > Mingming Cao wrote: > > > This patch is on top of the nanosecond timestamp and i_version_hi > > patches. > > This sort of information isn't needed (or desired) when this patch hits the > git tree. Please ensure that things like this are cleaned up before the > patches go to Linus. The incremental patch I have attached contains an updated Changelog entry which cleans this up. > > + !(EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND)) { > > + /* We need extra buffer credits since we may write into EA block > > + * with this same handle */ > > + if ((jbd2_journal_extend(handle, > > + EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) { > > + ret = ext4_expand_extra_isize(inode, > > + EXT4_SB(inode->i_sb)->s_want_extra_isize, > > + iloc, handle); > > + if (ret) { > > + EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND; > > + if (!expand_message) { > > + ext4_warning(inode->i_sb, __FUNCTION__, > > + "Unable to expand inode %lu. Delete" > > + " some EAs or run e2fsck.", > > + inode->i_ino); > > + expand_message = 1; > > + } > > + } > > + } > > + } > > Maybe that message should come out once per mount rather than once per boot > (or once per modprobe)? Done. Now the message gets printed the first time s_mnt_count changes, which means once per mount. > > + if (free < new_extra_isize) { > > + if (!tried_min_extra_isize && s_min_extra_isize) { > > + tried_min_extra_isize++; > > + new_extra_isize = s_min_extra_isize; > > Aren't we missing a brelse(bh) here? I have corrected this. > > > > +#define IHDR(inode, raw_inode) \ > > + ((struct ext4_xattr_ibody_header *) \ > > + ((void *)raw_inode + \ > > + EXT4_GOOD_OLD_INODE_SIZE + \ > > + EXT4_I(inode)->i_extra_isize)) > > +#define IFIRST(hdr) ((struct ext4_xattr_entry *)((hdr)+1)) > > Neither of these _have_ to be implemented as macros and hence they should > not be. These macros existed before and have just been moved. There are similar such macros in the ext3/4 xattr code and they should probably be changed to inlines. But that should be done in a different patch. Thanks, Kalpak. --=-GiLpj11SNWcqlJ8k5JRf Content-Disposition: attachment; filename=ext4_expand_inode_extra_isize_correction.patch Content-Type: text/x-patch; name=ext4_expand_inode_extra_isize_correction.patch; charset=UTF-8 Content-Transfer-Encoding: 7bit We need to make sure that existing ext3 filesystems can also avail the new fields that have been added to the ext4 inode. We use s_want_extra_isize and s_min_extra_isize to decide by how much we should expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand the inode by max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question about whether users should be able to set s_*_extra_isize smaller than the known fields or not. This patch also adds the functionality to expand inodes to include the newly added fields. We start by trying to expand by s_want_extra_isize bytes and if its fails we try to expand by s_min_extra_isize bytes. This is done by changing the i_extra_isize if enough space is available in the inode and no EAs are present. If EAs are present and there is enough space in the inode then the EAs in the inode are shifted to make space. If enough space is not available in the inode due to the EAs then 1 or more EAs are shifted to the external EA block. In the worst case when even the external EA block does not have enough space we inform the user that some EA would need to be deleted or s_min_extra_isize would have to be reduced. This would be online expansion of inodes. expand_extra_isize option will also be added to e2fsck which will expand all the inodes and if for any reason expansion fails for any inode then the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature will be reset. Signed-off-by: Andreas Dilger Signed-off-by: Kalpak Shah Signed-off-by: Mingming Cao Index: linux-2.6.22/fs/ext4/inode.c =================================================================== --- linux-2.6.22.orig/fs/ext4/inode.c +++ linux-2.6.22/fs/ext4/inode.c @@ -3130,9 +3130,8 @@ int ext4_expand_extra_isize(struct inode struct ext4_xattr_ibody_header *header; struct ext4_xattr_entry *entry; - if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) { + if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) return 0; - } raw_inode = ext4_raw_inode(&iloc); @@ -3148,9 +3147,9 @@ int ext4_expand_extra_isize(struct inode return 0; } - /* try to expand with EA present */ + /* try to expand with EAs present */ return ext4_expand_extra_isize_ea(inode, new_extra_isize, - raw_inode, handle); + raw_inode, handle); } /* @@ -3177,29 +3176,34 @@ int ext4_expand_extra_isize(struct inode int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode) { struct ext4_iloc iloc; + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + static unsigned int mnt_count; int err, ret; - static int expand_message; might_sleep(); err = ext4_reserve_inode_write(handle, inode, &iloc); - if (EXT4_I(inode)->i_extra_isize < - EXT4_SB(inode->i_sb)->s_want_extra_isize && + if (EXT4_I(inode)->i_extra_isize < sbi->s_want_extra_isize && !(EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND)) { - /* We need extra buffer credits since we may write into EA block - * with this same handle */ + /* + * We need extra buffer credits since we may write into EA block + * with this same handle. If journal_extend fails, then it will + * only result in a minor loss of functionality for that inode. + * If this is felt to be critical, then e2fsck should be run to + * force a large enough s_min_extra_isize. + */ if ((jbd2_journal_extend(handle, EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) { ret = ext4_expand_extra_isize(inode, - EXT4_SB(inode->i_sb)->s_want_extra_isize, - iloc, handle); + sbi->s_want_extra_isize, + iloc, handle); if (ret) { EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND; - if (!expand_message) { + if (mnt_count != sbi->s_es->s_mnt_count) { ext4_warning(inode->i_sb, __FUNCTION__, "Unable to expand inode %lu. Delete" " some EAs or run e2fsck.", inode->i_ino); - expand_message = 1; + mnt_count = sbi->s_es->s_mnt_count; } } } Index: linux-2.6.22/fs/ext4/xattr.c =================================================================== --- linux-2.6.22.orig/fs/ext4/xattr.c +++ linux-2.6.22/fs/ext4/xattr.c @@ -501,7 +501,11 @@ out: return; } -static inline size_t ext4_xattr_free_space(struct ext4_xattr_entry *last, +/* + * Find the available free space for EAs. This also returns the total number of + * bytes used by EA entries. + */ +static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last, size_t *min_offs, void *base, int *total) { for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { @@ -613,7 +617,6 @@ ext4_xattr_set_entry(struct ext4_xattr_i memmove(s->here, (void *)s->here + size, (void *)last - (void *)s->here + sizeof(__u32)); memset(last, 0, size); - } } @@ -1076,6 +1079,10 @@ retry: return error; } +/* + * Shift the EA entries in the inode to create space for the increased + * i_extra_isize. + */ static void ext4_xattr_shift_entries(struct ext4_xattr_entry *entry, int value_offs_shift, void *to, void *from, size_t n, int blocksize) @@ -1098,12 +1105,11 @@ static void ext4_xattr_shift_entries(str } /* - * Expand an inode by new_extra_isize bytes when EA presents. + * Expand an inode by new_extra_isize bytes when EAs are present. * Returns 0 on success or negative error number on failure. - * */ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize, - struct ext4_inode *raw_inode, handle_t *handle) + struct ext4_inode *raw_inode, handle_t *handle) { struct ext4_xattr_ibody_header *header; struct ext4_xattr_entry *entry, *last, *first; @@ -1177,6 +1183,7 @@ retry: if (!tried_min_extra_isize && s_min_extra_isize) { tried_min_extra_isize++; new_extra_isize = s_min_extra_isize; + brelse(bh); goto retry; } error = -1; @@ -1193,16 +1200,19 @@ retry: .value = NULL, .value_len = 0, }; - unsigned int total_size, shift_bytes, temp = ~0U; - - is = (struct ext4_xattr_ibody_find *) kmalloc(sizeof(struct - ext4_xattr_ibody_find), GFP_KERNEL); - bs = (struct ext4_xattr_block_find *) kmalloc(sizeof(struct - ext4_xattr_block_find), GFP_KERNEL); - memset((void *)is, 0, sizeof(struct ext4_xattr_ibody_find)); - memset((void *)bs, 0, sizeof(struct ext4_xattr_block_find)); + unsigned int total_size; /* EA entry size + value size */ + unsigned int shift_bytes; /* No. of bytes to shift EAs by? */ + unsigned int min_total_size = ~0U; + + is = kzalloc(sizeof(struct ext4_xattr_ibody_find), GFP_KERNEL); + bs = kzalloc(sizeof(struct ext4_xattr_block_find), GFP_KERNEL); + if (!is || !bs) { + error = -ENOMEM; + goto cleanup; + } - is->s.not_found = bs->s.not_found = -ENODATA; + is->s.not_found = -ENODATA; + bs->s.not_found = -ENODATA; is->iloc.bh = NULL; bs->bh = NULL; @@ -1213,12 +1223,12 @@ retry: total_size = EXT4_XATTR_SIZE(le32_to_cpu(last->e_value_size)) + EXT4_XATTR_LEN(last->e_name_len); - if (total_size <= free && total_size < temp) { + if (total_size <= free && total_size < min_total_size) { if (total_size < new_extra_isize) { small_entry = last; } else { entry = last; - temp = total_size; + min_total_size = total_size; } } } @@ -1243,11 +1253,14 @@ retry: i.name_index = entry->e_name_index, buffer = kmalloc(EXT4_XATTR_SIZE(size), GFP_KERNEL); b_entry_name = kmalloc(entry->e_name_len + 1, GFP_KERNEL); + if (!buffer || !b_entry_name) { + error = -ENOMEM; + goto cleanup; + } /* Save the entry name and the entry value */ - memcpy((void *)buffer, (void *)IFIRST(header) + offs, + memcpy(buffer, (void *)IFIRST(header) + offs, EXT4_XATTR_SIZE(size)); - memcpy((void *)b_entry_name, (void *)entry->e_name, - entry->e_name_len); + memcpy(b_entry_name, entry->e_name, entry->e_name_len); b_entry_name[entry->e_name_len] = '\0'; i.name = b_entry_name; @@ -1292,16 +1305,12 @@ retry: } cleanup: - if (b_entry_name) - kfree(b_entry_name); - if (buffer) - kfree(buffer); - if (is) { + kfree(b_entry_name); + kfree(buffer); + if (is) brelse(is->iloc.bh); - kfree(is); - } - if (bs) - kfree(bs); + kfree(is); + kfree(bs); brelse(bh); up_write(&EXT4_I(inode)->xattr_sem); return error; --=-GiLpj11SNWcqlJ8k5JRf--