From: "Aneesh Kumar K.V" Subject: Re: update uninitialized-block-groups.patch and mballoc-core.patch Date: Fri, 05 Oct 2007 20:44:01 +0530 Message-ID: <470654B9.8000007@linux.vnet.ibm.com> References: <47036832.7030107@linux.vnet.ibm.com> <47063B84.7020808@bull.net> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------050007080108030506060304" Cc: linux-ext4 To: Valerie Clement Return-path: Received: from E23SMTP03.au.ibm.com ([202.81.18.172]:51229 "EHLO e23smtp03.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756313AbXJEPOe (ORCPT ); Fri, 5 Oct 2007 11:14:34 -0400 Received: from sd0109e.au.ibm.com (d23rh905.au.ibm.com [202.81.18.225]) by e23smtp03.au.ibm.com (8.13.1/8.13.1) with ESMTP id l95FEUNd021784 for ; Sat, 6 Oct 2007 01:14:30 +1000 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139]) by sd0109e.au.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l95FI4Qu280496 for ; Sat, 6 Oct 2007 01:18:04 +1000 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l95FEDen030917 for ; Sat, 6 Oct 2007 01:14:14 +1000 In-Reply-To: <47063B84.7020808@bull.net> Sender: linux-ext4-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org This is a multi-part message in MIME format. --------------050007080108030506060304 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Valerie Clement wrote: > Aneesh Kumar K.V wrote: >> + if (block_group == sbi->s_gdb_count - 1) { >> + /* >> + * Even though mke2fs always initialize first and last group >> + * if some other tool enabled the EXT4_BG_BLOCK_UNINIT we need >> + * to make sure we calculate the right free blocks >> + */ >> + free_blocks = ext4_blocks_count(sbi->s_es) - >> + le32_to_cpu(sbi->s_es->s_first_data_block) - >> + (EXT4_BLOCKS_PER_GROUP(sb) * sbi->s_groups_count) - >> + bit_max; >> + } else { >> + free_blocks = EXT4_BLOCKS_PER_GROUP(sb) - bit_max; >> + } > > Hi Aneesh, > > I found a bug in the latest uninitialized-block-groups patch. > (block_group == sbi->s_gdb_count - 1) must be replaced by > (block_group == sbi->s_groups_count - 1) in the ext4_init_block_bitmap() > function. > > (Seen while testing the mballoc feature with uninit_groups option > enabled. fsck reported an error on this group.) > That is correct. Thanks for catching this. Did you test the patch in the patch queue. It actually changes the above line slightly differently. Updated patch attached. -aneesh --------------050007080108030506060304 Content-Type: text/x-patch; name="uninitialized-block-groups.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="uninitialized-block-groups.patch" Ext4: Uninitialized Block Groups From: Andreas Dilger In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count */ __le16 bg_checksum; /* crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur Signed-off-by: Andreas Dilger Signed-off-by: Mingming Cao Signed-off-by: Aneesh Kumar K.V --- fs/Kconfig | 1 fs/ext4/balloc.c | 112 +++++++++++++++++++++++++++++++++++- fs/ext4/group.h | 27 +++++++++ fs/ext4/ialloc.c | 146 ++++++++++++++++++++++++++++++++++++++++++++--- fs/ext4/resize.c | 21 +------ fs/ext4/super.c | 47 +++++++++++++++ include/linux/ext4_fs.h | 16 ++++- 7 files changed, 335 insertions(+), 35 deletions(-) create mode 100644 fs/ext4/group.h diff --git a/fs/Kconfig b/fs/Kconfig index f9eed6d..97eef97 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -140,6 +140,7 @@ config EXT4DEV_FS tristate "Ext4dev/ext4 extended fs support development (EXPERIMENTAL)" depends on EXPERIMENTAL select JBD2 + select CRC16 help Ext4dev is a predecessor filesystem of the next generation extended fs ext4, based on ext3 filesystem code. It will be diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index e53b4af..dd3816b 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -20,6 +20,7 @@ #include #include +#include "group.h" /* * balloc.c contains the blocks allocation and deallocation routines */ @@ -42,6 +43,94 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr, } +/* Initializes an uninitialized block bitmap if given, and returns the + * number of blocks free in the group. */ +unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh, + int block_group, struct ext4_group_desc *gdp) +{ + unsigned long start; + int bit, bit_max; + unsigned free_blocks, group_blocks; + struct ext4_sb_info *sbi = EXT4_SB(sb); + + if (bh) { + J_ASSERT_BH(bh, buffer_locked(bh)); + + /* If checksum is bad mark all blocks used to prevent allocation + * essentially implementing a per-group read-only flag. */ + if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) { + ext4_error(sb, __FUNCTION__, + "Checksum bad for group %u\n", block_group); + gdp->bg_free_blocks_count = 0; + gdp->bg_free_inodes_count = 0; + gdp->bg_itable_unused = 0; + memset(bh->b_data, 0xff, sb->s_blocksize); + return 0; + } + memset(bh->b_data, 0, sb->s_blocksize); + } + + /* Check for superblock and gdt backups in this group */ + bit_max = ext4_bg_has_super(sb, block_group); + + if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_META_BG) || + block_group < le32_to_cpu(sbi->s_es->s_first_meta_bg) * + sbi->s_desc_per_block) { + if (bit_max) { + bit_max += ext4_bg_num_gdb(sb, block_group); + bit_max += + le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks); + } + } else { /* For META_BG_BLOCK_GROUPS */ + int group_rel = (block_group - + le32_to_cpu(sbi->s_es->s_first_meta_bg)) % + EXT4_DESC_PER_BLOCK(sb); + if (group_rel == 0 || group_rel == 1 || + (group_rel == EXT4_DESC_PER_BLOCK(sb) - 1)) + bit_max += 1; + } + + if (block_group == sbi->s_groups_count - 1) { + /* + * Even though mke2fs always initialize first and last group + * if some other tool enabled the EXT4_BG_BLOCK_UNINIT we need + * to make sure we calculate the right free blocks + */ + group_blocks = ext4_blocks_count(sbi->s_es) - + le32_to_cpu(sbi->s_es->s_first_data_block) - + (EXT4_BLOCKS_PER_GROUP(sb) * (sbi->s_groups_count -1)); + } else { + group_blocks = EXT4_BLOCKS_PER_GROUP(sb); + } + + free_blocks = group_blocks - bit_max; + + if (bh) { + for (bit = 0; bit < bit_max; bit++) + ext4_set_bit(bit, bh->b_data); + + start = block_group * EXT4_BLOCKS_PER_GROUP(sb) + + le32_to_cpu(sbi->s_es->s_first_data_block); + + /* Set bits for block and inode bitmaps, and inode table */ + ext4_set_bit(ext4_block_bitmap(sb, gdp) - start, bh->b_data); + ext4_set_bit(ext4_inode_bitmap(sb, gdp) - start, bh->b_data); + for (bit = le32_to_cpu(gdp->bg_inode_table) - start, + bit_max = bit + sbi->s_itb_per_group; bit < bit_max; bit++) + ext4_set_bit(bit, bh->b_data); + + /* + * Also if the number of blocks within the group is less than the + * blocksize * 8 ( which is the size of bitmap ), set rest of the + * block bitmap to 1 + */ + mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data); + } + + return free_blocks - sbi->s_itb_per_group - 2; +} + + /* * The free blocks are managed by bitmaps. A file system contains several * blocks groups. Each group contains 1 bitmap block for blocks, 1 bitmap @@ -110,16 +199,29 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, * * Return buffer_head on success or NULL in case of failure. */ -static struct buffer_head * +struct buffer_head * read_block_bitmap(struct super_block *sb, unsigned int block_group) { struct ext4_group_desc * desc; struct buffer_head * bh = NULL; - desc = ext4_get_group_desc (sb, block_group, NULL); + desc = ext4_get_group_desc(sb, block_group, NULL); if (!desc) goto error_out; - bh = sb_bread(sb, ext4_block_bitmap(sb, desc)); + if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + bh = sb_getblk(sb, ext4_block_bitmap(sb, desc)); + if (!buffer_uptodate(bh)) { + lock_buffer(bh); + if (!buffer_uptodate(bh)) { + ext4_init_block_bitmap(sb, bh, block_group, + desc); + set_buffer_uptodate(bh); + } + unlock_buffer(bh); + } + } else { + bh = sb_bread(sb, ext4_block_bitmap(sb,desc)); + } if (!bh) ext4_error (sb, "read_block_bitmap", "Cannot read block bitmap - " @@ -586,6 +688,7 @@ do_more: desc->bg_free_blocks_count = cpu_to_le16(le16_to_cpu(desc->bg_free_blocks_count) + group_freed); + desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc); spin_unlock(sb_bgl_lock(sbi, block_group)); percpu_counter_mod(&sbi->s_freeblocks_counter, count); @@ -1644,8 +1747,11 @@ allocated: ret_block, goal_hits, goal_attempts); spin_lock(sb_bgl_lock(sbi, group_no)); + if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); gdp->bg_free_blocks_count = cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)-num); + gdp->bg_checksum = ext4_group_desc_csum(sbi, group_no, gdp); spin_unlock(sb_bgl_lock(sbi, group_no)); percpu_counter_mod(&sbi->s_freeblocks_counter, -num); diff --git a/fs/ext4/group.h b/fs/ext4/group.h new file mode 100644 index 0000000..1577910 --- /dev/null +++ b/fs/ext4/group.h @@ -0,0 +1,27 @@ +/* + * linux/fs/ext4/group.h + * + * Copyright (C) 2007 Cluster File Systems, Inc + * + * Author: Andreas Dilger + */ + +#ifndef _LINUX_EXT4_GROUP_H +#define _LINUX_EXT4_GROUP_H + +extern __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 group, + struct ext4_group_desc *gdp); +extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group, + struct ext4_group_desc *gdp); +struct buffer_head *read_block_bitmap(struct super_block *sb, + unsigned int block_group); +extern unsigned ext4_init_block_bitmap(struct super_block *sb, + struct buffer_head *bh, int group, + struct ext4_group_desc *desc); +#define ext4_free_blocks_after_init(sb, group, desc) \ + ext4_init_block_bitmap(sb, NULL, group, desc) +extern unsigned ext4_init_inode_bitmap(struct super_block *sb, + struct buffer_head *bh, int group, + struct ext4_group_desc *desc); +extern void mark_bitmap_end(int start_bit, int end_bit, char *bitmap); +#endif /* _LINUX_EXT4_GROUP_H */ diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index b8b538d..e4c421e 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -28,6 +28,7 @@ #include "xattr.h" #include "acl.h" +#include "group.h" /* * ialloc.c contains the inodes allocation and deallocation routines @@ -43,6 +44,52 @@ * the free blocks count in the block. */ +/* + * To avoid calling the atomic setbit hundreds or thousands of times, we only + * need to use it within a single byte (to ensure we get endianness right). + * We can use memset for the rest of the bitmap as there are no other users. + */ +void mark_bitmap_end(int start_bit, int end_bit, char *bitmap) +{ + int i; + + if (start_bit >= end_bit) + return; + + ext4_debug("mark end bits +%d through +%d used\n", start_bit, end_bit); + for (i = start_bit; i < ((start_bit + 7) & ~7UL); i++) + ext4_set_bit(i, bitmap); + if (i < end_bit) + memset(bitmap + (i >> 3), 0xff, (end_bit - i) >> 3); +} + +/* Initializes an uninitialized inode bitmap */ +unsigned ext4_init_inode_bitmap(struct super_block *sb, + struct buffer_head *bh, int block_group, + struct ext4_group_desc *gdp) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + + J_ASSERT_BH(bh, buffer_locked(bh)); + + /* If checksum is bad mark all blocks and inodes use to prevent + * allocation, essentially implementing a per-group read-only flag. */ + if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) { + ext4_error(sb, __FUNCTION__, "Checksum bad for group %u\n", + block_group); + gdp->bg_free_blocks_count = 0; + gdp->bg_free_inodes_count = 0; + gdp->bg_itable_unused = 0; + memset(bh->b_data, 0xff, sb->s_blocksize); + return 0; + } + + memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8); + mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), EXT4_BLOCKS_PER_GROUP(sb), + bh->b_data); + + return EXT4_INODES_PER_GROUP(sb); +} /* * Read the inode allocation bitmap for a given block_group, reading @@ -59,8 +106,20 @@ read_inode_bitmap(struct super_block * sb, unsigned long block_group) desc = ext4_get_group_desc(sb, block_group, NULL); if (!desc) goto error_out; - - bh = sb_bread(sb, ext4_inode_bitmap(sb, desc)); + if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { + bh = sb_getblk(sb, ext4_inode_bitmap(sb, desc)); + if (!buffer_uptodate(bh)) { + lock_buffer(bh); + if (!buffer_uptodate(bh)) { + ext4_init_inode_bitmap(sb, bh, block_group, + desc); + set_buffer_uptodate(bh); + } + unlock_buffer(bh); + } + } else { + bh = sb_bread(sb, ext4_inode_bitmap(sb, desc)); + } if (!bh) ext4_error(sb, "read_inode_bitmap", "Cannot read inode bitmap - " @@ -169,6 +228,8 @@ void ext4_free_inode (handle_t *handle, struct inode * inode) if (is_directory) gdp->bg_used_dirs_count = cpu_to_le16( le16_to_cpu(gdp->bg_used_dirs_count) - 1); + gdp->bg_checksum = ext4_group_desc_csum(sbi, + block_group, gdp); spin_unlock(sb_bgl_lock(sbi, block_group)); percpu_counter_inc(&sbi->s_freeinodes_counter); if (is_directory) @@ -438,7 +499,7 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode * dir, int mode) struct ext4_sb_info *sbi; int err = 0; struct inode *ret; - int i; + int i, free = 0; /* Cannot create files in a deleted directory */ if (!dir || !dir->i_nlink) @@ -520,11 +581,13 @@ repeat_in_this_group: goto out; got: - ino += group * EXT4_INODES_PER_GROUP(sb) + 1; - if (ino < EXT4_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) { - ext4_error (sb, "ext4_new_inode", - "reserved inode or inode > inodes count - " - "block_group = %d, inode=%lu", group, ino); + ino++; + if ((group == 0 && ino < EXT4_FIRST_INO(sb)) || + ino > EXT4_INODES_PER_GROUP(sb)) { + ext4_error(sb, __FUNCTION__, + "reserved inode or inode > inodes count - " + "block_group = %d, inode=%lu", group, + ino + group * EXT4_INODES_PER_GROUP(sb)); err = -EIO; goto fail; } @@ -532,13 +595,78 @@ got: BUFFER_TRACE(bh2, "get_write_access"); err = ext4_journal_get_write_access(handle, bh2); if (err) goto fail; + + /* We may have to initialize the block bitmap if it isn't already */ + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) && + gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + struct buffer_head *block_bh = read_block_bitmap(sb, group); + + BUFFER_TRACE(block_bh, "get block bitmap access"); + err = ext4_journal_get_write_access(handle, block_bh); + if (err) { + brelse(block_bh); + goto fail; + } + + free = 0; + spin_lock(sb_bgl_lock(sbi, group)); + /* recheck and clear flag under lock if we still need to */ + if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); + free = ext4_free_blocks_after_init(sb, group, gdp); + gdp->bg_free_blocks_count = cpu_to_le16(free); + } + spin_unlock(sb_bgl_lock(sbi, group)); + + /* Don't need to dirty bitmap block if we didn't change it */ + if (free) { + BUFFER_TRACE(block_bh, "dirty block bitmap"); + err = ext4_journal_dirty_metadata(handle, block_bh); + } + + brelse(block_bh); + if (err) + goto fail; + } + spin_lock(sb_bgl_lock(sbi, group)); + /* If we didn't allocate from within the initialized part of the inode + * table then we need to initialize up to this inode. */ + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { + if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT); + + /* When marking the block group with + * ~EXT4_BG_INODE_UNINIT we don't want to depend + * on the value of bg_itable_unsed even though + * mke2fs could have initialized the same for us. + * Instead we calculated the value below + */ + + free = 0; + } else { + free = EXT4_INODES_PER_GROUP(sb) - + le16_to_cpu(gdp->bg_itable_unused); + } + + /* + * Check the relative inode number against the last used + * relative inode number in this group. if it is greater + * we need to update the bg_itable_unused count + * + */ + if (ino > free) + gdp->bg_itable_unused = + cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino); + } + gdp->bg_free_inodes_count = cpu_to_le16(le16_to_cpu(gdp->bg_free_inodes_count) - 1); if (S_ISDIR(mode)) { gdp->bg_used_dirs_count = cpu_to_le16(le16_to_cpu(gdp->bg_used_dirs_count) + 1); } + gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp); spin_unlock(sb_bgl_lock(sbi, group)); BUFFER_TRACE(bh2, "call ext4_journal_dirty_metadata"); err = ext4_journal_dirty_metadata(handle, bh2); @@ -560,7 +688,7 @@ got: inode->i_gid = current->fsgid; inode->i_mode = mode; - inode->i_ino = ino; + inode->i_ino = ino + group * EXT4_INODES_PER_GROUP(sb); /* This is the optimal IO size (for stat), not the fs block size */ inode->i_blocks = 0; inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime = diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index aa11d7d..19ff743 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -16,6 +16,7 @@ #include #include +#include "group.h" #define outside(b, first, last) ((b) < (first) || (b) >= (last)) #define inside(b, first, last) ((b) >= (first) && (b) < (last)) @@ -140,25 +141,6 @@ static struct buffer_head *bclean(handle_t *handle, struct super_block *sb, } /* - * To avoid calling the atomic setbit hundreds or thousands of times, we only - * need to use it within a single byte (to ensure we get endianness right). - * We can use memset for the rest of the bitmap as there are no other users. - */ -static void mark_bitmap_end(int start_bit, int end_bit, char *bitmap) -{ - int i; - - if (start_bit >= end_bit) - return; - - ext4_debug("mark end bits +%d through +%d used\n", start_bit, end_bit); - for (i = start_bit; i < ((start_bit + 7) & ~7UL); i++) - ext4_set_bit(i, bitmap); - if (i < end_bit) - memset(bitmap + (i >> 3), 0xff, (end_bit - i) >> 3); -} - -/* * Set up the block and inode bitmaps, and the inode table for the new group. * This doesn't need to be part of the main transaction, since we are only * changing blocks outside the actual filesystem. We still do journaling to @@ -842,6 +824,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input) ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */ gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count); gdp->bg_free_inodes_count = cpu_to_le16(EXT4_INODES_PER_GROUP(sb)); + gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp); /* * Make the new blocks and inodes valid next. We do this before diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 420d39d..b59610d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -37,12 +37,14 @@ #include #include #include +#include #include #include "xattr.h" #include "acl.h" #include "namei.h" +#include "group.h" static int ext4_load_journal(struct super_block *, struct ext4_super_block *, unsigned long journal_devnum); @@ -1237,6 +1239,43 @@ static int ext4_setup_super(struct super_block *sb, struct ext4_super_block *es, return res; } +__le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group, + struct ext4_group_desc *gdp) +{ + __u16 crc = 0; + + if (sbi->s_es->s_feature_ro_compat & + cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { + int offset = offsetof(struct ext4_group_desc, bg_checksum); + __le32 le_group = cpu_to_le32(block_group); + + crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid)); + crc = crc16(crc, (__u8 *)&le_group, sizeof(le_group)); + crc = crc16(crc, (__u8 *)gdp, offset); + offset += sizeof(gdp->bg_checksum); /* skip checksum */ + /* for checksum of struct ext4_group_desc do the rest...*/ + if ((sbi->s_es->s_feature_incompat & + cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT)) && + offset < le16_to_cpu(sbi->s_es->s_desc_size)) + crc = crc16(crc, (__u8 *)gdp + offset, + le16_to_cpu(sbi->s_es->s_desc_size) - + offset); + } + + return cpu_to_le16(crc); +} + +int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 block_group, + struct ext4_group_desc *gdp) +{ + if ((sbi->s_es->s_feature_ro_compat & + cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) && + (gdp->bg_checksum != ext4_group_desc_csum(sbi, block_group, gdp))) + return 0; + + return 1; +} + /* Called at mount-time, super-block is locked */ static int ext4_check_descriptors (struct super_block * sb) { @@ -1291,6 +1330,14 @@ static int ext4_check_descriptors (struct super_block * sb) i, inode_table); return 0; } + if (!ext4_group_desc_csum_verify(sbi, i, gdp)) { + ext4_error(sb, __FUNCTION__, + "Checksum for group %d failed (%u!=%u)\n", i, + le16_to_cpu(ext4_group_desc_csum(sbi, i, + gdp)), + le16_to_cpu(gdp->bg_checksum)); + return 0; + } first_block += EXT4_BLOCKS_PER_GROUP(sb); gdp = (struct ext4_group_desc *) ((__u8 *)gdp + EXT4_DESC_SIZE(sb)); diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h index 151738a..b77b59f 100644 --- a/include/linux/ext4_fs.h +++ b/include/linux/ext4_fs.h @@ -105,19 +105,25 @@ */ struct ext4_group_desc { - __le32 bg_block_bitmap; /* Blocks bitmap block */ - __le32 bg_inode_bitmap; /* Inodes bitmap block */ + __le32 bg_block_bitmap; /* Blocks bitmap block */ + __le32 bg_inode_bitmap; /* Inodes bitmap block */ __le32 bg_inode_table; /* Inodes table block */ __le16 bg_free_blocks_count; /* Free blocks count */ __le16 bg_free_inodes_count; /* Free inodes count */ __le16 bg_used_dirs_count; /* Directories count */ - __u16 bg_flags; - __u32 bg_reserved[3]; + __le16 bg_flags; /* EXT4_BG_flags (INODE_UNINIT, etc) */ + __u32 bg_reserved[2]; /* Likely block/inode bitmap checksum */ + __le16 bg_itable_unused; /* Unused inodes count */ + __le16 bg_checksum; /* crc16(sb_uuid+group+desc) */ __le32 bg_block_bitmap_hi; /* Blocks bitmap block MSB */ __le32 bg_inode_bitmap_hi; /* Inodes bitmap block MSB */ __le32 bg_inode_table_hi; /* Inodes table block MSB */ }; +#define EXT4_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not in use */ +#define EXT4_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not in use */ +#define EXT4_BG_INODE_ZEROED 0x0004 /* On-disk itable initialized to zero */ + #ifdef __KERNEL__ #include #include @@ -665,6 +671,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) #define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER 0x0001 #define EXT4_FEATURE_RO_COMPAT_LARGE_FILE 0x0002 #define EXT4_FEATURE_RO_COMPAT_BTREE_DIR 0x0004 +#define EXT4_FEATURE_RO_COMPAT_GDT_CSUM 0x0010 #define EXT4_FEATURE_RO_COMPAT_DIR_NLINK 0x0020 #define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE 0x0040 @@ -684,6 +691,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) EXT4_FEATURE_INCOMPAT_64BIT) #define EXT4_FEATURE_RO_COMPAT_SUPP (EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \ EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \ + EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \ EXT4_FEATURE_RO_COMPAT_DIR_NLINK | \ EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE | \ EXT4_FEATURE_RO_COMPAT_BTREE_DIR) --------------050007080108030506060304--