2011-12-14 00:46:22

by djwong

[permalink] [raw]
Subject: [PATCH v2.2 00/23] ext4: Add metadata checksumming

Hi all,

This patchset adds crc32c checksums to most of the ext4 metadata objects. A
full design document is on the ext4 wiki[1] but I will summarize that document here.

As much as we wish our storage hardware was totally reliable, it is still
quite possible for data to be corrupted on disk, corrupted during transfer over
a wire, or written to the wrong places. To protect against this sort of
non-hostile corruption, it is desirable to store checksums of metadata objects
on the filesystem to prevent broken metadata from shredding the filesystem.

The crc32c polynomial was chosen for its improved error detection capabilities
over crc32 and crc16, and because of its hardware acceleration on current and
upcoming Intel and Sparc chips.

Each type of metadata object has been retrofitted to store a checksum as follows:

- The superblock stores a crc32c of itself.
- Each inode stores crc32c(fs_uuid + inode_num + inode_gen + inode +
slack_space_after_inode)
- Block and inode bitmaps each get their own crc32c(fs_uuid + group_num +
bitmap), stored in the block group descriptor.
- Each extent tree block stores a crc32c(fs_uuid + inode_num + inode_gen +
extent_entries) in unused space at the end of the block.
- Each directory leaf block has an unused-looking directory entry big enough to
store a crc32c(fs_uuid + inode_num + inode_gen + block) at the end of the
block.
- Each directory htree block is shortened to contain a crc32c(fs_uuid +
inode_num + inode_gen + block) at the end of the block.
- Extended attribute blocks store crc32c(fs_uuid + id + ea_block) in the
header, where id is, depending on the refcount, either the inode_num and
inode_gen; or the block number.
- MMP blocks store crc32c(fs_uuid + mmpblock) at the end of the MMP block.
- Block groups can now use crc32c instead of crc16.
- The journal now has a v2 checksum feature flag.
- crc32c(j_uuid + block) checksums have been inserted into descriptor blocks,
commit blocks, revoke blocks, and the journal superblock.
- Each block tag in a descriptor block has a checksum of the related data block.

The patchset for e2fsprogs will be sent under separate cover only to linux-ext4
as it is quite lengthy (~48 patches).

As far as performance impact goes, I see nearly no change with a standard mail
server ffsb simulation. On a test that involves only file creation and
deletion and extent tree modifications, I see a drop of about 50 percent with
the current kernel crc32c implementation; this improves to a drop of about 20
percent with the enclosed crc32c implementation. However, given that metadata
is usually a small fraction of total IO, it doesn't seem like the cost of
enabling this feature is unreasonable.

There are of course unresolved issues:

- I haven't fixed it up to checksum the exclude bitmap yet. I'll probably
submit that as an add-on to the snapshot patchset.

- Using the journal commit hooks to delay crc32c calculation until dirty
buffers are actually being written to disk.

- Interaction with online resize code. Yongqiang seems to be in the process of
rewriting this not to use custom metadata block write functions, but I haven't
looked at it very closely yet.

Please have a look at the design document and patches, and please feel free to
suggest any changes.

v2: Checksum the MMP block, store the checksum type in the superblock, include
the inode generation in file checksums, and finally solve the problem of limited
space in block groups by splitting the checksum into two halves.

v2.1: Checksum the reserved parts of the htree tail structure. Fix some flag
handling bugs with the mb cache init routine wherein bitmaps could fail to be
checksummed at read time.

v2.2: Reincorporate the FS UUID in the bitmap checksum calcuations. Move all
disk layout changes to the front and the feature flag enablement to the end of
the patch set. Fail journal recovery if revoke block fails checksum.

This patchset has been tested on 3.2.0-rc5 on x64, i386, ppc64, and ppc32. The
patches seems to work fine on all four platforms.

--D

[1] https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums



2011-12-14 00:46:20

by djwong

[permalink] [raw]
Subject: [PATCH 01/23] ext4: Create a new BH_Verified flag to avoid unnecessary metadata validation

Create a new BH_Verified flag to indicate that we've verified all the data in a
buffer_head for correctness. This allows us to bypass expensive verification
steps when they are not necessary without missing them when they are.

v2: The later jbd2 metadata checksumming patches need a BH_Verified flag for
journal metadata, so put the flag in jbd2.h so that both drivers can share the
same bit.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/ext4/extents.c | 35 ++++++++++++++++++++++++++---------
include/linux/jbd_common.h | 2 ++
2 files changed, 28 insertions(+), 9 deletions(-)


diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..00f97f4 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -396,6 +396,26 @@ int ext4_ext_check_inode(struct inode *inode)
return ext4_ext_check(inode, ext_inode_hdr(inode), ext_depth(inode));
}

+static int __ext4_ext_check_block(const char *function, unsigned int line,
+ struct inode *inode,
+ struct ext4_extent_header *eh,
+ int depth,
+ struct buffer_head *bh)
+{
+ int ret;
+
+ if (buffer_verified(bh))
+ return 0;
+ ret = ext4_ext_check(inode, eh, depth);
+ if (ret)
+ return ret;
+ set_buffer_verified(bh);
+ return ret;
+}
+
+#define ext4_ext_check_block(inode, eh, depth, bh) \
+ __ext4_ext_check_block(__func__, __LINE__, inode, eh, depth, bh)
+
#ifdef EXT_DEBUG
static void ext4_ext_show_path(struct inode *inode, struct ext4_ext_path *path)
{
@@ -652,8 +672,6 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
i = depth;
/* walk through the tree */
while (i) {
- int need_to_validate = 0;
-
ext_debug("depth %d: num %d, max %d\n",
ppos, le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max));

@@ -672,8 +690,6 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
put_bh(bh);
goto err;
}
- /* validate the extent entries */
- need_to_validate = 1;
}
eh = ext_block_hdr(bh);
ppos++;
@@ -687,7 +703,7 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
path[ppos].p_hdr = eh;
i--;

- if (need_to_validate && ext4_ext_check(inode, eh, i))
+ if (ext4_ext_check_block(inode, eh, i, bh))
goto err;
}

@@ -1328,7 +1344,8 @@ got_index:
return -EIO;
eh = ext_block_hdr(bh);
/* subtract from p_depth to get proper eh_depth */
- if (ext4_ext_check(inode, eh, path->p_depth - depth)) {
+ if (ext4_ext_check_block(inode, eh,
+ path->p_depth - depth, bh)) {
put_bh(bh);
return -EIO;
}
@@ -1341,7 +1358,7 @@ got_index:
if (bh == NULL)
return -EIO;
eh = ext_block_hdr(bh);
- if (ext4_ext_check(inode, eh, path->p_depth - depth)) {
+ if (ext4_ext_check_block(inode, eh, path->p_depth - depth, bh)) {
put_bh(bh);
return -EIO;
}
@@ -2572,8 +2589,8 @@ again:
err = -EIO;
break;
}
- if (ext4_ext_check(inode, ext_block_hdr(bh),
- depth - i - 1)) {
+ if (ext4_ext_check_block(inode, ext_block_hdr(bh),
+ depth - i - 1, bh)) {
err = -EIO;
break;
}
diff --git a/include/linux/jbd_common.h b/include/linux/jbd_common.h
index 6230f85..6133679 100644
--- a/include/linux/jbd_common.h
+++ b/include/linux/jbd_common.h
@@ -12,6 +12,7 @@ enum jbd_state_bits {
BH_State, /* Pins most journal_head state */
BH_JournalHead, /* Pins bh->b_private and jh->b_bh */
BH_Unshadow, /* Dummy bit, for BJ_Shadow wakeup filtering */
+ BH_Verified, /* Metadata block has been verified ok */
BH_JBDPrivateStart, /* First bit available for private use by FS */
};

@@ -24,6 +25,7 @@ TAS_BUFFER_FNS(Revoked, revoked)
BUFFER_FNS(RevokeValid, revokevalid)
TAS_BUFFER_FNS(RevokeValid, revokevalid)
BUFFER_FNS(Freed, freed)
+BUFFER_FNS(Verified, verified)

static inline struct buffer_head *jh2bh(struct journal_head *jh)
{

2011-12-14 00:46:41

by djwong

[permalink] [raw]
Subject: [PATCH 04/23] ext4: Only call out to crc32c if necessary

Only obtain a reference to the cryptoapi and crc32c if we happen to mount a
filesystem with metadata checksumming enabled.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/ext4/Kconfig | 2 ++
fs/ext4/ext4.h | 23 +++++++++++++++++++++++
fs/ext4/super.c | 16 ++++++++++++++++
3 files changed, 41 insertions(+), 0 deletions(-)


diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 9ed1bb1..c22f170 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -2,6 +2,8 @@ config EXT4_FS
tristate "The Extended 4 (ext4) filesystem"
select JBD2
select CRC16
+ select CRYPTO
+ select CRYPTO_CRC32C
help
This is the next generation of the ext3 filesystem.

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b8242a1..84e6f68 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -29,6 +29,7 @@
#include <linux/wait.h>
#include <linux/blockgroup_lock.h>
#include <linux/percpu_counter.h>
+#include <crypto/hash.h>
#ifdef __KERNEL__
#include <linux/compat.h>
#endif
@@ -1259,6 +1260,9 @@ struct ext4_sb_info {

/* record the last minlen when FITRIM is called. */
atomic_t s_last_trim_minblks;
+
+ /* Reference to checksum algorithm driver via cryptoapi */
+ struct crypto_shash *s_chksum_driver;
};

static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1614,6 +1618,25 @@ static inline __le16 ext4_rec_len_to_disk(unsigned len, unsigned blocksize)
#define DX_HASH_HALF_MD4_UNSIGNED 4
#define DX_HASH_TEA_UNSIGNED 5

+static inline u32 ext4_chksum(struct ext4_sb_info *sbi, u32 crc,
+ const void *address, unsigned int length)
+{
+ struct {
+ struct shash_desc shash;
+ char ctx[crypto_shash_descsize(sbi->s_chksum_driver)];
+ } desc;
+ int err;
+
+ desc.shash.tfm = sbi->s_chksum_driver;
+ desc.shash.flags = 0;
+ *(u32 *)desc.ctx = crc;
+
+ err = crypto_shash_update(&desc.shash, address, length);
+ BUG_ON(err);
+
+ return *(u32 *)desc.ctx;
+}
+
#ifdef __KERNEL__

/* hash info structure used by the directory hash */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a56c14e..8ff94dc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -887,6 +887,8 @@ static void ext4_put_super(struct super_block *sb)
unlock_super(sb);
kobject_put(&sbi->s_kobj);
wait_for_completion(&sbi->s_kobj_unregister);
+ if (sbi->s_chksum_driver)
+ crypto_free_shash(sbi->s_chksum_driver);
kfree(sbi->s_blockgroup_lock);
kfree(sbi);
}
@@ -3200,6 +3202,18 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto cantfind_ext4;
}

+ /* Load the checksum driver */
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+ sbi->s_chksum_driver = crypto_alloc_shash("crc32c", 0, 0);
+ if (IS_ERR(sbi->s_chksum_driver)) {
+ ext4_msg(sb, KERN_ERR, "Cannot load crc32c driver.");
+ ret = PTR_ERR(sbi->s_chksum_driver);
+ sbi->s_chksum_driver = NULL;
+ goto failed_mount;
+ }
+ }
+
/* Set defaults before we parse the mount options */
def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
set_opt(sb, INIT_INODE_TABLE);
@@ -3880,6 +3894,8 @@ failed_mount2:
brelse(sbi->s_group_desc[i]);
ext4_kvfree(sbi->s_group_desc);
failed_mount:
+ if (sbi->s_chksum_driver)
+ crypto_free_shash(sbi->s_chksum_driver);
if (sbi->s_proc) {
remove_proc_entry(sb->s_id, ext4_proc_root);
}


2011-12-14 00:46:34

by djwong

[permalink] [raw]
Subject: [PATCH 03/23] ext4: Record the checksum algorithm in use in the superblock

Record the type of checksum algorithm we're using for metadata in the
superblock, in case we ever want/need to change the algorithm.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/ext4/super.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3858767..a56c14e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -111,6 +111,16 @@ static struct file_system_type ext3_fs_type = {
#define IS_EXT3_SB(sb) (0)
#endif

+static int ext4_verify_csum_type(struct super_block *sb,
+ struct ext4_super_block *es)
+{
+ if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ return 1;
+
+ return es->s_checksum_type == EXT4_CRC32C_CHKSUM;
+}
+
void *ext4_kvmalloc(size_t size, gfp_t flags)
{
void *ret;
@@ -3182,6 +3192,14 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto cantfind_ext4;
sbi->s_kbytes_written = le64_to_cpu(es->s_kbytes_written);

+ /* Check for a known checksum algorithm */
+ if (!ext4_verify_csum_type(sb, es)) {
+ ext4_msg(sb, KERN_ERR, "VFS: Found ext4 filesystem with "
+ "unknown checksum algorithm.");
+ silent = 1;
+ goto cantfind_ext4;
+ }
+
/* Set defaults before we parse the mount options */
def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
set_opt(sb, INIT_INODE_TABLE);

2011-12-14 00:47:54

by djwong

[permalink] [raw]
Subject: [PATCH 13/23] ext4: Add new feature to make block group checksums use metadata_csum algorithm

Add a new feature flag to enable block group descriptors to use the (faster)
metadata_csum checksum algorithm.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/ext4/balloc.c | 2 +-
fs/ext4/ext4.h | 4 ++-
fs/ext4/ialloc.c | 11 ++++-----
fs/ext4/mballoc.c | 6 ++---
fs/ext4/resize.c | 2 +-
fs/ext4/super.c | 65 ++++++++++++++++++++++++++++++++++++++---------------
6 files changed, 59 insertions(+), 31 deletions(-)


diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index dcac4bd..300f751 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -212,7 +212,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
sb->s_blocksize * 8, bh->b_data);
ext4_block_bitmap_csum_set(sb, block_group, gdp, bh,
EXT4_BLOCKS_PER_GROUP(sb) / 8);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+ ext4_group_desc_csum_set(sbi, block_group, gdp);
}

/* Return the number of free blocks in a block group. It is used when
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e7489c4..ae71af3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2086,10 +2086,10 @@ extern void ext4_used_dirs_set(struct super_block *sb,
struct ext4_group_desc *bg, __u32 count);
extern void ext4_itable_unused_set(struct super_block *sb,
struct ext4_group_desc *bg, __u32 count);
-extern __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 group,
- struct ext4_group_desc *gdp);
extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group,
struct ext4_group_desc *gdp);
+extern void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 group,
+ struct ext4_group_desc *gdp);

static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
{
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 261ffce..6d33995 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -92,7 +92,7 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
bh->b_data);
ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
EXT4_INODES_PER_GROUP(sb) / 8);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+ ext4_group_desc_csum_set(sbi, block_group, gdp);

return EXT4_INODES_PER_GROUP(sb);
}
@@ -287,7 +287,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
}
ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
EXT4_INODES_PER_GROUP(sb) / 8);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+ ext4_group_desc_csum_set(sbi, block_group, gdp);
ext4_unlock_group(sb, block_group);

percpu_counter_inc(&sbi->s_freeinodes_counter);
@@ -697,7 +697,7 @@ static int ext4_claim_inode(struct super_block *sb,
}
ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
EXT4_INODES_PER_GROUP(sb) / 8);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+ ext4_group_desc_csum_set(sbi, group, gdp);
err_ret:
ext4_unlock_group(sb, group);
up_read(&grp->alloc_sem);
@@ -858,8 +858,7 @@ got:
block_bitmap_bh,
EXT4_BLOCKS_PER_GROUP(sb) /
8);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
- gdp);
+ ext4_group_desc_csum_set(sbi, group, gdp);
}
ext4_unlock_group(sb, group);

@@ -1223,7 +1222,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
skip_zeroout:
ext4_lock_group(sb, group);
gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+ ext4_group_desc_csum_set(sbi, group, gdp);
ext4_unlock_group(sb, group);

BUFFER_TRACE(group_desc_bh,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index dbd1453..2bbd9ee 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2919,7 +2919,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
ext4_free_group_clusters_set(sb, gdp, len);
ext4_block_bitmap_csum_set(sb, ac->ac_b_ex.fe_group, gdp, bitmap_bh,
EXT4_BLOCKS_PER_GROUP(sb) / 8);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, ac->ac_b_ex.fe_group, gdp);
+ ext4_group_desc_csum_set(sbi, ac->ac_b_ex.fe_group, gdp);

ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
@@ -4787,7 +4787,7 @@ do_more:
ext4_free_group_clusters_set(sb, gdp, ret);
ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
EXT4_BLOCKS_PER_GROUP(sb) / 8);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+ ext4_group_desc_csum_set(sbi, block_group, gdp);
ext4_unlock_group(sb, block_group);
percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);

@@ -4933,7 +4933,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
ext4_free_group_clusters_set(sb, desc, blk_free_count);
ext4_block_bitmap_csum_set(sb, block_group, desc, bitmap_bh,
EXT4_BLOCKS_PER_GROUP(sb) / 8);
- desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
+ ext4_group_desc_csum_set(sbi, block_group, desc);
ext4_unlock_group(sb, block_group);
percpu_counter_add(&sbi->s_freeclusters_counter,
EXT4_B2C(sbi, blocks_freed));
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index c33b72a..083f429 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -880,7 +880,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
ext4_free_group_clusters_set(sb, gdp, input->free_blocks_count);
ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
+ ext4_group_desc_csum_set(sbi, input->group, gdp);

/*
* We can allocate memory for mb_alloc based on the new group
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 5e2842c..6838d01 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2092,29 +2092,49 @@ failed:
return 0;
}

-__le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
- struct ext4_group_desc *gdp)
+static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
+ struct ext4_group_desc *gdp)
{
+ int offset;
__u16 crc = 0;
+ __le32 le_group = cpu_to_le32(block_group);

- if (sbi->s_es->s_feature_ro_compat &
- cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
- int offset = offsetof(struct ext4_group_desc, bg_checksum);
- __le32 le_group = cpu_to_le32(block_group);
-
- crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid));
- crc = crc16(crc, (__u8 *)&le_group, sizeof(le_group));
- crc = crc16(crc, (__u8 *)gdp, offset);
- offset += sizeof(gdp->bg_checksum); /* skip checksum */
- /* for checksum of struct ext4_group_desc do the rest...*/
- if ((sbi->s_es->s_feature_incompat &
- cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT)) &&
- offset < le16_to_cpu(sbi->s_es->s_desc_size))
- crc = crc16(crc, (__u8 *)gdp + offset,
- le16_to_cpu(sbi->s_es->s_desc_size) -
- offset);
+ if ((sbi->s_es->s_feature_ro_compat &
+ cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) &&
+ (sbi->s_es->s_feature_incompat &
+ cpu_to_le32(EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))) {
+ /* Use new metadata_csum algorithm */
+ __u16 old_csum;
+ __u32 csum32;
+
+ old_csum = gdp->bg_checksum;
+ gdp->bg_checksum = 0;
+ csum32 = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&le_group,
+ sizeof(le_group));
+ csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp,
+ sbi->s_desc_size);
+ gdp->bg_checksum = old_csum;
+
+ crc = csum32 & 0xFFFF;
+ goto out;
}

+ /* old crc16 code */
+ offset = offsetof(struct ext4_group_desc, bg_checksum);
+
+ crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid));
+ crc = crc16(crc, (__u8 *)&le_group, sizeof(le_group));
+ crc = crc16(crc, (__u8 *)gdp, offset);
+ offset += sizeof(gdp->bg_checksum); /* skip checksum */
+ /* for checksum of struct ext4_group_desc do the rest...*/
+ if ((sbi->s_es->s_feature_incompat &
+ cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT)) &&
+ offset < le16_to_cpu(sbi->s_es->s_desc_size))
+ crc = crc16(crc, (__u8 *)gdp + offset,
+ le16_to_cpu(sbi->s_es->s_desc_size) -
+ offset);
+
+out:
return cpu_to_le16(crc);
}

@@ -2129,6 +2149,15 @@ int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 block_group,
return 1;
}

+void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 block_group,
+ struct ext4_group_desc *gdp)
+{
+ if (!(sbi->s_es->s_feature_ro_compat &
+ cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
+ return;
+ gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+}
+
/* Called at mount-time, super-block is locked */
static int ext4_check_descriptors(struct super_block *sb,
ext4_group_t *first_not_zeroed)


2011-12-14 00:47:58

by djwong

[permalink] [raw]
Subject: [PATCH 15/23] jbd2: Change disk layout for metadata checksumming

Define flags and allocate space in on-disk journal structures to support
checksumming of journal metadata.

Signed-off-by: Darrick J. Wong <[email protected]>
---
include/linux/jbd2.h | 28 +++++++++++++++++++++++++++-
1 files changed, 27 insertions(+), 1 deletions(-)


diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2092ea2..ecd9b45 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -147,12 +147,24 @@ typedef struct journal_header_s
#define JBD2_CRC32_CHKSUM 1
#define JBD2_MD5_CHKSUM 2
#define JBD2_SHA1_CHKSUM 3
+#define JBD2_CRC32C_CHKSUM 4

#define JBD2_CRC32_CHKSUM_SIZE 4

#define JBD2_CHECKSUM_BYTES (32 / sizeof(u32))
/*
* Commit block header for storing transactional checksums:
+ *
+ * NOTE: If FEATURE_COMPAT_CHECKSUM (checksum v1) is set, the h_chksum*
+ * fields are used to store a checksum of the descriptor and data blocks.
+ *
+ * If FEATURE_INCOMPAT_CSUM_V2 (checksum v2) is set, then the h_chksum
+ * field is used to store crc32c(uuid+commit_block). Each journal metadata
+ * block gets its own checksum, and data block checksums are stored in
+ * journal_block_tag (in the descriptor). The other h_chksum* fields are
+ * not used.
+ *
+ * Checksum v1 and v2 are mutually exclusive features.
*/
struct commit_header {
__be32 h_magic;
@@ -177,11 +189,17 @@ typedef struct journal_block_tag_s
__be32 t_blocknr; /* The on-disk block number */
__be32 t_flags; /* See below */
__be32 t_blocknr_high; /* most-significant high 32bits. */
+ __be32 t_checksum; /* crc32c(uuid+seq+block) */
} journal_block_tag_t;

#define JBD2_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
#define JBD2_TAG_SIZE64 (sizeof(journal_block_tag_t))

+/* Tail of descriptor block, for checksumming */
+struct jbd2_journal_block_tail {
+ __be32 t_checksum; /* crc32c(uuid+descr_block) */
+};
+
/*
* The revoke descriptor: used on disk to describe a series of blocks to
* be revoked from the log
@@ -192,6 +210,10 @@ typedef struct jbd2_journal_revoke_header_s
__be32 r_count; /* Count of bytes used in the block */
} jbd2_journal_revoke_header_t;

+/* Tail of revoke block, for checksumming */
+struct jbd2_journal_revoke_tail {
+ __be32 r_checksum; /* crc32c(uuid+revoke_block) */
+};

/* Definitions for the journal tag flags word: */
#define JBD2_FLAG_ESCAPE 1 /* on-disk block is escaped */
@@ -241,7 +263,10 @@ typedef struct journal_superblock_s
__be32 s_max_trans_data; /* Limit of data blocks per trans. */

/* 0x0050 */
- __u32 s_padding[44];
+ __u8 s_checksum_type; /* checksum type */
+ __u8 s_padding2[3];
+ __u32 s_padding[42];
+ __be32 s_checksum; /* crc32c(superblock) */

/* 0x0100 */
__u8 s_users[16*48]; /* ids of all fs'es sharing the log */
@@ -263,6 +288,7 @@ typedef struct journal_superblock_s
#define JBD2_FEATURE_INCOMPAT_REVOKE 0x00000001
#define JBD2_FEATURE_INCOMPAT_64BIT 0x00000002
#define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT 0x00000004
+#define JBD2_FEATURE_INCOMPAT_CSUM_V2 0x00000008

/* Features known to this kernel version: */
#define JBD2_KNOWN_COMPAT_FEATURES JBD2_FEATURE_COMPAT_CHECKSUM


2011-12-14 00:48:04

by djwong

[permalink] [raw]
Subject: [PATCH 16/23] jbd2: Enable journal clients to enable v2 checksumming

Add in the necessary code so that journal clients can enable the new journal
checksumming features.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/ext4/super.c | 55 ++++++++++++++++++++++++++++++++++++++++-------------
fs/jbd2/journal.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 92 insertions(+), 13 deletions(-)


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6838d01..7bcbe47 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3173,6 +3173,44 @@ static void ext4_destroy_lazyinit_thread(void)
kthread_stop(ext4_lazyinit_task);
}

+static int set_journal_csum_feature_set(struct super_block *sb)
+{
+ int ret = 1;
+ int compat, incompat;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+ /* journal checksum v2 */
+ compat = 0;
+ incompat = JBD2_FEATURE_INCOMPAT_CSUM_V2;
+ } else {
+ /* journal checksum v1 */
+ compat = JBD2_FEATURE_COMPAT_CHECKSUM;
+ incompat = 0;
+ }
+
+ if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
+ ret = jbd2_journal_set_features(sbi->s_journal,
+ compat, 0,
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT |
+ incompat);
+ } else if (test_opt(sb, JOURNAL_CHECKSUM)) {
+ ret = jbd2_journal_set_features(sbi->s_journal,
+ compat, 0,
+ incompat);
+ jbd2_journal_clear_features(sbi->s_journal, 0, 0,
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+ } else {
+ jbd2_journal_clear_features(sbi->s_journal,
+ JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT |
+ JBD2_FEATURE_INCOMPAT_CSUM_V2);
+ }
+
+ return ret;
+}
+
static int ext4_fill_super(struct super_block *sb, void *data, int silent)
{
char *orig_data = kstrdup(data, GFP_KERNEL);
@@ -3762,19 +3800,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount_wq;
}

- if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
- jbd2_journal_set_features(sbi->s_journal,
- JBD2_FEATURE_COMPAT_CHECKSUM, 0,
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
- } else if (test_opt(sb, JOURNAL_CHECKSUM)) {
- jbd2_journal_set_features(sbi->s_journal,
- JBD2_FEATURE_COMPAT_CHECKSUM, 0, 0);
- jbd2_journal_clear_features(sbi->s_journal, 0, 0,
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
- } else {
- jbd2_journal_clear_features(sbi->s_journal,
- JBD2_FEATURE_COMPAT_CHECKSUM, 0,
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+ if (!set_journal_csum_feature_set(sb)) {
+ ext4_msg(sb, KERN_ERR, "Failed to set journal checksum "
+ "feature set");
+ goto failed_mount_wq;
}

/* We have now updated the journal if required, so we can
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0fa0123..300b46f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -100,6 +100,15 @@ static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
static void __journal_abort_soft (journal_t *journal, int errno);
static int jbd2_journal_create_slab(size_t slab_size);

+/* Checksumming functions */
+int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
+{
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return 1;
+
+ return sb->s_checksum_type == JBD2_CRC32C_CHKSUM;
+}
+
/*
* Helper function used to manage commit timeouts
*/
@@ -1222,6 +1231,9 @@ static int journal_get_superblock(journal_t *journal)
}
}

+ if (buffer_verified(bh))
+ return 0;
+
sb = journal->j_superblock;

err = -EINVAL;
@@ -1259,6 +1271,21 @@ static int journal_get_superblock(journal_t *journal)
goto out;
}

+ if (JBD2_HAS_COMPAT_FEATURE(journal, JBD2_FEATURE_COMPAT_CHECKSUM) &&
+ JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2)) {
+ /* Can't have checksum v1 and v2 on at the same time! */
+ printk(KERN_ERR "JBD: Can't enable checksumming v1 and v2 "
+ "at the same time!\n");
+ goto out;
+ }
+
+ if (!jbd2_verify_csum_type(journal, sb)) {
+ printk(KERN_ERR "JBD: Unknown checksum type\n");
+ goto out;
+ }
+
+ set_buffer_verified(bh);
+
return 0;

out:
@@ -1502,6 +1529,10 @@ int jbd2_journal_check_available_features (journal_t *journal, unsigned long com
int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
unsigned long ro, unsigned long incompat)
{
+#define INCOMPAT_FEATURE_ON(f) \
+ ((incompat & (f)) && !(sb->s_feature_incompat & cpu_to_be32(f)))
+#define COMPAT_FEATURE_ON(f) \
+ ((compat & (f)) && !(sb->s_feature_compat & cpu_to_be32(f)))
journal_superblock_t *sb;

if (jbd2_journal_check_used_features(journal, compat, ro, incompat))
@@ -1510,16 +1541,35 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
if (!jbd2_journal_check_available_features(journal, compat, ro, incompat))
return 0;

+ /* Asking for checksumming v2 and v1? Only give them v2. */
+ if (incompat & JBD2_FEATURE_INCOMPAT_CSUM_V2 &&
+ compat & JBD2_FEATURE_COMPAT_CHECKSUM)
+ compat &= ~JBD2_FEATURE_COMPAT_CHECKSUM;
+
jbd_debug(1, "Setting new features 0x%lx/0x%lx/0x%lx\n",
compat, ro, incompat);

sb = journal->j_superblock;

+ /* If enabling v2 checksums, update superblock */
+ if (INCOMPAT_FEATURE_ON(JBD2_FEATURE_INCOMPAT_CSUM_V2)) {
+ sb->s_checksum_type = JBD2_CRC32C_CHKSUM;
+ sb->s_feature_compat &=
+ ~cpu_to_be32(JBD2_FEATURE_COMPAT_CHECKSUM);
+ }
+
+ /* If enabling v1 checksums, downgrade superblock */
+ if (COMPAT_FEATURE_ON(JBD2_FEATURE_COMPAT_CHECKSUM))
+ sb->s_feature_incompat &=
+ ~cpu_to_be32(JBD2_FEATURE_INCOMPAT_CSUM_V2);
+
sb->s_feature_compat |= cpu_to_be32(compat);
sb->s_feature_ro_compat |= cpu_to_be32(ro);
sb->s_feature_incompat |= cpu_to_be32(incompat);

return 1;
+#undef COMPAT_FEATURE_ON
+#undef INCOMPAT_FEATURE_ON
}

/*

2011-12-14 00:48:25

by djwong

[permalink] [raw]
Subject: [PATCH 19/23] jbd2: Checksum revocation blocks

Compute and verify revoke blocks inside the journal.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/jbd2/recovery.c | 22 ++++++++++++++++++++++
fs/jbd2/revoke.c | 27 ++++++++++++++++++++++++++-
2 files changed, 48 insertions(+), 1 deletions(-)


diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index da6d7ba..bfc52cd 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -703,6 +703,25 @@ static int do_one_pass(journal_t *journal,
return err;
}

+static int jbd2_revoke_block_csum_verify(journal_t *j,
+ void *buf)
+{
+ struct jbd2_journal_revoke_tail *tail;
+ __u32 provided, calculated;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return 1;
+
+ tail = (struct jbd2_journal_revoke_tail *)(buf + j->j_blocksize -
+ sizeof(struct jbd2_journal_revoke_tail));
+ provided = tail->r_checksum;
+ tail->r_checksum = 0;
+ calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
+ tail->r_checksum = provided;
+
+ provided = be32_to_cpu(provided);
+ return provided == calculated;
+}

/* Scan a revoke record, marking all blocks mentioned as revoked. */

@@ -717,6 +736,9 @@ static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
offset = sizeof(jbd2_journal_revoke_header_t);
max = be32_to_cpu(header->r_count);

+ if (!jbd2_revoke_block_csum_verify(journal, header))
+ return -EINVAL;
+
if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_64BIT))
record_len = 8;

diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 69fd935..a197ba5 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -548,6 +548,7 @@ static void write_one_revoke_record(journal_t *journal,
struct jbd2_revoke_record_s *record,
int write_op)
{
+ int csum_size = 0;
struct journal_head *descriptor;
int offset;
journal_header_t *header;
@@ -562,9 +563,13 @@ static void write_one_revoke_record(journal_t *journal,
descriptor = *descriptorp;
offset = *offsetp;

+ /* Do we need to leave space at the end for a checksum? */
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ csum_size = sizeof(struct jbd2_journal_revoke_tail);
+
/* Make sure we have a descriptor with space left for the record */
if (descriptor) {
- if (offset == journal->j_blocksize) {
+ if (offset >= journal->j_blocksize - csum_size) {
flush_descriptor(journal, descriptor, offset, write_op);
descriptor = NULL;
}
@@ -601,6 +606,24 @@ static void write_one_revoke_record(journal_t *journal,
*offsetp = offset;
}

+static void jbd2_revoke_csum_set(journal_t *j,
+ struct journal_head *descriptor)
+{
+ struct jbd2_journal_revoke_tail *tail;
+ __u32 csum;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return;
+
+ tail = (struct jbd2_journal_revoke_tail *)
+ (jh2bh(descriptor)->b_data + j->j_blocksize -
+ sizeof(struct jbd2_journal_revoke_tail));
+ tail->r_checksum = 0;
+ csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
+ j->j_blocksize);
+ tail->r_checksum = cpu_to_be32(csum);
+}
+
/*
* Flush a revoke descriptor out to the journal. If we are aborting,
* this is a noop; otherwise we are generating a buffer which needs to
@@ -622,6 +645,8 @@ static void flush_descriptor(journal_t *journal,

header = (jbd2_journal_revoke_header_t *) jh2bh(descriptor)->b_data;
header->r_count = cpu_to_be32(offset);
+ jbd2_revoke_csum_set(journal, descriptor);
+
set_buffer_jwrite(bh);
BUFFER_TRACE(bh, "write");
set_buffer_dirty(bh);

2011-12-14 00:48:32

by djwong

[permalink] [raw]
Subject: [PATCH 20/23] jbd2: Checksum descriptor blocks

Calculate and verify a checksum of each descriptor block.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/jbd2/commit.c | 26 ++++++++++++++++++++++++--
fs/jbd2/recovery.c | 37 ++++++++++++++++++++++++++++++++++++-
2 files changed, 60 insertions(+), 3 deletions(-)


diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 68d704d..99dbb9a 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -302,6 +302,24 @@ static void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
tag->t_blocknr_high = cpu_to_be32((block >> 31) >> 1);
}

+static void jbd2_descr_block_csum_set(journal_t *j,
+ struct journal_head *descriptor)
+{
+ struct jbd2_journal_block_tail *tail;
+ __u32 csum;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return;
+
+ tail = (struct jbd2_journal_block_tail *)
+ (jh2bh(descriptor)->b_data + j->j_blocksize -
+ sizeof(struct jbd2_journal_block_tail));
+ tail->t_checksum = 0;
+ csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
+ j->j_blocksize);
+ tail->t_checksum = cpu_to_be32(csum);
+}
+
/*
* jbd2_journal_commit_transaction
*
@@ -331,6 +349,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
struct buffer_head *cbh = NULL; /* For transactional checksums */
__u32 crc32_sum = ~0;
struct blk_plug plug;
+ int csum_size = 0;
+
+ if (JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ csum_size = sizeof(struct jbd2_journal_block_tail);

/*
* First job: lock down the current transaction and wait for
@@ -623,7 +645,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)

if (bufs == journal->j_wbufsize ||
commit_transaction->t_buffers == NULL ||
- space_left < tag_bytes + 16) {
+ space_left < tag_bytes + 16 + csum_size) {

jbd_debug(4, "JBD2: Submit %d IOs\n", bufs);

@@ -632,7 +654,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
the last tag we set up. */

tag->t_flags |= cpu_to_be32(JBD2_FLAG_LAST_TAG);
-
+ jbd2_descr_block_csum_set(journal, descriptor);
start_journal_io:
for (i = 0; i < bufs; i++) {
struct buffer_head *bh = wbuf[i];
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index bfc52cd..513006b 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -173,6 +173,25 @@ static int jread(struct buffer_head **bhp, journal_t *journal,
return 0;
}

+static int jbd2_descr_block_csum_verify(journal_t *j,
+ void *buf)
+{
+ struct jbd2_journal_block_tail *tail;
+ __u32 provided, calculated;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return 1;
+
+ tail = (struct jbd2_journal_block_tail *)(buf + j->j_blocksize -
+ sizeof(struct jbd2_journal_block_tail));
+ provided = tail->t_checksum;
+ tail->t_checksum = 0;
+ calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
+ tail->t_checksum = provided;
+
+ provided = be32_to_cpu(provided);
+ return provided == calculated;
+}

/*
* Count the number of in-use tags in a journal descriptor block.
@@ -185,6 +204,9 @@ static int count_tags(journal_t *journal, struct buffer_head *bh)
int nr = 0, size = journal->j_blocksize;
int tag_bytes = journal_tag_bytes(journal);

+ if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ size -= sizeof(struct jbd2_journal_block_tail);
+
tagp = &bh->b_data[sizeof(journal_header_t)];

while ((tagp - bh->b_data + tag_bytes) <= size) {
@@ -363,6 +385,7 @@ static int do_one_pass(journal_t *journal,
int blocktype;
int tag_bytes = journal_tag_bytes(journal);
__u32 crc32_sum = ~0; /* Transactional Checksums */
+ int descr_csum_size = 0;

/*
* First thing is to establish what we expect to find in the log
@@ -448,6 +471,18 @@ static int do_one_pass(journal_t *journal,

switch(blocktype) {
case JBD2_DESCRIPTOR_BLOCK:
+ /* Verify checksum first */
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal,
+ JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ descr_csum_size =
+ sizeof(struct jbd2_journal_block_tail);
+ if (descr_csum_size > 0 &&
+ !jbd2_descr_block_csum_verify(journal,
+ bh->b_data)) {
+ err = -EIO;
+ goto failed;
+ }
+
/* If it is a valid descriptor block, replay it
* in pass REPLAY; if journal_checksums enabled, then
* calculate checksums in PASS_SCAN, otherwise,
@@ -478,7 +513,7 @@ static int do_one_pass(journal_t *journal,

tagp = &bh->b_data[sizeof(journal_header_t)];
while ((tagp - bh->b_data + tag_bytes)
- <= journal->j_blocksize) {
+ <= journal->j_blocksize - descr_csum_size) {
unsigned long io_block;

tag = (journal_block_tag_t *) tagp;

2011-12-14 00:48:17

by djwong

[permalink] [raw]
Subject: [PATCH 18/23] jbd2: Update structure definitions and flags to support extended checksumming

Add structure definitions, flags, and constants to extend checksum protection
to all journal blocks.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/jbd2/journal.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/jbd2.h | 3 +++
2 files changed, 53 insertions(+), 0 deletions(-)


diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index bb7f03f..bc80533 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -109,6 +109,34 @@ int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
return sb->s_checksum_type == JBD2_CRC32C_CHKSUM;
}

+static __u32 jbd2_superblock_csum(journal_t *j, journal_superblock_t *sb)
+{
+ __u32 csum, old_csum;
+
+ old_csum = sb->s_checksum;
+ sb->s_checksum = 0;
+ csum = jbd2_chksum(j, ~0, (char *)sb, sizeof(journal_superblock_t));
+ sb->s_checksum = old_csum;
+
+ return cpu_to_be32(csum);
+}
+
+int jbd2_superblock_csum_verify(journal_t *j, journal_superblock_t *sb)
+{
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return 1;
+
+ return sb->s_checksum == jbd2_superblock_csum(j, sb);
+}
+
+void jbd2_superblock_csum_set(journal_t *j, journal_superblock_t *sb)
+{
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return;
+
+ sb->s_checksum = jbd2_superblock_csum(j, sb);
+}
+
/*
* Helper function used to manage commit timeouts
*/
@@ -1178,6 +1206,7 @@ void jbd2_journal_update_superblock(journal_t *journal, int wait)
sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
sb->s_start = cpu_to_be32(journal->j_tail);
sb->s_errno = cpu_to_be32(journal->j_errno);
+ jbd2_superblock_csum_set(journal, sb);
read_unlock(&journal->j_state_lock);

BUFFER_TRACE(bh, "marking dirty");
@@ -1295,6 +1324,17 @@ static int journal_get_superblock(journal_t *journal)
}
}

+ /* Check superblock checksum */
+ if (!jbd2_superblock_csum_verify(journal, sb)) {
+ printk(KERN_ERR "JBD: journal checksum error\n");
+ goto out;
+ }
+
+ /* Precompute checksum seed for all metadata */
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ journal->j_csum_seed = jbd2_chksum(journal, ~0, sb->s_uuid,
+ sizeof(sb->s_uuid));
+
set_buffer_verified(bh);

return 0;
@@ -1581,6 +1621,13 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
return 0;
}
}
+
+ /* Precompute checksum seed for all metadata */
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal,
+ JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ journal->j_csum_seed = jbd2_chksum(journal, ~0,
+ sb->s_uuid,
+ sizeof(sb->s_uuid));
}

/* If enabling v1 checksums, downgrade superblock */
@@ -1591,6 +1638,7 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
sb->s_feature_compat |= cpu_to_be32(compat);
sb->s_feature_ro_compat |= cpu_to_be32(ro);
sb->s_feature_incompat |= cpu_to_be32(incompat);
+ jbd2_journal_update_superblock(journal, 0);

return 1;
#undef COMPAT_FEATURE_ON
@@ -1621,6 +1669,7 @@ void jbd2_journal_clear_features(journal_t *journal, unsigned long compat,
sb->s_feature_compat &= ~cpu_to_be32(compat);
sb->s_feature_ro_compat &= ~cpu_to_be32(ro);
sb->s_feature_incompat &= ~cpu_to_be32(incompat);
+ jbd2_journal_update_superblock(journal, 0);
}
EXPORT_SYMBOL(jbd2_journal_clear_features);

@@ -1669,6 +1718,7 @@ static int journal_convert_superblock_v1(journal_t *journal,

sb->s_nr_users = cpu_to_be32(1);
sb->s_header.h_blocktype = cpu_to_be32(JBD2_SUPERBLOCK_V2);
+ jbd2_superblock_csum_set(journal, sb);
journal->j_format_version = 2;

bh = journal->j_sb_buffer;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 5b2abf9..51d4a0b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -969,6 +969,9 @@ struct journal_s

/* Reference to checksum algorithm driver via cryptoapi */
struct crypto_shash *j_chksum_driver;
+
+ /* Precomputed journal UUID checksum for seeding other checksums */
+ __u32 j_csum_seed;
};

/*

2011-12-14 00:48:57

by djwong

[permalink] [raw]
Subject: [PATCH 14/23] ext4: Add checksums to the MMP block

Compute and verify a checksum for the MMP block.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/ext4/ext4.h | 3 +++
fs/ext4/mmp.c | 44 +++++++++++++++++++++++++++++++++++++++-----
2 files changed, 42 insertions(+), 5 deletions(-)


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ae71af3..f6bf9d0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2378,6 +2378,9 @@ extern int ext4_bio_write_page(struct ext4_io_submit *io,

/* mmp.c */
extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t);
+extern void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp);
+extern int ext4_mmp_csum_verify(struct super_block *sb,
+ struct mmp_struct *mmp);

/* BH_Uninit flag: blocks are allocated but uninitialized on disk */
enum ext4_state_bits {
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 7ea4ba4..142e9d7 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -6,12 +6,45 @@

#include "ext4.h"

+/* Checksumming functions */
+static __u32 ext4_mmp_csum(struct super_block *sb, struct mmp_struct *mmp)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ int offset = offsetof(struct mmp_struct, mmp_checksum);
+ __u32 csum;
+
+ csum = ext4_chksum(sbi, sbi->s_csum_seed, (char *)mmp, offset);
+
+ return cpu_to_le32(csum);
+}
+
+int ext4_mmp_csum_verify(struct super_block *sb, struct mmp_struct *mmp)
+{
+ if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ return 1;
+
+ return mmp->mmp_checksum == ext4_mmp_csum(sb, mmp);
+}
+
+void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
+{
+ if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ return;
+
+ mmp->mmp_checksum = ext4_mmp_csum(sb, mmp);
+}
+
/*
* Write the MMP block using WRITE_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct buffer_head *bh)
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
{
+ struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
+
+ ext4_mmp_csum_set(sb, mmp);
mark_buffer_dirty(bh);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
@@ -59,7 +92,8 @@ static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,
}

mmp = (struct mmp_struct *)((*bh)->b_data);
- if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC)
+ if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC ||
+ !ext4_mmp_csum_verify(sb, mmp))
return -EINVAL;

return 0;
@@ -120,7 +154,7 @@ static int kmmpd(void *data)
mmp->mmp_time = cpu_to_le64(get_seconds());
last_update_time = jiffies;

- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
/*
* Don't spew too many error messages. Print one every
* (s_mmp_update_interval * 60) seconds.
@@ -200,7 +234,7 @@ static int kmmpd(void *data)
mmp->mmp_seq = cpu_to_le32(EXT4_MMP_SEQ_CLEAN);
mmp->mmp_time = cpu_to_le64(get_seconds());

- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);

failed:
kfree(data);
@@ -299,7 +333,7 @@ skip:
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);

- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
if (retval)
goto failed;



2011-12-14 00:48:11

by djwong

[permalink] [raw]
Subject: [PATCH 17/23] jbd2: Grab a reference to the crc32c driver only when necessary

Obtain a reference to the crc32c driver only when the journal needs it for
checksum v2.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/jbd2/Kconfig | 2 ++
fs/jbd2/journal.c | 25 +++++++++++++++++++++++++
include/linux/jbd2.h | 23 +++++++++++++++++++++++
3 files changed, 50 insertions(+), 0 deletions(-)


diff --git a/fs/jbd2/Kconfig b/fs/jbd2/Kconfig
index f32f346..69a48c2 100644
--- a/fs/jbd2/Kconfig
+++ b/fs/jbd2/Kconfig
@@ -1,6 +1,8 @@
config JBD2
tristate
select CRC32
+ select CRYPTO
+ select CRYPTO_CRC32C
help
This is a generic journaling layer for block devices that support
both 32-bit and 64-bit block numbers. It is currently used by
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 300b46f..bb7f03f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1284,6 +1284,17 @@ static int journal_get_superblock(journal_t *journal)
goto out;
}

+ /* Load the checksum driver */
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2)) {
+ journal->j_chksum_driver = crypto_alloc_shash("crc32c", 0, 0);
+ if (IS_ERR(journal->j_chksum_driver)) {
+ printk(KERN_ERR "JBD: Cannot load crc32c driver.\n");
+ err = PTR_ERR(journal->j_chksum_driver);
+ journal->j_chksum_driver = NULL;
+ goto out;
+ }
+ }
+
set_buffer_verified(bh);

return 0;
@@ -1440,6 +1451,8 @@ int jbd2_journal_destroy(journal_t *journal)
iput(journal->j_inode);
if (journal->j_revoke)
jbd2_journal_destroy_revoke(journal);
+ if (journal->j_chksum_driver)
+ crypto_free_shash(journal->j_chksum_driver);
kfree(journal->j_wbuf);
kfree(journal);

@@ -1556,6 +1569,18 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
sb->s_checksum_type = JBD2_CRC32C_CHKSUM;
sb->s_feature_compat &=
~cpu_to_be32(JBD2_FEATURE_COMPAT_CHECKSUM);
+
+ /* Load the checksum driver */
+ if (journal->j_chksum_driver == NULL) {
+ journal->j_chksum_driver = crypto_alloc_shash("crc32c",
+ 0, 0);
+ if (IS_ERR(journal->j_chksum_driver)) {
+ printk(KERN_ERR "JBD: Cannot load crc32c "
+ "driver.\n");
+ journal->j_chksum_driver = NULL;
+ return 0;
+ }
+ }
}

/* If enabling v1 checksums, downgrade superblock */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index ecd9b45..5b2abf9 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -31,6 +31,7 @@
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/slab.h>
+#include <crypto/hash.h>
#endif

#define journal_oom_retry 1
@@ -965,6 +966,9 @@ struct journal_s
* superblock pointer here
*/
void *j_private;
+
+ /* Reference to checksum algorithm driver via cryptoapi */
+ struct crypto_shash *j_chksum_driver;
};

/*
@@ -1283,6 +1287,25 @@ static inline int jbd_space_needed(journal_t *journal)

extern int jbd_blocks_per_page(struct inode *inode);

+static inline u32 jbd2_chksum(journal_t *journal, u32 crc,
+ const void *address, unsigned int length)
+{
+ struct {
+ struct shash_desc shash;
+ char ctx[crypto_shash_descsize(journal->j_chksum_driver)];
+ } desc;
+ int err;
+
+ desc.shash.tfm = journal->j_chksum_driver;
+ desc.shash.flags = 0;
+ *(u32 *)desc.ctx = crc;
+
+ err = crypto_shash_update(&desc.shash, address, length);
+ BUG_ON(err);
+
+ return *(u32 *)desc.ctx;
+}
+
#ifdef __KERNEL__

#define buffer_trace_init(bh) do {} while (0)

2011-12-14 00:48:54

by djwong

[permalink] [raw]
Subject: [PATCH 23/23] ext4/jbd2: Add metadata checksumming to the list of supported features

Activate the metadata checksumming feature by adding it to ext4 and jbd2's
lists of supported features.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/ext4/ext4.h | 6 ++++--
include/linux/jbd2.h | 3 ++-
2 files changed, 6 insertions(+), 3 deletions(-)


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f6bf9d0..ca30846 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1461,7 +1461,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
EXT4_FEATURE_INCOMPAT_EXTENTS| \
EXT4_FEATURE_INCOMPAT_64BIT| \
EXT4_FEATURE_INCOMPAT_FLEX_BG| \
- EXT4_FEATURE_INCOMPAT_MMP)
+ EXT4_FEATURE_INCOMPAT_MMP| \
+ EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
#define EXT4_FEATURE_RO_COMPAT_SUPP (EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
@@ -1469,7 +1470,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE | \
EXT4_FEATURE_RO_COMPAT_BTREE_DIR |\
EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
- EXT4_FEATURE_RO_COMPAT_BIGALLOC)
+ EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)

/*
* Default values for user and/or group using reserved blocks
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 51d4a0b..b1c5857 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -296,7 +296,8 @@ typedef struct journal_superblock_s
#define JBD2_KNOWN_ROCOMPAT_FEATURES 0
#define JBD2_KNOWN_INCOMPAT_FEATURES (JBD2_FEATURE_INCOMPAT_REVOKE | \
JBD2_FEATURE_INCOMPAT_64BIT | \
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT | \
+ JBD2_FEATURE_INCOMPAT_CSUM_V2)

#ifdef __KERNEL__


2011-12-14 00:48:47

by djwong

[permalink] [raw]
Subject: [PATCH 22/23] jbd2: Checksum data blocks that are stored in the journal

Calculate and verify checksums of each data block being stored in the journal.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/jbd2/commit.c | 22 ++++++++++++++++++++++
fs/jbd2/journal.c | 10 ++++++++--
fs/jbd2/recovery.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 60 insertions(+), 2 deletions(-)


diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index ccf0b6f..25bb1c3 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -339,6 +339,26 @@ static void jbd2_descr_block_csum_set(journal_t *j,
tail->t_checksum = cpu_to_be32(csum);
}

+static void jbd2_block_tag_csum_set(journal_t *j, journal_block_tag_t *tag,
+ struct buffer_head *bh, __u32 sequence)
+{
+ struct page *page = bh->b_page;
+ __u8 *addr;
+ __u32 csum;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return;
+
+ sequence = cpu_to_be32(sequence);
+ addr = kmap_atomic(page, KM_USER0);
+ csum = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
+ sizeof(sequence));
+ csum = jbd2_chksum(j, csum, addr + offset_in_page(bh->b_data),
+ bh->b_size);
+ kunmap_atomic(addr, KM_USER0);
+
+ tag->t_checksum = cpu_to_be32(csum);
+}
/*
* jbd2_journal_commit_transaction
*
@@ -649,6 +669,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
tag = (journal_block_tag_t *) tagp;
write_tag_block(tag_bytes, tag, jh2bh(jh)->b_blocknr);
tag->t_flags = cpu_to_be32(tag_flag);
+ jbd2_block_tag_csum_set(journal, tag, jh2bh(new_jh),
+ commit_transaction->t_tid);
tagp += tag_bytes;
space_left -= tag_bytes;

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index bc80533..ff4a870 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2005,10 +2005,16 @@ int jbd2_journal_blocks_per_page(struct inode *inode)
*/
size_t journal_tag_bytes(journal_t *journal)
{
+ journal_block_tag_t tag;
+ size_t x = 0;
+
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ x += sizeof(tag.t_checksum);
+
if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_64BIT))
- return JBD2_TAG_SIZE64;
+ return x + JBD2_TAG_SIZE64;
else
- return JBD2_TAG_SIZE32;
+ return x + JBD2_TAG_SIZE32;
}

/*
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index a757d8d..3dd8544 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -390,6 +390,23 @@ static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
return provided == calculated;
}

+static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag,
+ void *buf, __u32 sequence)
+{
+ __u32 provided, calculated;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return 1;
+
+ sequence = cpu_to_be32(sequence);
+ calculated = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
+ sizeof(sequence));
+ calculated = jbd2_chksum(j, calculated, buf, j->j_blocksize);
+ provided = be32_to_cpu(tag->t_checksum);
+
+ return provided == cpu_to_be32(calculated);
+}
+
static int do_one_pass(journal_t *journal,
struct recovery_info *info, enum passtype pass)
{
@@ -566,6 +583,19 @@ static int do_one_pass(journal_t *journal,
goto skip_write;
}

+ /* Look for block corruption */
+ if (!jbd2_block_tag_csum_verify(
+ journal, tag, obh->b_data,
+ be32_to_cpu(tmp->h_sequence))) {
+ brelse(obh);
+ success = -EIO;
+ printk(KERN_ERR "JBD: Invalid "
+ "checksum recovering "
+ "block %llu in log\n",
+ blocknr);
+ continue;
+ }
+
/* Find a buffer for the new
* data being restored */
nbh = __getblk(journal->j_fs_dev,


2011-12-14 00:48:38

by djwong

[permalink] [raw]
Subject: [PATCH 21/23] jbd2: Checksum commit blocks

Calculate and verify the checksum of commit blocks. In checksum v2, deprecate
most of the checksum v1 commit block checksum fields, since each block has its
own checksum.

Signed-off-by: Darrick J. Wong <[email protected]>
---
fs/jbd2/commit.c | 21 ++++++++++++++++++++-
fs/jbd2/recovery.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+), 1 deletions(-)


diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 99dbb9a..ccf0b6f 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -86,6 +86,24 @@ nope:
__brelse(bh);
}

+static void jbd2_commit_block_csum_set(journal_t *j,
+ struct journal_head *descriptor)
+{
+ struct commit_header *h;
+ __u32 csum;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return;
+
+ h = (struct commit_header *)(jh2bh(descriptor)->b_data);
+ h->h_chksum_type = 0;
+ h->h_chksum_size = 0;
+ h->h_chksum[0] = 0;
+ csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
+ j->j_blocksize);
+ h->h_chksum[0] = cpu_to_be32(csum);
+}
+
/*
* Done it all: now submit the commit record. We should have
* cleaned up our previous buffers by now, so if we are in abort
@@ -129,6 +147,7 @@ static int journal_submit_commit_record(journal_t *journal,
tmp->h_chksum_size = JBD2_CRC32_CHKSUM_SIZE;
tmp->h_chksum[0] = cpu_to_be32(crc32_sum);
}
+ jbd2_commit_block_csum_set(journal, descriptor);

JBUFFER_TRACE(descriptor, "submit commit block");
lock_buffer(bh);
@@ -351,7 +370,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
struct blk_plug plug;
int csum_size = 0;

- if (JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
csum_size = sizeof(struct jbd2_journal_block_tail);

/*
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 513006b..a757d8d 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -372,6 +372,24 @@ static int calc_chksums(journal_t *journal, struct buffer_head *bh,
return 0;
}

+static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
+{
+ struct commit_header *h;
+ __u32 provided, calculated;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+ return 1;
+
+ h = buf;
+ provided = h->h_chksum[0];
+ h->h_chksum[0] = 0;
+ calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
+ h->h_chksum[0] = provided;
+
+ provided = be32_to_cpu(provided);
+ return provided == calculated;
+}
+
static int do_one_pass(journal_t *journal,
struct recovery_info *info, enum passtype pass)
{
@@ -682,6 +700,19 @@ static int do_one_pass(journal_t *journal,
}
crc32_sum = ~0;
}
+ if (pass == PASS_SCAN &&
+ !jbd2_commit_block_csum_verify(journal,
+ bh->b_data)) {
+ info->end_transaction = next_commit_ID;
+
+ if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+ journal->j_failed_commit =
+ next_commit_ID;
+ brelse(bh);
+ break;
+ }
+ }
brelse(bh);
next_commit_ID++;
continue;

2011-12-14 01:10:47

by djwong

[permalink] [raw]
Subject: Re: [PATCH v2.2 00/23] ext4: Add metadata checksumming

On Tue, Dec 13, 2011 at 04:46:14PM -0800, Darrick J. Wong wrote:
> Hi all,
>
> This patchset adds crc32c checksums to most of the ext4 metadata objects. A
> full design document is on the ext4 wiki[1] but I will summarize that document here.
>
> As much as we wish our storage hardware was totally reliable, it is still
> quite possible for data to be corrupted on disk, corrupted during transfer over
> a wire, or written to the wrong places. To protect against this sort of
> non-hostile corruption, it is desirable to store checksums of metadata objects
> on the filesystem to prevent broken metadata from shredding the filesystem.
>
> The crc32c polynomial was chosen for its improved error detection capabilities
> over crc32 and crc16, and because of its hardware acceleration on current and
> upcoming Intel and Sparc chips.
>
> Each type of metadata object has been retrofitted to store a checksum as follows:
>
> - The superblock stores a crc32c of itself.
> - Each inode stores crc32c(fs_uuid + inode_num + inode_gen + inode +
> slack_space_after_inode)
> - Block and inode bitmaps each get their own crc32c(fs_uuid + group_num +
> bitmap), stored in the block group descriptor.
> - Each extent tree block stores a crc32c(fs_uuid + inode_num + inode_gen +
> extent_entries) in unused space at the end of the block.
> - Each directory leaf block has an unused-looking directory entry big enough to
> store a crc32c(fs_uuid + inode_num + inode_gen + block) at the end of the
> block.
> - Each directory htree block is shortened to contain a crc32c(fs_uuid +
> inode_num + inode_gen + block) at the end of the block.
> - Extended attribute blocks store crc32c(fs_uuid + id + ea_block) in the
> header, where id is, depending on the refcount, either the inode_num and
> inode_gen; or the block number.
> - MMP blocks store crc32c(fs_uuid + mmpblock) at the end of the MMP block.
> - Block groups can now use crc32c instead of crc16.
> - The journal now has a v2 checksum feature flag.
> - crc32c(j_uuid + block) checksums have been inserted into descriptor blocks,
> commit blocks, revoke blocks, and the journal superblock.
> - Each block tag in a descriptor block has a checksum of the related data block.
>
> The patchset for e2fsprogs will be sent under separate cover only to linux-ext4
> as it is quite lengthy (~48 patches).
>
> As far as performance impact goes, I see nearly no change with a standard mail
> server ffsb simulation. On a test that involves only file creation and
> deletion and extent tree modifications, I see a drop of about 50 percent with
> the current kernel crc32c implementation; this improves to a drop of about 20
> percent with the enclosed crc32c implementation. However, given that metadata
> is usually a small fraction of total IO, it doesn't seem like the cost of
> enabling this feature is unreasonable.
>
> There are of course unresolved issues:
>
> - I haven't fixed it up to checksum the exclude bitmap yet. I'll probably
> submit that as an add-on to the snapshot patchset.
>
> - Using the journal commit hooks to delay crc32c calculation until dirty
> buffers are actually being written to disk.
>
> - Interaction with online resize code. Yongqiang seems to be in the process of
> rewriting this not to use custom metadata block write functions, but I haven't
> looked at it very closely yet.
>
> Please have a look at the design document and patches, and please feel free to
> suggest any changes.
>
> v2: Checksum the MMP block, store the checksum type in the superblock, include
> the inode generation in file checksums, and finally solve the problem of limited
> space in block groups by splitting the checksum into two halves.
>
> v2.1: Checksum the reserved parts of the htree tail structure. Fix some flag
> handling bugs with the mb cache init routine wherein bitmaps could fail to be
> checksummed at read time.
>
> v2.2: Reincorporate the FS UUID in the bitmap checksum calcuations. Move all
> disk layout changes to the front and the feature flag enablement to the end of
> the patch set. Fail journal recovery if revoke block fails checksum.
>
> This patchset has been tested on 3.2.0-rc5 on x64, i386, ppc64, and ppc32. The
> patches seems to work fine on all four platforms.

OH COME ON!!!!

stgit helpfully changed the From lines, screwing everything up. Awesome. I
apologize, I wasn't expecting stgit to change the From: lines when I migrated
the disk format changes to the front of the set.

I don't really want to respam the list just to fix this one little thing. If
people want more code changes I'll gladly make them and re-send. If not, then
Ted, I can just send you the whole mess as a big mbox file, or post them on a
webserver somewhere, with correct attribution.

--D
>
> --D
>
> [1] https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2011-12-14 01:31:58

by djwong

[permalink] [raw]
Subject: Re: [PATCH v2.2 00/23] ext4: Add metadata checksumming

On Tue, Dec 13, 2011 at 05:10:28PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 13, 2011 at 04:46:14PM -0800, Darrick J. Wong wrote:
<snip>
> > This patchset has been tested on 3.2.0-rc5 on x64, i386, ppc64, and ppc32. The
> > patches seems to work fine on all four platforms.
>
> OH COME ON!!!!
>
> stgit helpfully changed the From lines, screwing everything up. Awesome. I
> apologize, I wasn't expecting stgit to change the From: lines when I migrated
> the disk format changes to the front of the set.
>
> I don't really want to respam the list just to fix this one little thing. If
> people want more code changes I'll gladly make them and re-send. If not, then
> Ted, I can just send you the whole mess as a big mbox file, or post them on a
> webserver somewhere, with correct attribution.

The e2fsprogs patchset is also found here:
http://djwong.org/docs/ext4_checksum_progs/

The kernel patchset is also found here:
http://djwong.org/docs/ext4_checksum_kernel/

(Honestly, I don't enjoy spamming the list with 91 patches.)

--D
>
> --D
> >
> > --D
> >
> > [1] https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2012-01-06 18:46:19

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH v2.2 00/23] ext4: Add metadata checksumming

On Tue, Dec 13, 2011 at 04:46:14PM -0800, Darrick J. Wong wrote:
> This patchset has been tested on 3.2.0-rc5 on x64, i386, ppc64, and ppc32.
> The patches seems to work fine on all four platforms.
[snip]
> --D
>
> [1] https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums

This URL is broken due to the kernel.org rebuild, so if it is going into
Git it should be updated to the new URL:

https://ext4.wiki.kernel.org/articles/e/x/t/Ext4_Metadata_Checksums_4d24.html

Cheers, Andreas






2012-01-06 18:57:58

by djwong

[permalink] [raw]
Subject: Re: [PATCH v2.2 00/23] ext4: Add metadata checksumming

On Fri, Jan 06, 2012 at 11:46:17AM -0700, Andreas Dilger wrote:
> On Tue, Dec 13, 2011 at 04:46:14PM -0800, Darrick J. Wong wrote:
> > This patchset has been tested on 3.2.0-rc5 on x64, i386, ppc64, and ppc32.
> > The patches seems to work fine on all four platforms.
> [snip]
> > --D
> >
> > [1] https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
>
> This URL is broken due to the kernel.org rebuild, so if it is going into
> Git it should be updated to the new URL:
>
> https://ext4.wiki.kernel.org/articles/e/x/t/Ext4_Metadata_Checksums_4d24.html

Ok, I was about to respam^post the patchset since the merge window is open
again. I'll update the URL.

Anyone know if wiki logins/write access will be restored soon?

--D
>
> Cheers, Andreas
>
>
>
>
>