2015-05-22 11:29:01

by Jan Kara

Subject: [PATCH 0/4] e2fsprogs: Support for orphan file feature

Hello,

this series adds support for the orphan file feature to e2fsprogs. mke2fs and
tune2fs support should be fine; e2fsck support still has bugs, so use it with
care. I'm posting this mainly so that people can easily create filesystems with
the orphan file feature for testing the kernel patches.

Honza


2015-05-22 11:29:02

by Jan Kara

Subject: [PATCH 2/4] mke2fs: Add support for orphan_file feature

Signed-off-by: Jan Kara <[email protected]>
---
misc/mke2fs.8.in | 5 +++++
misc/mke2fs.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index aeb5caf6e869..173e19979bb2 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -359,6 +359,11 @@ filesystem to change based on the user running \fBmke2fs\fR.
Set a flag in the filesystem superblock indicating that it may be
mounted using experimental kernel code, such as the ext4dev filesystem.
.TP
+.BI orphan_file_size= size
+Set the size of the file used for tracking unlinked but still open inodes and
+inodes with a truncate in progress. A larger file allows for better scalability;
+reserving a few blocks per CPU is ideal.
+.TP
.BI discard
Attempt to discard blocks at mkfs time (discarding blocks initially is useful
on solid state devices and sparse / thin-provisioned storage). When the device
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 6883103e33c6..380a719e5739 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -91,6 +91,7 @@ static uid_t root_uid;
static gid_t root_gid;
int journal_size;
int journal_flags;
+static __u64 orphan_file_size;
static int lazy_itable_init;
static int packed_meta_blocks;
static char *bad_blocks_filename = NULL;
@@ -1022,6 +1023,28 @@ static void parse_extended_opts(struct ext2_super_block *param,
r_usage++;
continue;
}
+ } else if (!strcmp(token, "orphan_file_size")) {
+ if (!arg) {
+ r_usage++;
+ badopt = token;
+ continue;
+ }
+ orphan_file_size = strtoul(arg, &p, 0);
+ if (*p) {
+ fprintf(stderr,
+ _("Invalid size of orphan file %s\n"),
+ arg);
+ r_usage++;
+ continue;
+ }
+ if (orphan_file_size < EXT4_MIN_ORPHAN_FILE_SIZE) {
+ fprintf(stderr,
+ _("Orphan file is too small. Minimum "
+ "size is %u\n"),
+ EXT4_MIN_ORPHAN_FILE_SIZE);
+ r_usage++;
+ continue;
+ }
} else {
r_usage++;
badopt = token;
@@ -1067,7 +1090,8 @@ static __u32 ok_features[3] = {
EXT2_FEATURE_COMPAT_RESIZE_INODE |
EXT2_FEATURE_COMPAT_DIR_INDEX |
EXT2_FEATURE_COMPAT_EXT_ATTR |
- EXT4_FEATURE_COMPAT_SPARSE_SUPER2,
+ EXT4_FEATURE_COMPAT_SPARSE_SUPER2 |
+ EXT4_FEATURE_COMPAT_ORPHAN_FILE,
/* Incompat */
EXT2_FEATURE_INCOMPAT_FILETYPE|
EXT3_FEATURE_INCOMPAT_EXTENTS|
@@ -3109,6 +3133,36 @@ no_journal:
if (EXT2_HAS_RO_COMPAT_FEATURE(&fs_param,
EXT4_FEATURE_RO_COMPAT_QUOTA))
create_quota_inodes(fs);
+ if (EXT2_HAS_COMPAT_FEATURE(&fs_param,
+ EXT4_FEATURE_COMPAT_ORPHAN_FILE)) {
+ e2_blkcnt_t orphan_file_blocks;
+
+ if (fs->super->s_first_ino <= EXT4_ORPHAN_INO) {
+ com_err(program_name, 0, _("inode %d has to be "
+ "reserved for orphan file feature"),
+ EXT4_ORPHAN_INO);
+ exit(1);
+ }
+ if (!EXT2_HAS_COMPAT_FEATURE(&fs_param,
+ EXT3_FEATURE_COMPAT_HAS_JOURNAL)) {
+ com_err(program_name, 0, _("cannot set orphan_file "
+ "flag without a journal."));
+ exit(1);
+ }
+ if (orphan_file_size) {
+ orphan_file_blocks = (orphan_file_size +
+ fs->blocksize - 1) / fs->blocksize;
+ } else {
+ orphan_file_blocks = ext2fs_default_orphan_file_blocks(
+ ext2fs_blocks_count(fs->super));
+ }
+ retval = ext2fs_create_orphan_file(fs, orphan_file_blocks);
+ if (retval) {
+ com_err(program_name, retval,
+ _("while creating orphan file"));
+ exit(1);
+ }
+ }

retval = mk_hugefiles(fs, device_name);
if (retval)
--
2.1.4
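
Editorial note on the orphan_file_size handling above: the option value is
parsed as a byte count, rejected if it is below the 16384-byte minimum, and
rounded up to whole filesystem blocks. The following is a minimal standalone
sketch of that arithmetic, not the mke2fs code itself. The helper name
orphan_size_to_blocks() and the MIN_ORPHAN_FILE_SIZE macro are hypothetical,
and strtoull() is used here instead of the patch's strtoul() purely as an
illustration choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Stands in for EXT4_MIN_ORPHAN_FILE_SIZE from the patch (16384 bytes). */
#define MIN_ORPHAN_FILE_SIZE 16384ULL

/*
 * Hypothetical helper: parse the value of "orphan_file_size=<arg>" as a
 * byte count and convert it to a number of filesystem blocks, rounding up.
 * Returns 0 on success, -1 if the value is missing, malformed, or smaller
 * than the minimum.
 */
static int orphan_size_to_blocks(const char *arg, unsigned int blocksize,
				 uint64_t *blocks)
{
	char *end;
	unsigned long long size;

	assert(blocksize > 0);
	if (!arg || !*arg)
		return -1;

	errno = 0;
	size = strtoull(arg, &end, 0);	/* base 0: accepts decimal, 0x..., 0... */
	if (errno || *end)
		return -1;		/* overflow or trailing garbage */
	if (size < MIN_ORPHAN_FILE_SIZE)
		return -1;

	/* Round up so that a partial block still gets a full block. */
	*blocks = (size + blocksize - 1) / blocksize;
	return 0;
}
```

With a 4096-byte block size, a request of 20000 bytes therefore yields 5
blocks, and the 16384-byte minimum yields 4.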


2015-05-22 11:29:02

by Jan Kara

Subject: [PATCH 4/4] tune2fs: Add support for orphan_file feature

Signed-off-by: Jan Kara <[email protected]>
---
misc/tune2fs.8.in | 5 ++++
misc/tune2fs.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/misc/tune2fs.8.in b/misc/tune2fs.8.in
index 9d1df8242baa..c2355da5a7aa 100644
--- a/misc/tune2fs.8.in
+++ b/misc/tune2fs.8.in
@@ -232,6 +232,11 @@ program.
This superblock setting is only honored in 2.6.35+ kernels;
and not at all by the ext2 and ext3 file system drivers.
.TP
+.BI orphan_file_size= size
+Set the size of the file used for tracking unlinked but still open inodes and
+inodes with a truncate in progress. A larger file allows for better scalability;
+reserving a few blocks per CPU is ideal.
+.TP
.B test_fs
Set a flag in the filesystem superblock indicating that it may be
mounted using experimental kernel code, such as the ext4dev filesystem.
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index f930df2f6683..672efd5872f8 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -103,6 +103,7 @@ static int fsck_requested;
int journal_size, journal_flags;
char *journal_device;
static blk64_t journal_location = ~0LL;
+static e2_blkcnt_t orphan_file_blocks;

static struct list_head blk_move_list;

@@ -143,7 +144,8 @@ static void usage(void)
static __u32 ok_features[3] = {
/* Compat */
EXT3_FEATURE_COMPAT_HAS_JOURNAL |
- EXT2_FEATURE_COMPAT_DIR_INDEX,
+ EXT2_FEATURE_COMPAT_DIR_INDEX |
+ EXT4_FEATURE_COMPAT_ORPHAN_FILE,
/* Incompat */
EXT2_FEATURE_INCOMPAT_FILETYPE |
EXT3_FEATURE_INCOMPAT_EXTENTS |
@@ -169,7 +171,8 @@ static __u32 clear_ok_features[3] = {
/* Compat */
EXT3_FEATURE_COMPAT_HAS_JOURNAL |
EXT2_FEATURE_COMPAT_RESIZE_INODE |
- EXT2_FEATURE_COMPAT_DIR_INDEX,
+ EXT2_FEATURE_COMPAT_DIR_INDEX |
+ EXT4_FEATURE_COMPAT_ORPHAN_FILE,
/* Incompat */
EXT2_FEATURE_INCOMPAT_FILETYPE |
EXT4_FEATURE_INCOMPAT_FLEX_BG |
@@ -1025,6 +1028,44 @@ static int update_feature_set(ext2_filsys fs, char *features)
}
}

+ if (FEATURE_OFF(E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_ORPHAN_FILE)) {
+ if (mount_flags & EXT2_MF_MOUNTED) {
+ fputs(_("The orphan_file feature may only be cleared "
+ "when the filesystem is unmounted.\n"), stderr);
+ return 1;
+ }
+ if ((sb->s_feature_ro_compat &
+ EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT) &&
+ f_flag < 2) {
+ fputs(_("The orphan_present flag is set. Please run "
+ "e2fsck before clearing orphan_file flag.\n"),
+ stderr);
+ return 1;
+ }
+ err = ext2fs_truncate_orphan_file(fs);
+ if (err) {
+ com_err(program_name, err,
+ _("\n\twhile trying to truncate orphan file\n"));
+ return 1;
+ }
+ }
+
+ if (FEATURE_ON(E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_ORPHAN_FILE)) {
+ if (!(sb->s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL)) {
+ fputs(_("orphan_file flag can be set only for "
+ "filesystems with journal.\n"), stderr);
+ return 1;
+ }
+ /*
+ * If adding an orphan file, let the create orphan file
+ * code below handle setting the flag and creating it.
+ * We supply a default size if necessary.
+ */
+ orphan_file_blocks = ext2fs_default_orphan_file_blocks(
+ ext2fs_blocks_count(fs->super));
+ sb->s_feature_compat &= ~EXT4_FEATURE_COMPAT_ORPHAN_FILE;
+ }
+
if (FEATURE_ON(E2P_FEATURE_RO_INCOMPAT,
EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER)) {
if (sb->s_feature_incompat &
@@ -1970,6 +2011,31 @@ static int parse_extended_opts(ext2_filsys fs, const char *opts)
continue;
}
ext_mount_opts = strdup(arg);
+ } else if (!strcmp(token, "orphan_file_size")) {
+ __u64 size;
+
+ if (!arg) {
+ r_usage++;
+ continue;
+ }
+ size = strtoul(arg, &p, 0);
+ if (*p) {
+ fprintf(stderr,
+ _("Invalid size of orphan file %s\n"),
+ arg);
+ r_usage++;
+ continue;
+ }
+ if (size < EXT4_MIN_ORPHAN_FILE_SIZE) {
+ fprintf(stderr,
+ _("Orphan file is too small. Minimum "
+ "size is %u\n"),
+ EXT4_MIN_ORPHAN_FILE_SIZE);
+ r_usage++;
+ continue;
+ }
+ orphan_file_blocks = (size + fs->blocksize - 1) /
+ fs->blocksize;
} else
r_usage++;
}
@@ -2921,6 +2987,17 @@ retry_open:
if (rc)
goto closefs;
}
+ if (orphan_file_blocks) {
+ errcode_t err;
+
+ err = ext2fs_create_orphan_file(fs, orphan_file_blocks);
+ if (err) {
+ com_err(program_name, err, "%s",
+ _("while creating orphan file"));
+ rc = 1;
+ goto closefs;
+ }
+ }

if (Q_flag) {
if (mount_flags & EXT2_MF_MOUNTED) {
--
2.1.4
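
Editorial note on the feature checks in update_feature_set() above: clearing
orphan_file is refused on a mounted filesystem, and also refused while the
orphan_present flag is set unless force was given at least twice; setting it
requires a journal. The sketch below restates those rules as standalone
predicates. All names in it (the enum, check_orphan_file_clear(),
can_enable_orphan_file()) are hypothetical and exist only for illustration;
force_level stands in for tune2fs's f_flag counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for the "clear orphan_file" decision. */
enum orphan_clear_result {
	ORPHAN_CLEAR_OK,		/* safe to truncate the orphan file */
	ORPHAN_CLEAR_MOUNTED,		/* refused: filesystem is mounted */
	ORPHAN_CLEAR_NEEDS_FSCK		/* refused: orphan_present is set */
};

/*
 * Mirrors the order of checks in the patch: the mounted test comes first,
 * then the orphan_present test, which a force level of 2 or more overrides.
 */
static enum orphan_clear_result
check_orphan_file_clear(bool mounted, bool orphan_present, int force_level)
{
	assert(force_level >= 0);
	if (mounted)
		return ORPHAN_CLEAR_MOUNTED;
	if (orphan_present && force_level < 2)
		return ORPHAN_CLEAR_NEEDS_FSCK;
	return ORPHAN_CLEAR_OK;
}

/* Setting the feature is only allowed on a filesystem with a journal. */
static bool can_enable_orphan_file(bool has_journal)
{
	return has_journal;
}
```

Note that forcing does not bypass the mounted check in this sketch, which
matches the order of the tests in the hunk above.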


2015-05-22 11:29:02

by Jan Kara

Subject: [PATCH 1/4] libext2fs: Support for orphan file feature

Add support for creating and deleting the orphan file, plus a couple of
utility functions that will be used by other tools.

Signed-off-by: Jan Kara <[email protected]>
---
lib/e2p/feature.c | 4 +
lib/ext2fs/Makefile.in | 2 +
lib/ext2fs/ext2_fs.h | 11 +++
lib/ext2fs/ext2fs.h | 35 +++++++-
lib/ext2fs/orphan.c | 217 +++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 267 insertions(+), 2 deletions(-)
create mode 100644 lib/ext2fs/orphan.c

diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
index 73884f2cf5bf..a8e0d4a4644a 100644
--- a/lib/e2p/feature.c
+++ b/lib/e2p/feature.c
@@ -45,6 +45,8 @@ static struct feature feature_list[] = {
"snapshot_bitmap" },
{ E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_SPARSE_SUPER2,
"sparse_super2" },
+ { E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_ORPHAN_FILE,
+ "orphan_file" },

{ E2P_FEATURE_RO_INCOMPAT, EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER,
"sparse_super" },
@@ -70,6 +72,8 @@ static struct feature feature_list[] = {
"replica" },
{ E2P_FEATURE_RO_INCOMPAT, EXT4_FEATURE_RO_COMPAT_READONLY,
"read-only" },
+ { E2P_FEATURE_RO_INCOMPAT, EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT,
+ "orphan_file_used" },

{ E2P_FEATURE_INCOMPAT, EXT2_FEATURE_INCOMPAT_COMPRESSION,
"compression" },
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 8a7f8ca52902..67120b10438c 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -109,6 +109,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
native.o \
newdir.o \
openfs.o \
+ orphan.o \
progress.o \
punch.o \
qcow2.o \
@@ -189,6 +190,7 @@ SRCS= ext2_err.c \
$(srcdir)/native.c \
$(srcdir)/newdir.c \
$(srcdir)/openfs.c \
+ $(srcdir)/orphan.c \
$(srcdir)/progress.c \
$(srcdir)/punch.c \
$(srcdir)/qcow2.c \
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index a755cfac8eae..a77c8fa09938 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -52,6 +52,7 @@
#define EXT2_JOURNAL_INO 8 /* Journal inode */
#define EXT2_EXCLUDE_INO 9 /* The "exclude" inode, for snapshots */
#define EXT4_REPLICA_INO 10 /* Used by non-upstream feature */
+#define EXT4_ORPHAN_INO 9 /* Inode with orphan entries */

/* First non-reserved inode for old ext2 filesystems */
#define EXT2_GOOD_OLD_FIRST_INO 11
@@ -769,6 +770,7 @@ struct ext2_super_block {
/* #define EXT2_FEATURE_COMPAT_EXCLUDE_INODE 0x0080 not used, legacy */
#define EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP 0x0100
#define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200
+#define EXT4_FEATURE_COMPAT_ORPHAN_FILE 0x0400


#define EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER 0x0001
@@ -789,6 +791,7 @@ struct ext2_super_block {
#define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
#define EXT4_FEATURE_RO_COMPAT_REPLICA 0x0800
#define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
+#define EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT 0x2000


#define EXT2_FEATURE_INCOMPAT_COMPRESSION 0x0001
@@ -838,6 +841,14 @@ struct ext2_super_block {
#define EXT4_DEFM_DISCARD 0x0400
#define EXT4_DEFM_NODELALLOC 0x0800

+#define EXT4_ORPHAN_BLOCK_MAGIC 0x0b10ca04
+
+/* Structure at the tail of orphan block */
+struct ext4_orphan_block_tail {
+ __u32 ob_magic;
+ __u32 ob_checksum;
+};
+
/*
* Structure of a directory entry
*/
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 28c46701da29..1e303d5d59ca 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -555,7 +555,8 @@ typedef struct ext2_icount *ext2_icount_t;
EXT2_FEATURE_COMPAT_RESIZE_INODE|\
EXT2_FEATURE_COMPAT_DIR_INDEX|\
EXT2_FEATURE_COMPAT_EXT_ATTR|\
- EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
+ EXT4_FEATURE_COMPAT_SPARSE_SUPER2|\
+ EXT4_FEATURE_COMPAT_ORPHAN_FILE)

#ifdef CONFIG_MMP
#define EXT4_LIB_INCOMPAT_MMP EXT4_FEATURE_INCOMPAT_MMP
@@ -589,7 +590,8 @@ typedef struct ext2_icount *ext2_icount_t;
EXT4_FEATURE_RO_COMPAT_BIGALLOC|\
EXT4_LIB_RO_COMPAT_QUOTA|\
EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
- EXT4_FEATURE_RO_COMPAT_READONLY)
+ EXT4_FEATURE_RO_COMPAT_READONLY|\
+ EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT)

/*
* These features are only allowed if EXT2_FLAG_SOFTSUPP_FEATURES is passed
@@ -1512,6 +1514,19 @@ errcode_t ext2fs_get_data_io(ext2_filsys fs, io_channel *old_io);
errcode_t ext2fs_set_data_io(ext2_filsys fs, io_channel new_io);
errcode_t ext2fs_rewrite_to_io(ext2_filsys fs, io_channel new_io);

+/* orphan.c */
+/*
+ * Minimum orphan file size (it must be at least 1 block and smaller one isn't
+ * very useful).
+ */
+#define EXT4_MIN_ORPHAN_FILE_SIZE 16384
+
+extern errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks);
+extern errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs);
+extern e2_blkcnt_t ext2fs_default_orphan_file_blocks(__u64 num_blocks);
+extern errcode_t ext2fs_orphan_file_block_csum_set(ext2_filsys fs, char *buf);
+extern int ext2fs_orphan_file_block_csum_verify(ext2_filsys fs, char *buf);
+
/* get_pathname.c */
extern errcode_t ext2fs_get_pathname(ext2_filsys fs, ext2_ino_t dir, ext2_ino_t ino,
char **name);
@@ -1645,6 +1660,9 @@ extern int ext2fs_dirent_name_len(const struct ext2_dir_entry *entry);
extern void ext2fs_dirent_set_name_len(struct ext2_dir_entry *entry, int len);
extern int ext2fs_dirent_file_type(const struct ext2_dir_entry *entry);
extern void ext2fs_dirent_set_file_type(struct ext2_dir_entry *entry, int type);
+extern int ext2fs_inodes_per_orphan_block(ext2_filsys fs);
+extern struct ext4_orphan_block_tail *ext2fs_orphan_block_tail(ext2_filsys fs,
+ char *buf);

#endif

@@ -1915,6 +1933,19 @@ _INLINE_ void ext2fs_dirent_set_file_type(struct ext2_dir_entry *entry, int type
entry->name_len = (entry->name_len & 0xff) | (type << 8);
}

+_INLINE_ int ext2fs_inodes_per_orphan_block(ext2_filsys fs)
+{
+ return (fs->blocksize - sizeof(struct ext4_orphan_block_tail)) /
+ sizeof(__u32);
+}
+
+_INLINE_ struct ext4_orphan_block_tail *
+ext2fs_orphan_block_tail(ext2_filsys fs, char *buf)
+{
+ return (struct ext4_orphan_block_tail *)(buf + fs->blocksize -
+ sizeof(struct ext4_orphan_block_tail));
+}
+
#undef _INLINE_
#endif

diff --git a/lib/ext2fs/orphan.c b/lib/ext2fs/orphan.c
new file mode 100644
index 000000000000..1fd5c0688218
--- /dev/null
+++ b/lib/ext2fs/orphan.c
@@ -0,0 +1,217 @@
+/*
+ * orphan.c --- utility function to handle orphan file
+ *
+ * Copyright (C) 2015 Jan Kara.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <string.h>
+
+#include "ext2_fs.h"
+#include "ext2fsP.h"
+
+errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs)
+{
+ struct ext2_inode inode;
+ errcode_t err;
+
+ err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
+ if (err)
+ return err;
+
+ err = ext2fs_punch(fs, EXT4_ORPHAN_INO, &inode, NULL, 0, ~0ULL);
+ if (err)
+ return err;
+
+ fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
+ memset(&inode, 0, sizeof(struct ext2_inode));
+ err = ext2fs_write_inode(fs, EXT4_ORPHAN_INO, &inode);
+
+ fs->super->s_feature_compat &= ~EXT4_FEATURE_COMPAT_ORPHAN_FILE;
+ ext2fs_mark_super_dirty(fs);
+
+ return err;
+}
+
+struct mkorphan_info {
+ char *buf;
+ char *zerobuf;
+ blk_t num_blocks;
+ blk_t alloc_blocks;
+ errcode_t err;
+};
+
+static int mkorphan_proc(ext2_filsys fs,
+ blk64_t *blocknr,
+ e2_blkcnt_t blockcnt,
+ blk64_t ref_block EXT2FS_ATTR((unused)),
+ int ref_offset EXT2FS_ATTR((unused)),
+ void *priv_data)
+{
+ struct mkorphan_info *oi = (struct mkorphan_info *)priv_data;
+ blk64_t new_blk;
+ errcode_t err;
+
+ err = ext2fs_new_block2(fs, 0, 0, &new_blk);
+ if (err) {
+ oi->err = err;
+ return BLOCK_ABORT;
+ }
+ ext2fs_block_alloc_stats2(fs, new_blk, +1);
+ if (blockcnt >= 0)
+ err = io_channel_write_blk64(fs->io, new_blk, 1, oi->buf);
+ else
+ err = io_channel_write_blk64(fs->io, new_blk, 1, oi->zerobuf);
+ if (err) {
+ oi->err = err;
+ return BLOCK_ABORT;
+ }
+ oi->alloc_blocks++;
+ *blocknr = new_blk;
+ if (blockcnt >= 0 && --oi->num_blocks == 0)
+ return BLOCK_CHANGED | BLOCK_ABORT;
+ return BLOCK_CHANGED;
+}
+
+errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks)
+{
+ struct ext2_inode inode;
+ errcode_t err;
+ char *buf = NULL, *zerobuf = NULL;
+ struct mkorphan_info oi;
+ struct ext4_orphan_block_tail *ob_tail;
+
+ err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
+ if (err)
+ return err;
+ if (EXT2_I_SIZE(&inode)) {
+ err = ext2fs_truncate_orphan_file(fs);
+ if (err)
+ return err;
+ }
+
+ memset(&inode, 0, sizeof(struct ext2_inode));
+ if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS) {
+ inode.i_flags |= EXT4_EXTENTS_FL;
+ err = ext2fs_write_inode(fs, EXT4_ORPHAN_INO, &inode);
+ if (err)
+ return err;
+ }
+
+ err = ext2fs_get_mem(fs->blocksize, &buf);
+ if (err)
+ return err;
+ err = ext2fs_get_mem(fs->blocksize, &zerobuf);
+ if (err)
+ goto out;
+ memset(buf, 0, fs->blocksize);
+ memset(zerobuf, 0, fs->blocksize);
+ ob_tail = ext2fs_orphan_block_tail(fs, buf);
+ ob_tail->ob_magic = ext2fs_cpu_to_le32(EXT4_ORPHAN_BLOCK_MAGIC);
+ ext2fs_orphan_file_block_csum_set(fs, buf);
+ oi.num_blocks = num_blocks;
+ oi.alloc_blocks = 0;
+ oi.buf = buf;
+ oi.zerobuf = zerobuf;
+ oi.err = 0;
+ err = ext2fs_block_iterate3(fs, EXT4_ORPHAN_INO, BLOCK_FLAG_APPEND,
+ 0, mkorphan_proc, &oi);
+ if (err)
+ goto out;
+
+ /* Reread inode after blocks were allocated */
+ err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
+ if (err)
+ goto out;
+ ext2fs_iblk_set(fs, &inode, 0);
+ inode.i_atime = inode.i_mtime =
+ inode.i_ctime = fs->now ? fs->now : time(0);
+ inode.i_links_count = 1;
+ inode.i_mode = LINUX_S_IFREG | 0600;
+ ext2fs_iblk_add_blocks(fs, &inode, oi.alloc_blocks);
+ err = ext2fs_inode_size_set(fs, &inode,
+ (unsigned long long)fs->blocksize * num_blocks);
+ if (err)
+ goto out;
+ err = ext2fs_write_new_inode(fs, EXT4_ORPHAN_INO, &inode);
+ if (err)
+ goto out;
+
+ fs->super->s_feature_compat |= EXT4_FEATURE_COMPAT_ORPHAN_FILE;
+ ext2fs_mark_super_dirty(fs);
+out:
+ if (buf)
+ ext2fs_free_mem(&buf);
+ if (zerobuf)
+ ext2fs_free_mem(&zerobuf);
+ return err;
+}
+
+/*
+ * Find reasonable size for orphan file. We choose orphan file size to be
+ * between 32 and 512 filesystem blocks and not more than 1/4096 of the
+ * filesystem unless it is really small.
+ */
+e2_blkcnt_t ext2fs_default_orphan_file_blocks(__u64 num_blocks)
+{
+ if (num_blocks < 128 * 1024)
+ return 32;
+ if (num_blocks < 2 * 1024 * 1024)
+ return num_blocks / 4096;
+ return 512;
+}
+
+static errcode_t ext2fs_orphan_file_block_csum(ext2_filsys fs, char *buf,
+ __u32 *crc)
+{
+ int inodes_per_ob = ext2fs_inodes_per_orphan_block(fs);
+ __u32 gen;
+ ext2_ino_t inum;
+ struct ext2_inode inode;
+ errcode_t retval;
+
+ retval = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
+ if (retval)
+ return retval;
+ inum = ext2fs_cpu_to_le32(EXT4_ORPHAN_INO);
+ gen = ext2fs_cpu_to_le32(inode.i_generation);
+ *crc = ext2fs_crc32c_le(fs->csum_seed, (unsigned char *)&inum,
+ sizeof(inum));
+ *crc = ext2fs_crc32c_le(*crc, (unsigned char *)&gen, sizeof(gen));
+ *crc = ext2fs_crc32c_le(*crc, buf, inodes_per_ob * sizeof(__u32));
+
+ return 0;
+}
+
+errcode_t ext2fs_orphan_file_block_csum_set(ext2_filsys fs, char *buf)
+{
+ struct ext4_orphan_block_tail *tail;
+
+ if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ return 0;
+
+ tail = ext2fs_orphan_block_tail(fs, buf);
+ return ext2fs_orphan_file_block_csum(fs, buf, &tail->ob_checksum);
+}
+
+int ext2fs_orphan_file_block_csum_verify(ext2_filsys fs, char *buf)
+{
+ struct ext4_orphan_block_tail *tail;
+ __u32 crc;
+ errcode_t retval;
+
+ if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+ EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+ return 1;
+ retval = ext2fs_orphan_file_block_csum(fs, buf, &crc);
+ if (retval)
+ return 0;
+ tail = ext2fs_orphan_block_tail(fs, buf);
+ return ext2fs_le32_to_cpu(tail->ob_checksum) == crc;
+}
--
2.1.4
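
Editorial note on the block layout and default sizing above: each orphan block
is an array of 32-bit inode numbers followed by an 8-byte tail (magic plus
checksum), and the default file size is clamped between 32 and 512 blocks. The
standalone sketch below reproduces that arithmetic outside of libext2fs. The
struct and function names are hypothetical stand-ins for
struct ext4_orphan_block_tail, ext2fs_inodes_per_orphan_block(),
ext2fs_orphan_block_tail() and ext2fs_default_orphan_file_blocks().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as struct ext4_orphan_block_tail in the patch: 8 bytes. */
struct orphan_block_tail {
	uint32_t ob_magic;
	uint32_t ob_checksum;
};

/* Number of 32-bit inode slots that fit in front of the tail. */
static unsigned int orphan_slots_per_block(unsigned int blocksize)
{
	assert(blocksize > sizeof(struct orphan_block_tail));
	return (blocksize - sizeof(struct orphan_block_tail)) /
		sizeof(uint32_t);
}

/* Byte offset of the tail inside a block-sized buffer. */
static size_t orphan_tail_offset(unsigned int blocksize)
{
	assert(blocksize > sizeof(struct orphan_block_tail));
	return blocksize - sizeof(struct orphan_block_tail);
}

/*
 * Default orphan file size in blocks, using the same thresholds as the
 * patch: 32 blocks for small filesystems, 1/4096 of the filesystem in
 * the middle range, and 512 blocks at most.
 */
static uint64_t default_orphan_blocks(uint64_t fs_blocks)
{
	if (fs_blocks < 128 * 1024)
		return 32;
	if (fs_blocks < 2 * 1024 * 1024)
		return fs_blocks / 4096;
	return 512;
}
```

For a 4096-byte block this gives 1022 inode slots per block with the tail at
byte offset 4088, so the 512-block maximum covers 523264 slots.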


2015-05-22 11:29:02

by Jan Kara

Subject: [PATCH 3/4] e2fsck: Add support for handling orphan file

Signed-off-by: Jan Kara <[email protected]>
---
e2fsck/pass1.c | 27 ++++++
e2fsck/problem.c | 55 +++++++++++
e2fsck/problem.h | 36 +++++++
e2fsck/super.c | 288 ++++++++++++++++++++++++++++++++++++++++++++++++-------
e2fsck/unix.c | 60 ++++++++++++
5 files changed, 429 insertions(+), 37 deletions(-)

diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index b007f6522ee3..207778113846 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1466,6 +1466,33 @@ void e2fsck_pass1(e2fsck_t ctx)
inode_size, "pass1");
failed_csum = 0;
}
+ } else if (ino == EXT4_ORPHAN_INO) {
+ ext2fs_mark_inode_bitmap2(ctx->inode_used_map, ino);
+ if (fs->super->s_feature_compat &
+ EXT4_FEATURE_COMPAT_ORPHAN_FILE) {
+ if (!LINUX_S_ISREG(inode->i_mode) &&
+ fix_problem(ctx, PR_1_ORPHAN_FILE_BAD_MODE,
+ &pctx)) {
+ inode->i_mode = LINUX_S_IFREG;
+ e2fsck_write_inode(ctx, ino, inode,
+ "pass1");
+ failed_csum = 0;
+ }
+ check_blocks(ctx, &pctx, block_buf);
+ FINISH_INODE_LOOP(ctx, ino, &pctx, failed_csum);
+ continue;
+ }
+ if ((inode->i_links_count ||
+ inode->i_blocks || inode->i_block[0]) &&
+ fix_problem(ctx, PR_1_ORPHAN_FILE_NOT_CLEAR,
+ &pctx)) {
+ memset(inode, 0, inode_size);
+ ext2fs_icount_store(ctx->inode_link_info, ino,
+ 0);
+ e2fsck_write_inode_full(ctx, ino, inode,
+ inode_size, "pass1");
+ failed_csum = 0;
+ }
} else if (ino < EXT2_FIRST_INODE(fs->super)) {
problem_t problem = 0;

diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 62fce25d97f9..15813deeeafa 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -464,6 +464,26 @@ static struct e2fsck_problem problem_table[] = {
N_("External @j @S checksum does not match @S. "),
PROMPT_FIX, PR_PREEN_OK },

+ /* Orphan file contains holes */
+ { PR_0_ORPHAN_FILE_HOLE,
+ N_("Orphan file (@i %i) contains a hole.\n"),
+ PROMPT_NONE, 0 },
+
+ /* Orphan file block has wrong magic */
+ { PR_0_ORPHAN_FILE_BAD_MAGIC,
+ N_("Orphan file (@i %i) block %B contains wrong magic.\n"),
+ PROMPT_NONE, 0 },
+
+ /* Orphan file block has wrong checksum */
+ { PR_0_ORPHAN_FILE_BAD_CHECKSUM,
+ N_("Orphan file (@i %i) block %B contains wrong checksum.\n"),
+ PROMPT_NONE, 0 },
+
+ /* orphan_present set without orphan_file set */
+ { PR_0_ORPHAN_PRESENT_WITHOUT_FILE,
+ N_("@S orphan_present flag set without orphan_file flag.\n"),
+ PROMPT_NONE, 0 },
+
/* Pass 1 errors */

/* Pass 1: Checking inodes, blocks, and sizes */
@@ -1101,6 +1121,16 @@ static struct e2fsck_problem problem_table[] = {
N_("@A memory for encrypted @d list\n"),
PROMPT_NONE, PR_FATAL },

+ /* Orphan file has bad mode */
+ { PR_1_ORPHAN_FILE_BAD_MODE,
+ N_("Orphan file @i %i is not a regular file. "),
+ PROMPT_CLEAR, PR_PREEN_OK },
+
+ /* Orphan file inode is not in use, but contains data */
+ { PR_1_ORPHAN_FILE_NOT_CLEAR,
+ N_("Orphan file @i %i is not in use, but contains data. "),
+ PROMPT_CLEAR, PR_PREEN_OK },
+
/* Pass 1b errors */

/* Pass 1B: Rescan for duplicate/bad blocks */
@@ -1926,6 +1956,31 @@ static struct e2fsck_problem problem_table[] = {
N_("Error flushing writes to storage device: %m\n"),
PROMPT_NULL, PR_FATAL },

+ /* Orphan file without a journal */
+ { PR_6_ORPHAN_FILE_WITHOUT_JOURNAL,
+ N_("@S has orphan file without a journal.\n"),
+ PROMPT_CLEAR, PR_PREEN_OK },
+
+ /* Orphan file truncation failed */
+ { PR_6_ORPHAN_FILE_TRUNC_FAILED,
+ N_("Failed to truncate orphan file.\n"),
+ PROMPT_NONE, 0 },
+
+ /* Failed to initialize orphan file */
+ { PR_6_ORPHAN_FILE_CORRUPTED,
+ N_("Failed to initialize orphan file.\n"),
+ PROMPT_RECREATE, PR_PREEN_OK },
+
+ /* Cannot fix corrupted orphan file with invalid bitmaps */
+ { PR_6_ORPHAN_FILE_BITMAP_INVALID,
+ N_("Cannot fix corrupted orphan file with invalid bitmaps.\n"),
+ PROMPT_NONE, 0 },
+
+ /* Orphan file creation failed */
+ { PR_6_ORPHAN_FILE_CREATE_FAILED,
+ N_("Failed to create orphan file.\n"),
+ PROMPT_NONE, 0 },
+
{ 0 }
};

diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index bc959c483199..3f9cae9cc261 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -249,6 +249,18 @@ struct problem_context {
/* Checking group descriptor failed */
#define PR_0_CHECK_DESC_FAILED 0x000045

+/* Orphan file contains holes */
+#define PR_0_ORPHAN_FILE_HOLE 0x000046
+
+/* Orphan file block has wrong magic */
+#define PR_0_ORPHAN_FILE_BAD_MAGIC 0x000047
+
+/* Orphan file block has wrong checksum */
+#define PR_0_ORPHAN_FILE_BAD_CHECKSUM 0x000048
+
+/* orphan_present set without orphan_file set */
+#define PR_0_ORPHAN_PRESENT_WITHOUT_FILE 0x000049
+
/*
* metadata_csum supersedes uninit_bg; both feature bits cannot be set
* simultaneously.
@@ -644,6 +656,15 @@ struct problem_context {
/* Error allocating memory for encrypted directory list */
#define PR_1_ALLOCATE_ENCRYPTED_DIRLIST 0x01007E

+/* Orphan file inode is not a regular file */
+#define PR_1_ORPHAN_FILE_BAD_MODE 0x01007F
+
+/* Orphan file inode is not in use, but contains data */
+#define PR_1_ORPHAN_FILE_NOT_CLEAR 0x010080
+
+/* Orphan file inode is not clear */
+#define PR_1_ORPHAN_INODE_NOT_CLEAR 0x01007F
+
/*
* Pass 1b errors
*/
@@ -1161,6 +1182,21 @@ struct problem_context {
/* Error flushing writes to storage device */
#define PR_6_IO_FLUSH 0x060005

+/* Orphan file without a journal */
+#define PR_6_ORPHAN_FILE_WITHOUT_JOURNAL 0x060006
+
+/* Orphan file truncation failed */
+#define PR_6_ORPHAN_FILE_TRUNC_FAILED 0x060007
+
+/* Failed to initialize orphan file */
+#define PR_6_ORPHAN_FILE_CORRUPTED 0x060008
+
+/* Cannot fix corrupted orphan file with invalid bitmaps */
+#define PR_6_ORPHAN_FILE_BITMAP_INVALID 0x060009
+
+/* Orphan file creation failed */
+#define PR_6_ORPHAN_FILE_CREATE_FAILED 0x06000A
+
/*
* Function declarations
*/
diff --git a/e2fsck/super.c b/e2fsck/super.c
index 9eebd4da4aa6..95841329b5bf 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -224,6 +224,153 @@ static int release_inode_blocks(e2fsck_t ctx, ext2_ino_t ino,
return 0;
}

+static int release_orphan_inode(e2fsck_t ctx, ext2_ino_t *ino, char *block_buf)
+{
+ ext2_filsys fs = ctx->fs;
+ struct problem_context pctx;
+ struct ext2_inode inode;
+ ext2_ino_t next_ino;
+
+ e2fsck_read_inode(ctx, *ino, &inode, "release_orphan_inode");
+ clear_problem_context(&pctx);
+ pctx.ino = *ino;
+ pctx.inode = &inode;
+ pctx.str = inode.i_links_count ? _("Truncating") : _("Clearing");
+
+ fix_problem(ctx, PR_0_ORPHAN_CLEAR_INODE, &pctx);
+
+ next_ino = inode.i_dtime;
+ if (next_ino &&
+ ((next_ino < EXT2_FIRST_INODE(fs->super)) ||
+ (next_ino > fs->super->s_inodes_count))) {
+ pctx.ino = next_ino;
+ fix_problem(ctx, PR_0_ORPHAN_ILLEGAL_INODE, &pctx);
+ return 1;
+ }
+ if (release_inode_blocks(ctx, *ino, &inode, block_buf, &pctx))
+ return 1;
+
+ if (!inode.i_links_count) {
+ ext2fs_inode_alloc_stats2(fs, *ino, -1,
+ LINUX_S_ISDIR(inode.i_mode));
+ ctx->free_inodes++;
+ inode.i_dtime = ctx->now;
+ } else {
+ inode.i_dtime = 0;
+ }
+ e2fsck_write_inode(ctx, *ino, &inode, "delete_file");
+ *ino = next_ino;
+
+ return 0;
+}
+
+struct process_orphan_block_data {
+ e2fsck_t ctx;
+ char *buf;
+ char *block_buf;
+ int abort;
+ errcode_t errcode;
+};
+
+static int process_orphan_block(ext2_filsys fs,
+ blk64_t *block_nr,
+ e2_blkcnt_t blockcnt,
+ blk64_t ref_blk EXT2FS_ATTR((unused)),
+ int ref_offset EXT2FS_ATTR((unused)),
+ void *priv_data)
+{
+ struct process_orphan_block_data *pd;
+ e2fsck_t ctx;
+ struct problem_context pctx;
+ blk64_t blk = *block_nr;
+ e2_blkcnt_t i;
+ struct ext4_orphan_block_tail *tail;
+ int j;
+ int inodes_per_ob;
+ __u32 *bdata;
+ ext2_ino_t ino;
+
+ pd = priv_data;
+ ctx = pd->ctx;
+ clear_problem_context(&pctx);
+ pctx.ino = EXT4_ORPHAN_INO;
+
+ /* Orphan file must not have holes */
+ if (!blk) {
+ fix_problem(ctx, PR_0_ORPHAN_FILE_HOLE, &pctx);
+return_abort:
+ pd->abort = 1;
+ return BLOCK_ABORT;
+ }
+ inodes_per_ob = ext2fs_inodes_per_orphan_block(fs);
+ for (i = 0; i < blockcnt; i++) {
+ pctx.blk = blk + i;
+ pd->errcode = io_channel_read_blk64(fs->io, blk + i, 1, pd->buf);
+ if (pd->errcode)
+ goto return_abort;
+ tail = ext2fs_orphan_block_tail(fs, pd->buf);
+ if (ext2fs_le32_to_cpu(tail->ob_magic) !=
+ EXT4_ORPHAN_BLOCK_MAGIC) {
+ fix_problem(ctx, PR_0_ORPHAN_FILE_BAD_MAGIC, &pctx);
+ goto return_abort;
+ }
+ if (!ext2fs_orphan_file_block_csum_verify(fs, pd->buf)) {
+ fix_problem(ctx, PR_0_ORPHAN_FILE_BAD_CHECKSUM, &pctx);
+ goto return_abort;
+ }
+ bdata = (__u32 *)pd->buf;
+ for (j = 0; j < inodes_per_ob; j++) {
+ if (!bdata[j])
+ continue;
+ ino = ext2fs_le32_to_cpu(bdata[j]);
+ if (release_orphan_inode(ctx, &ino, pd->block_buf))
+ goto return_abort;
+ }
+ }
+ return 0;
+}
+
+static int process_orphan_file(e2fsck_t ctx, char *block_buf)
+{
+ ext2_filsys fs = ctx->fs;
+ char *orphan_buf;
+ struct process_orphan_block_data pd;
+ int ret = 0;
+ errcode_t retval;
+
+ if (!(fs->super->s_feature_compat & EXT4_FEATURE_COMPAT_ORPHAN_FILE))
+ return 0;
+
+ orphan_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize * 4,
+ "orphan block buffer");
+ pd.buf = orphan_buf + 3 * fs->blocksize;
+ pd.block_buf = block_buf;
+ pd.ctx = ctx;
+ pd.abort = 0;
+ pd.errcode = 0;
+ retval = ext2fs_block_iterate3(fs, EXT4_ORPHAN_INO,
+ BLOCK_FLAG_DATA_ONLY | BLOCK_FLAG_HOLE,
+ orphan_buf, process_orphan_block, &pd);
+ if (retval) {
+ com_err("process_orphan_block", retval,
+ _("while calling ext2fs_block_iterate for inode %d"),
+ EXT4_ORPHAN_INO);
+ ret = 1;
+ goto out;
+ }
+ if (pd.abort) {
+ if (pd.errcode) {
+ com_err("process_orphan_block", pd.errcode,
+ _("while reading blocks of inode %d"),
+ EXT4_ORPHAN_INO);
+ }
+ ret = 1;
+ }
+out:
+ ext2fs_free_mem(&orphan_buf);
+ return ret;
+}
+
/*
* This function releases all of the orphan inodes. It returns 1 if
* it hit some error, and 0 on success.
@@ -231,14 +378,16 @@ static int release_inode_blocks(e2fsck_t ctx, ext2_ino_t ino,
static int release_orphan_inodes(e2fsck_t ctx)
{
ext2_filsys fs = ctx->fs;
- ext2_ino_t ino, next_ino;
- struct ext2_inode inode;
+ ext2_ino_t ino;
struct problem_context pctx;
char *block_buf;

- if ((ino = fs->super->s_last_orphan) == 0)
+ if (fs->super->s_last_orphan == 0 &&
+ !(fs->super->s_feature_ro_compat &
+ EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT))
return 0;

+ ino = fs->super->s_last_orphan;
/*
* Win or lose, we won't be using the head of the orphan inode
* list again.
@@ -247,15 +396,16 @@ static int release_orphan_inodes(e2fsck_t ctx)
ext2fs_mark_super_dirty(fs);

/*
- * If the filesystem contains errors, don't run the orphan
- * list, since the orphan list can't be trusted; and we're
- * going to be running a full e2fsck run anyway...
+ * If the filesystem contains errors, don't process the orphan list
+ * or orphan file, since neither can be trusted; and we're going to
+ * be running a full e2fsck run anyway... We clear orphan file contents
+ * after filesystem is checked to avoid clearing someone else's data.
*/
if (fs->super->s_state & EXT2_ERROR_FS)
return 0;

- if ((ino < EXT2_FIRST_INODE(fs->super)) ||
- (ino > fs->super->s_inodes_count)) {
+ if (ino && ((ino < EXT2_FIRST_INODE(fs->super)) ||
+ (ino > fs->super->s_inodes_count))) {
clear_problem_context(&pctx);
pctx.ino = ino;
fix_problem(ctx, PR_0_ORPHAN_ILLEGAL_HEAD_INODE, &pctx);
@@ -266,39 +416,23 @@ static int release_orphan_inodes(e2fsck_t ctx)
"block iterate buffer");
e2fsck_read_bitmaps(ctx);

+ /* First process orphan list */
while (ino) {
- e2fsck_read_inode(ctx, ino, &inode, "release_orphan_inodes");
- clear_problem_context(&pctx);
- pctx.ino = ino;
- pctx.inode = &inode;
- pctx.str = inode.i_links_count ? _("Truncating") :
- _("Clearing");
-
- fix_problem(ctx, PR_0_ORPHAN_CLEAR_INODE, &pctx);
-
- next_ino = inode.i_dtime;
- if (next_ino &&
- ((next_ino < EXT2_FIRST_INODE(fs->super)) ||
- (next_ino > fs->super->s_inodes_count))) {
- pctx.ino = next_ino;
- fix_problem(ctx, PR_0_ORPHAN_ILLEGAL_INODE, &pctx);
- goto return_abort;
- }
-
- if (release_inode_blocks(ctx, ino, &inode, block_buf, &pctx))
+ if (release_orphan_inode(ctx, &ino, block_buf))
goto return_abort;
+ }

- if (!inode.i_links_count) {
- ext2fs_inode_alloc_stats2(fs, ino, -1,
- LINUX_S_ISDIR(inode.i_mode));
- ctx->free_inodes++;
- inode.i_dtime = ctx->now;
- } else {
- inode.i_dtime = 0;
- }
- e2fsck_write_inode(ctx, ino, &inode, "delete_file");
- ino = next_ino;
+ /* Next process orphan file */
+ if (fs->super->s_feature_ro_compat &
+ EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT &&
+ !(fs->super->s_feature_compat & EXT4_FEATURE_COMPAT_ORPHAN_FILE)) {
+ fix_problem(ctx, PR_0_ORPHAN_PRESENT_WITHOUT_FILE, &pctx);
+ goto return_abort;
}
+ if (process_orphan_file(ctx, block_buf))
+ goto return_abort;
+
+out:
ext2fs_free_mem(&block_buf);
return 0;
return_abort:
@@ -306,6 +440,86 @@ return_abort:
return 1;
}

+static int reinit_orphan_block(ext2_filsys fs,
+ blk64_t *block_nr,
+ e2_blkcnt_t blockcnt,
+ blk64_t ref_blk EXT2FS_ATTR((unused)),
+ int ref_offset EXT2FS_ATTR((unused)),
+ void *priv_data)
+{
+ struct process_orphan_block_data *pd;
+ e2fsck_t ctx;
+ blk64_t blk = *block_nr;
+ struct ext4_orphan_block_tail *tail;
+ int inodes_per_ob;
+ __u32 *bdata;
+ e2_blkcnt_t i;
+
+ pd = priv_data;
+ ctx = pd->ctx;
+
+ /* Orphan file must not have holes */
+ if (!blk) {
+return_abort:
+ pd->abort = 1;
+ return BLOCK_ABORT;
+ }
+ memset(pd->buf, 0, fs->blocksize);
+ tail = ext2fs_orphan_block_tail(fs, pd->buf);
+ tail->ob_magic = ext2fs_cpu_to_le32(EXT4_ORPHAN_BLOCK_MAGIC);
+ ext2fs_orphan_file_block_csum_set(fs, pd->buf);
+ inodes_per_ob = ext2fs_inodes_per_orphan_block(fs);
+ for (i = 0; i < blockcnt; i++) {
+ pd->errcode = io_channel_write_blk64(fs->io, blk + i, 1,
+ pd->buf);
+ if (pd->errcode)
+ goto return_abort;
+ }
+ return 0;
+}
+
+/*
+ * Check and clear orphan file. We just return non-zero if we hit some
+ * inconsistency. Caller will truncate & recreate new orphan file.
+ */
+int check_init_orphan_file(e2fsck_t ctx)
+{
+ ext2_filsys fs = ctx->fs;
+ char *orphan_buf;
+ struct process_orphan_block_data pd;
+ int ret = 0;
+ errcode_t retval;
+
+ orphan_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize * 4,
+ "orphan block buffer");
+ pd.buf = orphan_buf + 3 * fs->blocksize;
+ pd.block_buf = NULL;
+ pd.ctx = ctx;
+ pd.abort = 0;
+ pd.errcode = 0;
+ retval = ext2fs_block_iterate3(fs, EXT4_ORPHAN_INO,
+ BLOCK_FLAG_DATA_ONLY | BLOCK_FLAG_HOLE,
+ orphan_buf, reinit_orphan_block, &pd);
+ if (retval) {
+ com_err("reinit_orphan_block", retval,
+ _("while calling ext2fs_block_iterate for inode %d"),
+ EXT4_ORPHAN_INO);
+ ret = 1;
+ goto out;
+ }
+ if (pd.abort) {
+ if (pd.errcode) {
+ com_err("process_orphan_block", pd.errcode,
+ _("while reading blocks of inode %d"),
+ EXT4_ORPHAN_INO);
+ }
+ ret = 1;
+ }
+out:
+ ext2fs_free_mem(&orphan_buf);
+ return ret;
+}
+
/*
* Check the resize inode to make sure it is sane. We check both for
* the case where on-line resizing is not enabled (in which case the
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 3db34802c831..7e312d1c6369 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1671,8 +1671,68 @@ print_unsupp_features:
_("\n*** journal has been regenerated ***\n"));
}
}
+
no_journal:
+ /* Orphan file will be cleared, so clear the feature unconditionally */
+ if (!(ctx->options & E2F_OPT_READONLY)) {
+ fs->super->s_feature_ro_compat &=
+ ~EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT;
+ ext2fs_mark_super_dirty(fs);
+ }
+
+ /* If there isn't journal, no point in creating orphan file... */
+ if (fs->super->s_feature_compat & EXT4_FEATURE_COMPAT_ORPHAN_FILE) {
+ /* No point in orphan file without a journal... */
+ if (!(fs->super->s_feature_compat &
+ EXT3_FEATURE_COMPAT_HAS_JOURNAL) &&
+ fix_problem(ctx, PR_6_ORPHAN_FILE_WITHOUT_JOURNAL, &pctx)) {
+ retval = ext2fs_truncate_orphan_file(fs);
+ if (retval) {
+ /* Huh, failed to truncate file */
+ fix_problem(ctx, PR_6_ORPHAN_FILE_TRUNC_FAILED,
+ &pctx);
+ goto check_quotas;
+ }
+ goto check_quotas;
+ }
+ if (check_init_orphan_file(ctx) &&
+ fix_problem(ctx, PR_6_ORPHAN_FILE_CORRUPTED, &pctx)) {
+ int orphan_file_blocks;
+
+ if (ctx->invalid_bitmaps) {
+ fix_problem(ctx,
+ PR_6_ORPHAN_FILE_BITMAP_INVALID,
+ &pctx);
+ goto check_quotas;
+ }
+
+ retval = ext2fs_truncate_orphan_file(fs);
+ if (retval) {
+ /* Huh, failed to truncate file */
+ fix_problem(ctx, PR_6_ORPHAN_FILE_TRUNC_FAILED,
+ &pctx);
+ goto check_quotas;
+ }
+
+ orphan_file_blocks = ext2fs_default_orphan_file_blocks(
+ ext2fs_blocks_count(fs->super));
+ log_out(ctx, _("Creating orphan file (%d blocks): "),
+ orphan_file_blocks);
+ fflush(stdout);
+ retval = ext2fs_create_orphan_file(fs,
+ orphan_file_blocks);
+ if (retval) {
+ log_out(ctx, "%s: while trying to create "
+ "orphan file\n", error_message(retval));
+ fix_problem(ctx, PR_6_ORPHAN_FILE_CREATE_FAILED,
+ &pctx);
+ goto check_quotas;
+ }
+ log_out(ctx, "%s", _(" Done.\n"));
+ }
+ }

+check_quotas:
if (ctx->qctx) {
int i, needs_writeout;
for (i = 0; i < MAXQUOTAS; i++) {
--
2.1.4


2015-05-22 15:52:14

by Albino B Neto

Subject: Re: [PATCH 0/4] e2fsprogs: Support for orphan file feature

2015-05-22 8:28 GMT-03:00 Jan Kara <[email protected]>:
> this is support orphan file feature in e2fsprogs. mke2fs and tune2fs support
> should be fine, e2fsck support still has bugs so use with care. I'm posting
> this mainly so that people can easily create filesystem with orphan file
> feature for testing the kernel patches.

How?

Albino

2015-05-22 17:18:48

by Darrick J. Wong

Subject: Re: [PATCH 4/4] tune2fs: Add support for orphan_file feature

On Fri, May 22, 2015 at 01:28:57PM +0200, Jan Kara wrote:
> Signed-off-by: Jan Kara <[email protected]>
> ---
> misc/tune2fs.8.in | 5 ++++
> misc/tune2fs.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 84 insertions(+), 2 deletions(-)
>
> diff --git a/misc/tune2fs.8.in b/misc/tune2fs.8.in
> index 9d1df8242baa..c2355da5a7aa 100644
> --- a/misc/tune2fs.8.in
> +++ b/misc/tune2fs.8.in
> @@ -232,6 +232,11 @@ program.
> This superblock setting is only honored in 2.6.35+ kernels;
> and not at all by the ext2 and ext3 file system drivers.
> .TP
> +.BI orphan_file_size= size
> +Set size of the file for tracking unlinked but still open inodes and inodes
> +with truncate in progress. Larger file allows for better scalability, reserving
> +a few blocks per cpu is ideal.
> +.TP
> .B test_fs
> Set a flag in the filesystem superblock indicating that it may be
> mounted using experimental kernel code, such as the ext4dev filesystem.
> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> index f930df2f6683..672efd5872f8 100644
> --- a/misc/tune2fs.c
> +++ b/misc/tune2fs.c
> @@ -103,6 +103,7 @@ static int fsck_requested;
> int journal_size, journal_flags;
> char *journal_device;
> static blk64_t journal_location = ~0LL;
> +static e2_blkcnt_t orphan_file_blocks;
>
> static struct list_head blk_move_list;
>
> @@ -143,7 +144,8 @@ static void usage(void)
> static __u32 ok_features[3] = {
> /* Compat */
> EXT3_FEATURE_COMPAT_HAS_JOURNAL |
> - EXT2_FEATURE_COMPAT_DIR_INDEX,
> + EXT2_FEATURE_COMPAT_DIR_INDEX |
> + EXT4_FEATURE_COMPAT_ORPHAN_FILE,
> /* Incompat */
> EXT2_FEATURE_INCOMPAT_FILETYPE |
> EXT3_FEATURE_INCOMPAT_EXTENTS |
> @@ -169,7 +171,8 @@ static __u32 clear_ok_features[3] = {
> /* Compat */
> EXT3_FEATURE_COMPAT_HAS_JOURNAL |
> EXT2_FEATURE_COMPAT_RESIZE_INODE |
> - EXT2_FEATURE_COMPAT_DIR_INDEX,
> + EXT2_FEATURE_COMPAT_DIR_INDEX |
> + EXT4_FEATURE_COMPAT_ORPHAN_FILE,
> /* Incompat */
> EXT2_FEATURE_INCOMPAT_FILETYPE |
> EXT4_FEATURE_INCOMPAT_FLEX_BG |
> @@ -1025,6 +1028,44 @@ static int update_feature_set(ext2_filsys fs, char *features)
> }
> }
>
> + if (FEATURE_OFF(E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_ORPHAN_FILE)) {
> + if (mount_flags & EXT2_MF_MOUNTED) {
> + fputs(_("The orphan_file feature may only be cleared "
> + "when the filesystem is unmounted.\n"), stderr);
> + return 1;
> + }
> + if ((sb->s_feature_ro_compat &
> + EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT) &&
> + f_flag < 2) {
> + fputs(_("The orphan_present flag is set. Please run "
> + "e2fsck before clearing orphan_file flag.\n"),
> + stderr);
> + return 1;
> + }
> + err = ext2fs_truncate_orphan_file(fs);
> + if (err) {
> + com_err(program_name, err,
> + _("\n\twhile trying to truncate orhan file\n"));

"orphan"

> + return 1;
> + }
> + }
> +
> + if (FEATURE_ON(E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_ORPHAN_FILE)) {
> + if (!(sb->s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL)) {
> + fputs(_("orphan_file flag can be set only for "
> + "filesystems with journal.\n"), stderr);
> + return 1;
> + }
> + /*
> + * If adding an orphan file, let the create orphan file
> + * code below handle setting the flag and creating it.
> + * We supply a default size if necessary.
> + */
> + orphan_file_blocks = ext2fs_default_orphan_file_blocks(
> + ext2fs_blocks_count(fs->super));
> + sb->s_feature_compat &= ~EXT4_FEATURE_COMPAT_ORPHAN_FILE;
> + }
> +
> if (FEATURE_ON(E2P_FEATURE_RO_INCOMPAT,
> EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER)) {
> if (sb->s_feature_incompat &
> @@ -1970,6 +2011,31 @@ static int parse_extended_opts(ext2_filsys fs, const char *opts)
> continue;
> }
> ext_mount_opts = strdup(arg);
> + } else if (!strcmp(token, "orphan_file_size")) {
> + __u64 size;
> +
> + if (!arg) {
> + r_usage++;
> + continue;
> + }
> + size = strtoul(arg, &p, 0);

Would be nice if I could supply units, e.g. orphan_file_size = 128K here.
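
A rough sketch of what that could look like. This is only an illustration: parse_size_with_unit() is a made-up name, not an existing e2fsprogs helper, and the set of accepted suffixes (K, M, G as powers of 1024) is an assumption. It accepts a plain byte count or a suffixed value and rejects anything else:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a size such as
 * "16384", "128K" or "1M" into bytes. Returns 0 on success and -1 on
 * malformed input or overflow.
 */
int parse_size_with_unit(const char *arg, uint64_t *bytes)
{
	char *end;
	unsigned long long val;
	unsigned int shift = 0;

	assert(bytes != NULL);

	/* Reject NULL, empty strings, signs and leading whitespace. */
	if (!arg || *arg < '0' || *arg > '9')
		return -1;

	errno = 0;
	val = strtoull(arg, &end, 0);
	if (errno || end == arg)
		return -1;

	/* Optional single-letter suffix, powers of 1024 (assumed). */
	switch (*end) {
	case 'k': case 'K': shift = 10; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	default: break;
	}
	if (*end)
		return -1;	/* trailing garbage */
	if (shift && val > (UINT64_MAX >> shift))
		return -1;	/* would not fit in 64 bits */

	*bytes = (uint64_t)val << shift;
	return 0;
}
```

The result would then go through the same minimum-size check and round-up to blocks as the plain number does in the hunk above.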


> + if (*p) {
> + fprintf(stderr,
> + _("Invalid size of orphan file %s\n"),
> + arg);
> + r_usage++;
> + continue;
> + }
> + if (size < EXT4_MIN_ORPHAN_FILE_SIZE) {
> + fprintf(stderr,
> + _("Orphan file is too small. Minimum "
> + "size is %u\n"),
> + EXT4_MIN_ORPHAN_FILE_SIZE);
> + r_usage++;
> + continue;
> + }
> + orphan_file_blocks = (size + fs->blocksize - 1) /
> + fs->blocksize;
> } else
> r_usage++;
> }
> @@ -2921,6 +2987,17 @@ retry_open:
> if (rc)
> goto closefs;
> }
> + if (orphan_file_blocks) {

If someone specifies -E orphan_file_size=NNN -O ^orphan_file, does this have
the effect of erasing and recreating the orphan file?

--D

> + errcode_t err;
> +
> + err = ext2fs_create_orphan_file(fs, orphan_file_blocks);
> + if (err) {
> + com_err(program_name, err, "%s",
> + _("while creating orphan file"));
> + rc = 1;
> + goto closefs;
> + }
> + }
>
> if (Q_flag) {
> if (mount_flags & EXT2_MF_MOUNTED) {
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-05-22 17:35:27

by Darrick J. Wong

Subject: Re: [PATCH 1/4] libext2fs: Support for orphan file feature

On Fri, May 22, 2015 at 01:28:54PM +0200, Jan Kara wrote:
> Add support for creating and deleting orphan file and a couple of
> utility functions that will be used in other tools.
>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> lib/e2p/feature.c | 4 +
> lib/ext2fs/Makefile.in | 2 +
> lib/ext2fs/ext2_fs.h | 11 +++
> lib/ext2fs/ext2fs.h | 35 +++++++-
> lib/ext2fs/orphan.c | 217 +++++++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 267 insertions(+), 2 deletions(-)
> create mode 100644 lib/ext2fs/orphan.c
>
> diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
> index 73884f2cf5bf..a8e0d4a4644a 100644
> --- a/lib/e2p/feature.c
> +++ b/lib/e2p/feature.c
> @@ -45,6 +45,8 @@ static struct feature feature_list[] = {
> "snapshot_bitmap" },
> { E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_SPARSE_SUPER2,
> "sparse_super2" },
> + { E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_ORPHAN_FILE,
> + "orphan_file" },
>
> { E2P_FEATURE_RO_INCOMPAT, EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER,
> "sparse_super" },
> @@ -70,6 +72,8 @@ static struct feature feature_list[] = {
> "replica" },
> { E2P_FEATURE_RO_INCOMPAT, EXT4_FEATURE_RO_COMPAT_READONLY,
> "read-only" },
> + { E2P_FEATURE_RO_INCOMPAT, EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT,
> + "orphan_file_used" },
>
> { E2P_FEATURE_INCOMPAT, EXT2_FEATURE_INCOMPAT_COMPRESSION,
> "compression" },
> diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
> index 8a7f8ca52902..67120b10438c 100644
> --- a/lib/ext2fs/Makefile.in
> +++ b/lib/ext2fs/Makefile.in
> @@ -109,6 +109,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
> native.o \
> newdir.o \
> openfs.o \
> + orphan.o \
> progress.o \
> punch.o \
> qcow2.o \
> @@ -189,6 +190,7 @@ SRCS= ext2_err.c \
> $(srcdir)/native.c \
> $(srcdir)/newdir.c \
> $(srcdir)/openfs.c \
> + $(srcdir)/orphan.c \
> $(srcdir)/progress.c \
> $(srcdir)/punch.c \
> $(srcdir)/qcow2.c \
> diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> index a755cfac8eae..a77c8fa09938 100644
> --- a/lib/ext2fs/ext2_fs.h
> +++ b/lib/ext2fs/ext2_fs.h
> @@ -52,6 +52,7 @@
> #define EXT2_JOURNAL_INO 8 /* Journal inode */
> #define EXT2_EXCLUDE_INO 9 /* The "exclude" inode, for snapshots */
> #define EXT4_REPLICA_INO 10 /* Used by non-upstream feature */
> +#define EXT4_ORPHAN_INO 9 /* Inode with orphan entries */
>
> /* First non-reserved inode for old ext2 filesystems */
> #define EXT2_GOOD_OLD_FIRST_INO 11
> @@ -769,6 +770,7 @@ struct ext2_super_block {
> /* #define EXT2_FEATURE_COMPAT_EXCLUDE_INODE 0x0080 not used, legacy */
> #define EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP 0x0100
> #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200
> +#define EXT4_FEATURE_COMPAT_ORPHAN_FILE 0x0400
>
>
> #define EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER 0x0001
> @@ -789,6 +791,7 @@ struct ext2_super_block {
> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
> #define EXT4_FEATURE_RO_COMPAT_REPLICA 0x0800
> #define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
> +#define EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT 0x2000
>
>
> #define EXT2_FEATURE_INCOMPAT_COMPRESSION 0x0001
> @@ -838,6 +841,14 @@ struct ext2_super_block {
> #define EXT4_DEFM_DISCARD 0x0400
> #define EXT4_DEFM_NODELALLOC 0x0800
>
> +#define EXT4_ORPHAN_BLOCK_MAGIC 0x0b10ca04
> +
> +/* Structure at the tail of orphan block */
> +struct ext4_orphan_block_tail {
> + __u32 ob_magic;
> + __u32 ob_checksum;
> +};
> +
> /*
> * Structure of a directory entry
> */
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 28c46701da29..1e303d5d59ca 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -555,7 +555,8 @@ typedef struct ext2_icount *ext2_icount_t;
> EXT2_FEATURE_COMPAT_RESIZE_INODE|\
> EXT2_FEATURE_COMPAT_DIR_INDEX|\
> EXT2_FEATURE_COMPAT_EXT_ATTR|\
> - EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
> + EXT4_FEATURE_COMPAT_SPARSE_SUPER2|\
> + EXT4_FEATURE_COMPAT_ORPHAN_FILE)
>
> #ifdef CONFIG_MMP
> #define EXT4_LIB_INCOMPAT_MMP EXT4_FEATURE_INCOMPAT_MMP
> @@ -589,7 +590,8 @@ typedef struct ext2_icount *ext2_icount_t;
> EXT4_FEATURE_RO_COMPAT_BIGALLOC|\
> EXT4_LIB_RO_COMPAT_QUOTA|\
> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> - EXT4_FEATURE_RO_COMPAT_READONLY)
> + EXT4_FEATURE_RO_COMPAT_READONLY|\
> + EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT)
>
> /*
> * These features are only allowed if EXT2_FLAG_SOFTSUPP_FEATURES is passed
> @@ -1512,6 +1514,19 @@ errcode_t ext2fs_get_data_io(ext2_filsys fs, io_channel *old_io);
> errcode_t ext2fs_set_data_io(ext2_filsys fs, io_channel new_io);
> errcode_t ext2fs_rewrite_to_io(ext2_filsys fs, io_channel new_io);
>
> +/* orphan.c */
> +/*
> + * Minimum orphan file size (it must be at least 1 block and smaller one isn't
> + * very useful).
> + */
> +#define EXT4_MIN_ORPHAN_FILE_SIZE 16384

What about 64k block size? I guess it's fine not to use the whole block in
this (non-default) configuration.
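
For reference, the round-up used in the tune2fs patch of this series, (size + blocksize - 1) / blocksize, maps the 16384-byte minimum to a single block when the block size is 64k. A small sketch of that arithmetic, where orphan_file_size_to_blocks() is a hypothetical name used only for this example:

```c
#include <assert.h>
#include <stdint.h>

#define EXT4_MIN_ORPHAN_FILE_SIZE 16384	/* value from the patch */

/* Round a byte size up to whole filesystem blocks. */
uint64_t orphan_file_size_to_blocks(uint64_t size, uint32_t blocksize)
{
	assert(blocksize != 0);
	return (size + blocksize - 1) / blocksize;
}
```

With 1k blocks the minimum works out to 16 blocks, with 4k blocks to 4, and with 64k blocks to 1, so in the last case only a quarter of the block is needed to satisfy the minimum.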

> +
> +extern errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks);
> +extern errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs);
> +extern e2_blkcnt_t ext2fs_default_orphan_file_blocks(__u64 num_blocks);
> +extern errcode_t ext2fs_orphan_file_block_csum_set(ext2_filsys fs, char *buf);
> +extern int ext2fs_orphan_file_block_csum_verify(ext2_filsys fs, char *buf);
> +
> /* get_pathname.c */
> extern errcode_t ext2fs_get_pathname(ext2_filsys fs, ext2_ino_t dir, ext2_ino_t ino,
> char **name);
> @@ -1645,6 +1660,9 @@ extern int ext2fs_dirent_name_len(const struct ext2_dir_entry *entry);
> extern void ext2fs_dirent_set_name_len(struct ext2_dir_entry *entry, int len);
> extern int ext2fs_dirent_file_type(const struct ext2_dir_entry *entry);
> extern void ext2fs_dirent_set_file_type(struct ext2_dir_entry *entry, int type);
> +extern int ext2fs_inodes_per_orphan_block(ext2_filsys fs);
> +extern struct ext4_orphan_block_tail *ext2fs_orphan_block_tail(ext2_filsys fs,
> + char *buf)
>
> #endif
>
> @@ -1915,6 +1933,19 @@ _INLINE_ void ext2fs_dirent_set_file_type(struct ext2_dir_entry *entry, int type
> entry->name_len = (entry->name_len & 0xff) | (type << 8);
> }
>
> +_INLINE_ int ext2fs_inodes_per_orphan_block(ext2_filsys fs)
> +{
> + return (fs->blocksize - sizeof(struct ext4_orphan_block_tail)) /
> + sizeof(__u32);
> +}
> +
> +_INLINE_ struct ext4_orphan_block_tail *
> +ext2fs_orphan_block_tail(ext2_filsys fs, char *buf)
> +{
> + return (struct ext4_orphan_block_tail *)(buf + fs->blocksize -
> + sizeof(struct ext4_orphan_block_tail));
> +}
> +
> #undef _INLINE_
> #endif
>
> diff --git a/lib/ext2fs/orphan.c b/lib/ext2fs/orphan.c
> new file mode 100644
> index 000000000000..1fd5c0688218
> --- /dev/null
> +++ b/lib/ext2fs/orphan.c
> @@ -0,0 +1,217 @@
> +/*
> + * orphan.c --- utility function to handle orphan file
> + *
> + * Copyright (C) 2015 Jan Kara.
> + *
> + * %Begin-Header%
> + * This file may be redistributed under the terms of the GNU Library
> + * General Public License, version 2.
> + * %End-Header%
> + */
> +
> +#include "config.h"
> +#include <string.h>
> +
> +#include "ext2_fs.h"
> +#include "ext2fsP.h"
> +
> +errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs)
> +{
> + struct ext2_inode inode;
> + errcode_t err;
> +
> + err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + return err;
> +
> + err = ext2fs_punch(fs, EXT4_ORPHAN_INO, &inode, NULL, 0, ~0ULL);
> + if (err)
> + return err;
> +
> + fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
> + memset(&inode, 0, sizeof(struct ext2_inode));
> + err = ext2fs_write_inode(fs, EXT4_ORPHAN_INO, &inode);
> +
> + fs->super->s_feature_compat &= ~EXT4_FEATURE_COMPAT_ORPHAN_FILE;
> + ext2fs_mark_super_dirty(fs);
> +
> + return err;
> +}
> +
> +struct mkorphan_info {
> + char *buf;
> + char *zerobuf;
> + blk_t num_blocks;
> + blk_t alloc_blocks;
> + errcode_t err;
> +};
> +
> +static int mkorphan_proc(ext2_filsys fs,
> + blk64_t *blocknr,
> + e2_blkcnt_t blockcnt,
> + blk64_t ref_block EXT2FS_ATTR((unused)),
> + int ref_offset EXT2FS_ATTR((unused)),
> + void *priv_data)
> +{
> + struct mkorphan_info *oi = (struct mkorphan_info *)priv_data;
> + blk64_t new_blk;
> + errcode_t err;
> +
> + err = ext2fs_new_block2(fs, 0, 0, &new_blk);

Hm. I think this breaks the cluster allocation rules, since this allocates
a new block for every logical block inside a cluster.

Hopefully ext2fs_fallocate will land soon, then we can get rid of open-coding
file block allocation like this. You'll still have to have the iterate3 loop
to write out the appropriate block footer, but iirc the library call can be
told to allocate written extents without zeroing the blocks, precisely for
cases like these.
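
The cluster rule in question can be illustrated with a toy model. This is not libext2fs code: struct toy_alloc and toy_map_block() are invented for the example, and it ignores metadata blocks. The assumption is that a new cluster should be taken only when the logical block starts a cluster, and every other block should reuse the cluster of its predecessor:

```c
#include <assert.h>
#include <stdint.h>

/* Toy allocator state, for illustration only. */
struct toy_alloc {
	uint64_t next_free_cluster;	/* next unused physical cluster */
	uint32_t cluster_ratio;		/* blocks per cluster, e.g. 16 */
};

/*
 * Map logical block lblk to a physical block. prev_pblk is the physical
 * block chosen for lblk - 1 and is only used inside a cluster.
 */
uint64_t toy_map_block(struct toy_alloc *a, uint64_t lblk, uint64_t prev_pblk)
{
	assert(a && a->cluster_ratio > 0);

	/* Inside a cluster: stay next to the previous block. */
	if (lblk % a->cluster_ratio != 0)
		return prev_pblk + 1;

	/* Cluster boundary: take a fresh cluster, return its first block. */
	return a->next_free_cluster++ * a->cluster_ratio;
}
```

In this model a 32-block file with a cluster ratio of 16 consumes two clusters, whereas taking a fresh allocation for every logical block would touch 32 of them.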

> + if (err) {
> + oi->err = err;
> + return BLOCK_ABORT;
> + }
> + ext2fs_block_alloc_stats2(fs, new_blk, +1);
> + if (blockcnt >= 0)
> + err = io_channel_write_blk64(fs->io, new_blk, 1, oi->buf);
> + else
> + err = io_channel_write_blk64(fs->io, new_blk, 1, oi->zerobuf);
> + if (err) {
> + oi->err = err;
> + return BLOCK_ABORT;
> + }
> + oi->alloc_blocks++;
> + *blocknr = new_blk;
> + if (blockcnt >= 0 && --oi->num_blocks == 0)
> + return BLOCK_CHANGED | BLOCK_ABORT;
> + return BLOCK_CHANGED;
> +}
> +
> +errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks)
> +{
> + struct ext2_inode inode;
> + errcode_t err;
> + char *buf = NULL, *zerobuf = NULL;
> + struct mkorphan_info oi;
> + struct ext4_orphan_block_tail *ob_tail;
> +
> + err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + return err;
> + if (EXT2_I_SIZE(&inode)) {
> + err = ext2fs_truncate_orphan_file(fs);
> + if (err)
> + return err;
> + }
> +
> + memset(&inode, 0, sizeof(struct ext2_inode));
> + if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS) {
> + inode.i_flags |= EXT4_EXTENTS_FL;
> + err = ext2fs_write_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + return err;
> + }
> +
> + err = ext2fs_get_mem(fs->blocksize, &buf);
> + if (err)
> + return err;
> + err = ext2fs_get_mem(fs->blocksize, &zerobuf);
> + if (err)
> + goto out;
> + memset(buf, 0, fs->blocksize);
> + memset(zerobuf, 0, fs->blocksize);
> + ob_tail = ext2fs_orphan_block_tail(fs, buf);
> + ob_tail->ob_magic = ext2fs_cpu_to_le32(EXT4_ORPHAN_BLOCK_MAGIC);
> + ext2fs_orphan_file_block_csum_set(fs, buf);
> + oi.num_blocks = num_blocks;
> + oi.alloc_blocks = 0;
> + oi.buf = buf;
> + oi.zerobuf = zerobuf;
> + oi.err = 0;
> + err = ext2fs_block_iterate3(fs, EXT4_ORPHAN_INO, BLOCK_FLAG_APPEND,
> + 0, mkorphan_proc, &oi);
> + if (err)
> + goto out;
> +
> + /* Reread inode after blocks were allocated */
> + err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + goto out;
> + ext2fs_iblk_set(fs, &inode, 0);
> + inode.i_atime = inode.i_mtime =
> + inode.i_ctime = fs->now ? fs->now : time(0);
> + inode.i_links_count = 1;
> + inode.i_mode = LINUX_S_IFREG | 0600;
> + ext2fs_iblk_add_blocks(fs, &inode, oi.alloc_blocks);
> + err = ext2fs_inode_size_set(fs, &inode,
> + (unsigned long long)fs->blocksize * num_blocks);
> + if (err)
> + goto out;
> + err = ext2fs_write_new_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + goto out;
> +
> + fs->super->s_feature_compat |= EXT4_FEATURE_COMPAT_ORPHAN_FILE;
> + ext2fs_mark_super_dirty(fs);
> +out:
> + if (buf)
> + ext2fs_free_mem(&buf);
> + if (zerobuf)
> + ext2fs_free_mem(&zerobuf);
> + return err;
> +}
> +
> +/*
> + * Find reasonable size for orphan file. We choose orphan file size to be
> + * between 32 and 512 filesystem blocks and not more than 1/4096 of the
> + * filesystem unless it is really small.
> + */
> +e2_blkcnt_t ext2fs_default_orphan_file_blocks(__u64 num_blocks)
> +{
> + if (num_blocks < 128 * 1024)
> + return 32;
> + if (num_blocks < 2 * 1024 * 1024)
> + return num_blocks / 4096;
> + return 512;
> +}
> +
> +static errcode_t ext2fs_orphan_file_block_csum(ext2_filsys fs, char *buf,
> + __u32 *crc)
> +{
> + int inodes_per_ob = ext2fs_inodes_per_orphan_block(fs);
> + __u32 gen;
> + ext2_ino_t inum;
> + struct ext2_inode inode;
> + errcode_t retval;
> +
> + retval = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (retval)
> + return retval;
> + inum = ext2fs_cpu_to_le32(EXT4_ORPHAN_INO);
> + gen = ext2fs_cpu_to_le32(inode.i_generation);
> + *crc = ext2fs_crc32c_le(fs->csum_seed, (unsigned char *)&inum,
> + sizeof(inum));
> + *crc = ext2fs_crc32c_le(*crc, (unsigned char *)&gen, sizeof(gen));
> + *crc = ext2fs_crc32c_le(*crc, buf, inodes_per_ob * sizeof(__u32));
> +
> + return 0;
> +}
> +
> +errcode_t ext2fs_orphan_file_block_csum_set(ext2_filsys fs, char *buf)
> +{
> + struct ext4_orphan_block_tail *tail;
> +
> + if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> + EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> + return 0;
> +
> + tail = ext2fs_orphan_block_tail(fs, buf);
> + return ext2fs_orphan_file_block_csum(fs, buf, &tail->ob_checksum);
> +}
> +
> +int ext2fs_orphan_file_block_csum_verify(ext2_filsys fs, char *buf)
> +{
> + struct ext4_orphan_block_tail *tail;
> + __u32 crc;
> + errcode_t retval;
> +
> + if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> + EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> + return 1;
> + retval = ext2fs_orphan_file_block_csum(fs, buf, &crc);
> + if (retval)
> + return 0;
> + tail = ext2fs_orphan_block_tail(fs, buf);
> + return ext2fs_le32_to_cpu(tail->ob_checksum) == crc;
> +}
> --
> 2.1.4
>

2015-05-22 21:59:23

by Andreas Dilger

Subject: Re: [PATCH 1/4] libext2fs: Support for orphan file feature

On May 22, 2015, at 5:28 AM, Jan Kara <[email protected]> wrote:
>
> Add support for creating and deleting orphan file and a couple of
> utility functions that will be used in other tools.
>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> lib/e2p/feature.c | 4 +
> lib/ext2fs/Makefile.in | 2 +
> lib/ext2fs/ext2_fs.h | 11 +++
> lib/ext2fs/ext2fs.h | 35 +++++++-
> lib/ext2fs/orphan.c | 217 +++++++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 267 insertions(+), 2 deletions(-)
> create mode 100644 lib/ext2fs/orphan.c
>
> diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
> index 73884f2cf5bf..a8e0d4a4644a 100644
> --- a/lib/e2p/feature.c
> +++ b/lib/e2p/feature.c
> @@ -45,6 +45,8 @@ static struct feature feature_list[] = {
> "snapshot_bitmap" },
> { E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_SPARSE_SUPER2,
> "sparse_super2" },
> + { E2P_FEATURE_COMPAT, EXT4_FEATURE_COMPAT_ORPHAN_FILE,
> + "orphan_file" },
>
> { E2P_FEATURE_RO_INCOMPAT, EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER,
> "sparse_super" },
> @@ -70,6 +72,8 @@ static struct feature feature_list[] = {
> "replica" },
> { E2P_FEATURE_RO_INCOMPAT, EXT4_FEATURE_RO_COMPAT_READONLY,
> "read-only" },
> + { E2P_FEATURE_RO_INCOMPAT, EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT,
> + "orphan_file_used" },
>
> { E2P_FEATURE_INCOMPAT, EXT2_FEATURE_INCOMPAT_COMPRESSION,
> "compression" },
> diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
> index 8a7f8ca52902..67120b10438c 100644
> --- a/lib/ext2fs/Makefile.in
> +++ b/lib/ext2fs/Makefile.in
> @@ -109,6 +109,7 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
> native.o \
> newdir.o \
> openfs.o \
> + orphan.o \
> progress.o \
> punch.o \
> qcow2.o \
> @@ -189,6 +190,7 @@ SRCS= ext2_err.c \
> $(srcdir)/native.c \
> $(srcdir)/newdir.c \
> $(srcdir)/openfs.c \
> + $(srcdir)/orphan.c \
> $(srcdir)/progress.c \
> $(srcdir)/punch.c \
> $(srcdir)/qcow2.c \
> diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> index a755cfac8eae..a77c8fa09938 100644
> --- a/lib/ext2fs/ext2_fs.h
> +++ b/lib/ext2fs/ext2_fs.h
> @@ -52,6 +52,7 @@
> #define EXT2_JOURNAL_INO 8 /* Journal inode */
> #define EXT2_EXCLUDE_INO 9 /* The "exclude" inode, for snapshots */
> #define EXT4_REPLICA_INO 10 /* Used by non-upstream feature */
> +#define EXT4_ORPHAN_INO 9 /* Inode with orphan entries */

This still has a problem: inode 9 is already used by EXT2_EXCLUDE_INO just above, so this can't be safely landed until that is resolved.
At a minimum, it shouldn't be possible to create a filesystem with COMPAT_ORPHAN_FILE
at the same time as COMPAT_EXCLUDE_BITMAP. Since EXCLUDE_BITMAP never made it
upstream, that might be a reasonable compromise for now.
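
A minimal sketch of such a guard, using the flag values from the quoted ext2_fs.h hunk. The function name orphan_file_features_ok() is hypothetical, and where a check like this would live (mke2fs, tune2fs, or the library) is left open:

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in the quoted ext2_fs.h hunk. */
#define EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP	0x0100
#define EXT4_FEATURE_COMPAT_ORPHAN_FILE		0x0400

/*
 * Hypothetical check: return 1 if the compat mask is acceptable, 0 if it
 * requests both features that would claim inode 9.
 */
int orphan_file_features_ok(uint32_t feature_compat)
{
	const uint32_t both = EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP |
			      EXT4_FEATURE_COMPAT_ORPHAN_FILE;

	/* Sanity: the two flags must be distinct bits for the test below. */
	assert((EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP &
		EXT4_FEATURE_COMPAT_ORPHAN_FILE) == 0);

	return (feature_compat & both) != both;
}
```

Either feature alone passes; only the combination is refused.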

That said, we still need to do something about the lack of reserved inodes.

Cheers, Andreas

> /* First non-reserved inode for old ext2 filesystems */
> #define EXT2_GOOD_OLD_FIRST_INO 11
> @@ -769,6 +770,7 @@ struct ext2_super_block {
> /* #define EXT2_FEATURE_COMPAT_EXCLUDE_INODE 0x0080 not used, legacy */
> #define EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP 0x0100
> #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200
> +#define EXT4_FEATURE_COMPAT_ORPHAN_FILE 0x0400
>
>
> #define EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER 0x0001
> @@ -789,6 +791,7 @@ struct ext2_super_block {
> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400
> #define EXT4_FEATURE_RO_COMPAT_REPLICA 0x0800
> #define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000
> +#define EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT 0x2000
>
>
> #define EXT2_FEATURE_INCOMPAT_COMPRESSION 0x0001
> @@ -838,6 +841,14 @@ struct ext2_super_block {
> #define EXT4_DEFM_DISCARD 0x0400
> #define EXT4_DEFM_NODELALLOC 0x0800
>
> +#define EXT4_ORPHAN_BLOCK_MAGIC 0x0b10ca04
> +
> +/* Structure at the tail of orphan block */
> +struct ext4_orphan_block_tail {
> + __u32 ob_magic;
> + __u32 ob_checksum;
> +};
> +
> /*
> * Structure of a directory entry
> */
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 28c46701da29..1e303d5d59ca 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -555,7 +555,8 @@ typedef struct ext2_icount *ext2_icount_t;
> EXT2_FEATURE_COMPAT_RESIZE_INODE|\
> EXT2_FEATURE_COMPAT_DIR_INDEX|\
> EXT2_FEATURE_COMPAT_EXT_ATTR|\
> - EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
> + EXT4_FEATURE_COMPAT_SPARSE_SUPER2|\
> + EXT4_FEATURE_COMPAT_ORPHAN_FILE)
>
> #ifdef CONFIG_MMP
> #define EXT4_LIB_INCOMPAT_MMP EXT4_FEATURE_INCOMPAT_MMP
> @@ -589,7 +590,8 @@ typedef struct ext2_icount *ext2_icount_t;
> EXT4_FEATURE_RO_COMPAT_BIGALLOC|\
> EXT4_LIB_RO_COMPAT_QUOTA|\
> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> - EXT4_FEATURE_RO_COMPAT_READONLY)
> + EXT4_FEATURE_RO_COMPAT_READONLY|\
> + EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT)
>
> /*
> * These features are only allowed if EXT2_FLAG_SOFTSUPP_FEATURES is passed
> @@ -1512,6 +1514,19 @@ errcode_t ext2fs_get_data_io(ext2_filsys fs, io_channel *old_io);
> errcode_t ext2fs_set_data_io(ext2_filsys fs, io_channel new_io);
> errcode_t ext2fs_rewrite_to_io(ext2_filsys fs, io_channel new_io);
>
> +/* orphan.c */
> +/*
> + * Minimum orphan file size (it must be at least 1 block and smaller one isn't
> + * very useful).
> + */
> +#define EXT4_MIN_ORPHAN_FILE_SIZE 16384
> +
> +extern errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks);
> +extern errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs);
> +extern e2_blkcnt_t ext2fs_default_orphan_file_blocks(__u64 num_blocks);
> +extern errcode_t ext2fs_orphan_file_block_csum_set(ext2_filsys fs, char *buf);
> +extern int ext2fs_orphan_file_block_csum_verify(ext2_filsys fs, char *buf);
> +
> /* get_pathname.c */
> extern errcode_t ext2fs_get_pathname(ext2_filsys fs, ext2_ino_t dir, ext2_ino_t ino,
> char **name);
> @@ -1645,6 +1660,9 @@ extern int ext2fs_dirent_name_len(const struct ext2_dir_entry *entry);
> extern void ext2fs_dirent_set_name_len(struct ext2_dir_entry *entry, int len);
> extern int ext2fs_dirent_file_type(const struct ext2_dir_entry *entry);
> extern void ext2fs_dirent_set_file_type(struct ext2_dir_entry *entry, int type);
> +extern int ext2fs_inodes_per_orphan_block(ext2_filsys fs);
> +extern struct ext4_orphan_block_tail *ext2fs_orphan_block_tail(ext2_filsys fs,
> + char *buf)
>
> #endif
>
> @@ -1915,6 +1933,19 @@ _INLINE_ void ext2fs_dirent_set_file_type(struct ext2_dir_entry *entry, int type
> entry->name_len = (entry->name_len & 0xff) | (type << 8);
> }
>
> +_INLINE_ int ext2fs_inodes_per_orphan_block(ext2_filsys fs)
> +{
> + return (fs->blocksize - sizeof(struct ext4_orphan_block_tail)) /
> + sizeof(__u32);
> +}
> +
> +_INLINE_ struct ext4_orphan_block_tail *
> +ext2fs_orphan_block_tail(ext2_filsys fs, char *buf)
> +{
> + return (struct ext4_orphan_block_tail *)(buf + fs->blocksize -
> + sizeof(struct ext4_orphan_block_tail));
> +}
> +
> #undef _INLINE_
> #endif
>
> diff --git a/lib/ext2fs/orphan.c b/lib/ext2fs/orphan.c
> new file mode 100644
> index 000000000000..1fd5c0688218
> --- /dev/null
> +++ b/lib/ext2fs/orphan.c
> @@ -0,0 +1,217 @@
> +/*
> + * orphan.c --- utility function to handle orphan file
> + *
> + * Copyright (C) 2015 Jan Kara.
> + *
> + * %Begin-Header%
> + * This file may be redistributed under the terms of the GNU Library
> + * General Public License, version 2.
> + * %End-Header%
> + */
> +
> +#include "config.h"
> +#include <string.h>
> +
> +#include "ext2_fs.h"
> +#include "ext2fsP.h"
> +
> +errcode_t ext2fs_truncate_orphan_file(ext2_filsys fs)
> +{
> + struct ext2_inode inode;
> + errcode_t err;
> +
> + err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + return err;
> +
> + err = ext2fs_punch(fs, EXT4_ORPHAN_INO, &inode, NULL, 0, ~0ULL);
> + if (err)
> + return err;
> +
> + fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
> + memset(&inode, 0, sizeof(struct ext2_inode));
> + err = ext2fs_write_inode(fs, EXT4_ORPHAN_INO, &inode);
> +
> + fs->super->s_feature_compat &= ~EXT4_FEATURE_COMPAT_ORPHAN_FILE;
> + ext2fs_mark_super_dirty(fs);
> +
> + return err;
> +}
> +
> +struct mkorphan_info {
> + char *buf;
> + char *zerobuf;
> + blk_t num_blocks;
> + blk_t alloc_blocks;
> + errcode_t err;
> +};
> +
> +static int mkorphan_proc(ext2_filsys fs,
> + blk64_t *blocknr,
> + e2_blkcnt_t blockcnt,
> + blk64_t ref_block EXT2FS_ATTR((unused)),
> + int ref_offset EXT2FS_ATTR((unused)),
> + void *priv_data)
> +{
> + struct mkorphan_info *oi = (struct mkorphan_info *)priv_data;
> + blk64_t new_blk;
> + errcode_t err;
> +
> + err = ext2fs_new_block2(fs, 0, 0, &new_blk);
> + if (err) {
> + oi->err = err;
> + return BLOCK_ABORT;
> + }
> + ext2fs_block_alloc_stats2(fs, new_blk, +1);
> + if (blockcnt >= 0)
> + err = io_channel_write_blk64(fs->io, new_blk, 1, oi->buf);
> + else
> + err = io_channel_write_blk64(fs->io, new_blk, 1, oi->zerobuf);
> + if (err) {
> + oi->err = err;
> + return BLOCK_ABORT;
> + }
> + oi->alloc_blocks++;
> + *blocknr = new_blk;
> + if (blockcnt >= 0 && --oi->num_blocks == 0)
> + return BLOCK_CHANGED | BLOCK_ABORT;
> + return BLOCK_CHANGED;
> +}
> +
> +errcode_t ext2fs_create_orphan_file(ext2_filsys fs, blk_t num_blocks)
> +{
> + struct ext2_inode inode;
> + errcode_t err;
> + char *buf = NULL, *zerobuf = NULL;
> + struct mkorphan_info oi;
> + struct ext4_orphan_block_tail *ob_tail;
> +
> + err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + return err;
> + if (EXT2_I_SIZE(&inode)) {
> + err = ext2fs_truncate_orphan_file(fs);
> + if (err)
> + return err;
> + }
> +
> + memset(&inode, 0, sizeof(struct ext2_inode));
> + if (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS) {
> + inode.i_flags |= EXT4_EXTENTS_FL;
> + err = ext2fs_write_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + return err;
> + }
> +
> + err = ext2fs_get_mem(fs->blocksize, &buf);
> + if (err)
> + return err;
> + err = ext2fs_get_mem(fs->blocksize, &zerobuf);
> + if (err)
> + goto out;
> + memset(buf, 0, fs->blocksize);
> + memset(zerobuf, 0, fs->blocksize);
> + ob_tail = ext2fs_orphan_block_tail(fs, buf);
> + ob_tail->ob_magic = ext2fs_cpu_to_le32(EXT4_ORPHAN_BLOCK_MAGIC);
> + ext2fs_orphan_file_block_csum_set(fs, buf);
> + oi.num_blocks = num_blocks;
> + oi.alloc_blocks = 0;
> + oi.buf = buf;
> + oi.zerobuf = zerobuf;
> + oi.err = 0;
> + err = ext2fs_block_iterate3(fs, EXT4_ORPHAN_INO, BLOCK_FLAG_APPEND,
> + 0, mkorphan_proc, &oi);
> + if (err)
> + goto out;
> +
> + /* Reread inode after blocks were allocated */
> + err = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + goto out;
> + ext2fs_iblk_set(fs, &inode, 0);
> + inode.i_atime = inode.i_mtime =
> + inode.i_ctime = fs->now ? fs->now : time(0);
> + inode.i_links_count = 1;
> + inode.i_mode = LINUX_S_IFREG | 0600;
> + ext2fs_iblk_add_blocks(fs, &inode, oi.alloc_blocks);
> + err = ext2fs_inode_size_set(fs, &inode,
> + (unsigned long long)fs->blocksize * num_blocks);
> + if (err)
> + goto out;
> + err = ext2fs_write_new_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (err)
> + goto out;
> +
> + fs->super->s_feature_compat |= EXT4_FEATURE_COMPAT_ORPHAN_FILE;
> + ext2fs_mark_super_dirty(fs);
> +out:
> + if (buf)
> + ext2fs_free_mem(&buf);
> + if (zerobuf)
> + ext2fs_free_mem(&zerobuf);
> + return err;
> +}
> +
> +/*
> + * Find reasonable size for orphan file. We choose orphan file size to be
> + * between 32 and 512 filesystem blocks and not more than 1/4096 of the
> + * filesystem unless it is really small.
> + */
> +e2_blkcnt_t ext2fs_default_orphan_file_blocks(__u64 num_blocks)
> +{
> + if (num_blocks < 128 * 1024)
> + return 32;
> + if (num_blocks < 2 * 1024 * 1024)
> + return num_blocks / 4096;
> + return 512;
> +}
> +
> +static errcode_t ext2fs_orphan_file_block_csum(ext2_filsys fs, char *buf,
> + __u32 *crc)
> +{
> + int inodes_per_ob = ext2fs_inodes_per_orphan_block(fs);
> + __u32 gen;
> + ext2_ino_t inum;
> + struct ext2_inode inode;
> + errcode_t retval;
> +
> + retval = ext2fs_read_inode(fs, EXT4_ORPHAN_INO, &inode);
> + if (retval)
> + return retval;
> + inum = ext2fs_cpu_to_le32(EXT4_ORPHAN_INO);
> + gen = ext2fs_cpu_to_le32(inode.i_generation);
> + *crc = ext2fs_crc32c_le(fs->csum_seed, (unsigned char *)&inum,
> + sizeof(inum));
> + *crc = ext2fs_crc32c_le(*crc, (unsigned char *)&gen, sizeof(gen));
> + *crc = ext2fs_crc32c_le(*crc, buf, inodes_per_ob * sizeof(__u32));
> +
> + return 0;
> +}
> +
> +errcode_t ext2fs_orphan_file_block_csum_set(ext2_filsys fs, char *buf)
> +{
> + struct ext4_orphan_block_tail *tail;
> +
> + if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> + EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> + return 0;
> +
> + tail = ext2fs_orphan_block_tail(fs, buf);
> + return ext2fs_orphan_file_block_csum(fs, buf, &tail->ob_checksum);
> +}
> +
> +int ext2fs_orphan_file_block_csum_verify(ext2_filsys fs, char *buf)
> +{
> + struct ext4_orphan_block_tail *tail;
> + __u32 crc;
> + errcode_t retval;
> +
> + if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> + EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> + return 1;
> + retval = ext2fs_orphan_file_block_csum(fs, buf, &crc);
> + if (retval)
> + return 0;
> + tail = ext2fs_orphan_block_tail(fs, buf);
> + return ext2fs_le32_to_cpu(tail->ob_checksum) == crc;
> +}
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Cheers, Andreas

2015-05-25 07:19:42

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 0/4] e2fsprogs: Support for orphan file feature

On Fri 22-05-15 12:51:51, Albino Biasutti Neto wrote:
> 2015-05-22 8:28 GMT-03:00 Jan Kara <[email protected]>:
> > this is support orphan file feature in e2fsprogs. mke2fs and tune2fs support
> > should be fine, e2fsck support still has bugs so use with care. I'm posting
> > this mainly so that people can easily create filesystem with orphan file
> > feature for testing the kernel patches.
>
> How ?
Build & boot a kernel with the kernel patches I posted ([PATCH 0/3 v2]
ext4: Speedup orphan file handling). Note that there are actually 4 patches
in the series; I just messed up the header.

Build e2fsprogs with patches in this series.

Create a filesystem with the '-O orphan_file' feature.

Now you can test whether the feature makes a difference for things you are
interested in - it may be visible when you heavily delete / truncate files
from lots of processes in parallel.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-25 07:38:36

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 4/4] tune2fs: Add support for orphan_file feature

On Fri 22-05-15 10:18:42, Darrick J. Wong wrote:
> On Fri, May 22, 2015 at 01:28:57PM +0200, Jan Kara wrote:
> > @@ -1970,6 +2011,31 @@ static int parse_extended_opts(ext2_filsys fs, const char *opts)
> > continue;
> > }
> > ext_mount_opts = strdup(arg);
> > + } else if (!strcmp(token, "orphan_file_size")) {
> > + __u64 size;
> > +
> > + if (!arg) {
> > + r_usage++;
> > + continue;
> > + }
> > + size = strtoul(arg, &p, 0);
>
> Would be nice if I could supply units, e.g. orphan_file_size = 128K here.

I don't mind doing this, but do we have a precedent in any option? E.g. the
journal size is just a number (in megabytes) and the stripe options are in fs
blocks. For the orphan file, megabytes are too coarse. I can make the unit
fs blocks or kbytes, leave it at bytes, or allow specifying units, but
some consistency would be good. Any opinion?

> > + if (*p) {
> > + fprintf(stderr,
> > + _("Invalid size of orphan file %s\n"),
> > + arg);
> > + r_usage++;
> > + continue;
> > + }
> > + if (size < EXT4_MIN_ORPHAN_FILE_SIZE) {
> > + fprintf(stderr,
> > + _("Orphan file is too small. Minimum "
> > + "size is %u\n"),
> > + EXT4_MIN_ORPHAN_FILE_SIZE);
> > + r_usage++;
> > + continue;
> > + }
> > + orphan_file_blocks = (size + fs->blocksize - 1) /
> > + fs->blocksize;
> > } else
> > r_usage++;
> > }
> > @@ -2921,6 +2987,17 @@ retry_open:
> > if (rc)
> > goto closefs;
> > }
> > + if (orphan_file_blocks) {
>
> If someone specifies -E orphan_file_size=NNN -O ^orphan_file, does this have
> the effect of erasing and recreating the orphan file?

Yes, although that's a side-effect of the option parsing :) The intended use
is that you first disable the orphan_file feature and then enable it again
with the new size.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-25 07:50:03

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/4] libext2fs: Support for orphan file feature

On Fri 22-05-15 10:35:21, Darrick J. Wong wrote:
> On Fri, May 22, 2015 at 01:28:54PM +0200, Jan Kara wrote:
...
> > diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> > index 28c46701da29..1e303d5d59ca 100644
> > --- a/lib/ext2fs/ext2fs.h
> > +++ b/lib/ext2fs/ext2fs.h
> > @@ -555,7 +555,8 @@ typedef struct ext2_icount *ext2_icount_t;
> > EXT2_FEATURE_COMPAT_RESIZE_INODE|\
> > EXT2_FEATURE_COMPAT_DIR_INDEX|\
> > EXT2_FEATURE_COMPAT_EXT_ATTR|\
> > - EXT4_FEATURE_COMPAT_SPARSE_SUPER2)
> > + EXT4_FEATURE_COMPAT_SPARSE_SUPER2|\
> > + EXT4_FEATURE_COMPAT_ORPHAN_FILE)
> >
> > #ifdef CONFIG_MMP
> > #define EXT4_LIB_INCOMPAT_MMP EXT4_FEATURE_INCOMPAT_MMP
> > @@ -589,7 +590,8 @@ typedef struct ext2_icount *ext2_icount_t;
> > EXT4_FEATURE_RO_COMPAT_BIGALLOC|\
> > EXT4_LIB_RO_COMPAT_QUOTA|\
> > EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|\
> > - EXT4_FEATURE_RO_COMPAT_READONLY)
> > + EXT4_FEATURE_RO_COMPAT_READONLY|\
> > + EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT)
> >
> > /*
> > * These features are only allowed if EXT2_FLAG_SOFTSUPP_FEATURES is passed
> > @@ -1512,6 +1514,19 @@ errcode_t ext2fs_get_data_io(ext2_filsys fs, io_channel *old_io);
> > errcode_t ext2fs_set_data_io(ext2_filsys fs, io_channel new_io);
> > errcode_t ext2fs_rewrite_to_io(ext2_filsys fs, io_channel new_io);
> >
> > +/* orphan.c */
> > +/*
> > + * Minimum orphan file size (it must be at least 1 block and smaller one isn't
> > + * very useful).
> > + */
> > +#define EXT4_MIN_ORPHAN_FILE_SIZE 16384
>
> What about 64k block size? I guess it's fine not to use the whole block in
> this (non-default) configuration.

We round up the file size to a multiple of the fs block size.
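
As a rough illustration of that rounding (a sketch, not code from the patch;
orphan_file_blocks_for_size is a hypothetical helper name), the 16384-byte
minimum becomes a single block on a 64k-block filesystem and four blocks on
a 4k-block one:

```c
#include <stdint.h>

/* Minimum orphan file size in bytes, as defined in the patch. */
#define EXT4_MIN_ORPHAN_FILE_SIZE 16384

/*
 * Hypothetical helper: round a byte size up to whole filesystem blocks,
 * using the same expression as the tune2fs hunk.
 */
static uint64_t orphan_file_blocks_for_size(uint64_t size, uint32_t blocksize)
{
	return (size + blocksize - 1) / blocksize;
}
```

So with 64k blocks the file simply occupies one full block, which is larger
than the stated minimum.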

> > +struct mkorphan_info {
> > + char *buf;
> > + char *zerobuf;
> > + blk_t num_blocks;
> > + blk_t alloc_blocks;
> > + errcode_t err;
> > +};
> > +
> > +static int mkorphan_proc(ext2_filsys fs,
> > + blk64_t *blocknr,
> > + e2_blkcnt_t blockcnt,
> > + blk64_t ref_block EXT2FS_ATTR((unused)),
> > + int ref_offset EXT2FS_ATTR((unused)),
> > + void *priv_data)
> > +{
> > + struct mkorphan_info *oi = (struct mkorphan_info *)priv_data;
> > + blk64_t new_blk;
> > + errcode_t err;
> > +
> > + err = ext2fs_new_block2(fs, 0, 0, &new_blk);
>
> Hm. I think this breaks the cluster allocation rules, since this allocates
> a new block for every logical block inside a cluster.

Yes, I didn't think of cluster allocation. Frankly, I have just mirrored
what the code creating the journal does. If this is wrong, then that code
needs fixing as well ;)
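
To make the concern concrete, here is a toy model (not libext2fs code; all
function names are hypothetical). It assumes that with bigalloc a cluster is
2^cluster_bits blocks, that space is handed out a whole cluster at a time,
and that a logical block keeps its offset within the cluster:

```c
#include <stdint.h>

/* Allocations made when every logical block triggers its own request. */
static uint64_t allocs_per_block(uint64_t num_blocks)
{
	return num_blocks;
}

/* Allocations needed when the logical blocks of one cluster share it. */
static uint64_t allocs_per_cluster(uint64_t num_blocks,
				   unsigned int cluster_bits)
{
	uint64_t ratio = 1ULL << cluster_bits;

	return (num_blocks + ratio - 1) / ratio;
}

/*
 * Physical block for a logical block inside an already allocated cluster.
 * cluster_start is assumed to be the first block of that cluster.
 */
static uint64_t block_in_cluster(uint64_t cluster_start, uint64_t lblk,
				 unsigned int cluster_bits)
{
	return cluster_start + (lblk & ((1ULL << cluster_bits) - 1));
}
```

Under these assumptions, a 32-block orphan file with 16 blocks per cluster
needs 2 cluster allocations, whereas one request per logical block makes 32.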

> Hopefully ext2fs_fallocate will land soon, then we can get rid of open-coding
> file block allocation like this. You'll still have to have the iterate3 loop
> to write out the appropriate block footer, but iirc the library call can be
> told to allocate written extents without zeroing the blocks, precisely for
> cases like these.

So should I wait for that or do you have any pointer to code which does
it correctly?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-25 08:01:39

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/4] libext2fs: Support for orphan file feature

On Fri 22-05-15 15:59:19, Andreas Dilger wrote:
> On May 22, 2015, at 5:28 AM, Jan Kara <[email protected]> wrote:
> > diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> > index a755cfac8eae..a77c8fa09938 100644
> > --- a/lib/ext2fs/ext2_fs.h
> > +++ b/lib/ext2fs/ext2_fs.h
> > @@ -52,6 +52,7 @@
> > #define EXT2_JOURNAL_INO 8 /* Journal inode */
> > #define EXT2_EXCLUDE_INO 9 /* The "exclude" inode, for snapshots */
> > #define EXT4_REPLICA_INO 10 /* Used by non-upstream feature */
> > +#define EXT4_ORPHAN_INO 9 /* Inode with orphan entries */
>
> This still has a problem here, and can't be safely landed until it is
> resolved. At a minimum, it shouldn't be possible to create a filesystem
> with COMPAT_ORPHAN_FILE at the same time as COMPAT_EXCLUDE_BITMAP. Since
> EXCLUDE_BITMAP never made it upstream, that might be a reasonable
> compromise for now.

Yeah, for now I've chosen inode number 9, as it's good enough for testing.
We can make this feature incompatible with COMPAT_EXCLUDE_BITMAP as you
suggest, or we can use some higher inode number and require an increased
number of reserved inodes. I don't mind either too much.
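
The first option could look roughly like the check below. This is only a
sketch: the SKETCH_* bit values are placeholders chosen for illustration
(not taken from the patch or from ext2_fs.h), and orphan_file_conflicts is
a hypothetical name.

```c
#include <stdint.h>

/* Placeholder feature bits, for illustration only. */
#define SKETCH_COMPAT_EXCLUDE_BITMAP	0x0100u
#define SKETCH_COMPAT_ORPHAN_FILE	0x1000u

/*
 * Returns nonzero when both features are requested. Both would want to
 * use inode 9, so such a combination has to be refused.
 */
static int orphan_file_conflicts(uint32_t feature_compat)
{
	uint32_t both = SKETCH_COMPAT_EXCLUDE_BITMAP |
			SKETCH_COMPAT_ORPHAN_FILE;

	return (feature_compat & both) == both;
}
```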

> That said, we still need to do something about the lack of reserved inodes.

Agreed. I've tried to get some decision from Ted regarding this a few times
but failed.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-05-25 16:16:31

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 4/4] tune2fs: Add support for orphan_file feature


> On May 25, 2015, at 1:38 AM, Jan Kara <[email protected]> wrote:
>
> On Fri 22-05-15 10:18:42, Darrick J. Wong wrote:
>> On Fri, May 22, 2015 at 01:28:57PM +0200, Jan Kara wrote:
>>> @@ -1970,6 +2011,31 @@ static int parse_extended_opts(ext2_filsys fs, const char *opts)
>>> continue;
>>> }
>>> ext_mount_opts = strdup(arg);
>>> + } else if (!strcmp(token, "orphan_file_size")) {
>>> + __u64 size;
>>> +
>>> + if (!arg) {
>>> + r_usage++;
>>> + continue;
>>> + }
>>> + size = strtoul(arg, &p, 0);
>>
>> Would be nice if I could supply units, e.g. orphan_file_size = 128K here.
>
> I don't mind doing this but do we have a precedent in any option? Because
> e.g. journal size is just a number (in megabytes), stripe options are in fs
> blocks. For orphan file megabytes are too coarse. I can make the unit
> fs-blocks, or kbytes, or leave it at bytes, or allow specifying units but
> some consistency would be good. Any opinion?

Use "parse_num_blocks2()" for this. This is used to parse the filesystem size in mke2fs,
blocksize, bigalloc cluster size, resize, etc. You are right that the journal_size is
NOT using this helper, but it probably should.
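
For readers without the e2fsprogs tree at hand, the kind of parsing meant
here can be sketched as below. This is a hypothetical stand-in, not
parse_num_blocks2() itself, and it makes no claim to match that helper's
exact behaviour; it only shows a number with an optional K/M/G suffix being
turned into a byte count:

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "16384", "128K", "1M", ... into bytes.
 * Returns 0 on success and -1 on malformed or overflowing input.
 */
static int parse_size_bytes(const char *arg, uint64_t *bytes)
{
	char *end;
	unsigned long long val;
	unsigned int shift = 0;

	if (!arg || !isdigit((unsigned char)arg[0]))
		return -1;
	errno = 0;
	val = strtoull(arg, &end, 0);
	if (errno)
		return -1;

	switch (*end) {
	case 'K': case 'k': shift = 10; end++; break;
	case 'M': case 'm': shift = 20; end++; break;
	case 'G': case 'g': shift = 30; end++; break;
	case '\0': break;
	default: return -1;
	}
	if (*end != '\0')	/* trailing garbage after the suffix */
		return -1;
	if (shift && val > (UINT64_MAX >> shift))	/* would overflow */
		return -1;
	*bytes = (uint64_t)val << shift;
	return 0;
}
```

With something like this, orphan_file_size=128K and orphan_file_size=131072
would give the same result.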

Cheers, Andreas


>>> + if (*p) {
>>> + fprintf(stderr,
>>> + _("Invalid size of orphan file %s\n"),
>>> + arg);
>>> + r_usage++;
>>> + continue;
>>> + }
>>> + if (size < EXT4_MIN_ORPHAN_FILE_SIZE) {
>>> + fprintf(stderr,
>>> + _("Orphan file is too small. Minimum "
>>> + "size is %u\n"),
>>> + EXT4_MIN_ORPHAN_FILE_SIZE);
>>> + r_usage++;
>>> + continue;
>>> + }
>>> + orphan_file_blocks = (size + fs->blocksize - 1) /
>>> + fs->blocksize;
>>> } else
>>> r_usage++;
>>> }
>>> @@ -2921,6 +2987,17 @@ retry_open:
>>> if (rc)
>>> goto closefs;
>>> }
>>> + if (orphan_file_blocks) {
>>
>> If someone specifies -E orphan_file_size=NNN -O ^orphan_file, does this have
>> the effect of erasing and recreating the orphan file?
>
> Yes, although that's a side-effect of the option parsing :) Intended use
> is that you first disable the orphan_file feature and then enable it again
> with new size.
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR


Cheers, Andreas

2015-05-26 07:24:40

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 4/4] tune2fs: Add support for orphan_file feature

On Mon 25-05-15 10:16:27, Andreas Dilger wrote:
>
> > On May 25, 2015, at 1:38 AM, Jan Kara <[email protected]> wrote:
> >
> > On Fri 22-05-15 10:18:42, Darrick J. Wong wrote:
> >> On Fri, May 22, 2015 at 01:28:57PM +0200, Jan Kara wrote:
> >>> @@ -1970,6 +2011,31 @@ static int parse_extended_opts(ext2_filsys fs, const char *opts)
> >>> continue;
> >>> }
> >>> ext_mount_opts = strdup(arg);
> >>> + } else if (!strcmp(token, "orphan_file_size")) {
> >>> + __u64 size;
> >>> +
> >>> + if (!arg) {
> >>> + r_usage++;
> >>> + continue;
> >>> + }
> >>> + size = strtoul(arg, &p, 0);
> >>
> >> Would be nice if I could supply units, e.g. orphan_file_size = 128K here.
> >
> > I don't mind doing this but do we have a precedent in any option? Because
> > e.g. journal size is just a number (in megabytes), stripe options are in fs
> > blocks. For orphan file megabytes are too coarse. I can make the unit
> > fs-blocks, or kbytes, or leave it at bytes, or allow specifying units but
> > some consistency would be good. Any opinion?
>
> Use "parse_num_blocks2()" for this. This is used to parse the filesystem size in mke2fs,
> blocksize, bigalloc cluster size, resize, etc. You are right that the journal_size is
> NOT using this helper, but it probably should.
Ah, thanks for the hint! Will do.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR