2014-07-18 22:52:08

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 00/24] e2fsprogs patchbomb 7/14, part 1

Hi all,

Since my last patch submission in May, I've been fuzzing both the
in-kernel ext4 driver, and e2fsck. The main objective of this work
has been to determine if the kernel is capable of detecting invalid
mutations and returning -EIO without crashing; and whether or not
e2fsck can salvage the filesystem (or at least get it back to a
(self-defined) "clean" state) within a finite number of e2fsck runs.

I have a program "e2fuzz" (in patch 24) that formats and populates an
ext4 filesystem, randomly corrupts some number of metadata block
bytes, mounts the FS, tries to do some simple IO, unmounts, then
repeatedly runs fsck until either it says the FS is clean, we've run
too many times, or the output indicates that no progress is being
made.

The kernel, it turns out, seems to be able to handle problems with
grace. Luckily, it at least has the privilege of simply shutting down
the filesystem. e2fsck is not so fortunate -- upon detecting badness,
it has to decide a resolution and make it stick. This exposed a
number of incorrect fixes, infinite loop opportunities, crashes, and
in a few cases, total filesystem destruction. Lots of patches, though
I swear I'm _not_ paid by the patch. :)

The 24 patches following this mesage fix various problems in the more
mature parts of libext2fs and e2fsck. Most (18) apply cleanly against
-maint, but a few of them also happen to touch things that only appear
in -next. There are of course many more patches in the patch set, but
I'm breaking them up to avoid blasting people all at once. The second
patchbomb will have about 35 fixes against the new features in the
-next branch. I'll push it out in a few days, since I'm travelling
for OSCON. The third patchbomb will be the same pile of "new"
features from May's patch series; there's about 20 or so of those.
They haven't changed since May.

The first patch is the e4defrag fix from a few days ago. There are
three patches to debugfs that made it much easier to figure out what
was going on in the mutated filesystems. Everything after that are
miscellaneous fixes that e2fuzz turned up. There are two that I want
to call out specifically -- patch 10 solves the particular problem
that fsck needs to avoid touching corrupt metadata blocks if they're
cross-linked with critical FS metadata. Patch 11 problem that hidden
allocations (think extra ETB/map blocks when extending a file) were
coming from the wrong block bitmap. Patch 23 is unchanged from the
May patch set.

I've tested these e2fsprogs changes against the -next branch as of
7/13. These days, I use several VMs, each with 32M-1G ramdisks to test
with; the test process is "misc/e2fuzz.sh -B <fuzz> -s <size>", where
fuzz is anything from 2 bytes to "0.1%" of metadata bytes. In the
past month or so I've run about a million iterations of "-B 2" without
incident, and about 100,000 iterations of "-B 0.1%" without problems.
FS size was 256M and yes, some of the testing was done before the most
recent push to git.kernel.org.

Comments and questions are, as always, welcome.

--D


2014-07-18 22:52:25

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 01/24] e4defrag: backwards-allocated files should be defragmented too

Currently, e4defrag avoids increasing file fragmentation by comparing
the number of runs of physical extents of both the original and the
donor files. Unfortunately, there is a bug in the routine that counts
physical extents, since it doesn't look at the logical block offsets
of the extents. Therefore, a file whose blocks were allocated in
reverse order will be seen as only having one big physical extent, and
therefore will not be defragmented.

Fix the counting routine to consider logical extent offset so that we
defragment backwards-allocated files. This could be problematic if we
ever gain the ability to lay out logically sparse extents in a
physically contiguous manner, but presumably one wouldn't call defrag
on such a file.

Reported-by: Xiaoguang Wang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
misc/e4defrag.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/misc/e4defrag.c b/misc/e4defrag.c
index a204793..d0eac60 100644
--- a/misc/e4defrag.c
+++ b/misc/e4defrag.c
@@ -888,7 +888,9 @@ static int get_physical_count(struct fiemap_extent_list *physical_list_head)

do {
if ((ext_list_tmp->data.physical + ext_list_tmp->data.len)
- != ext_list_tmp->next->data.physical) {
+ != ext_list_tmp->next->data.physical ||
+ (ext_list_tmp->data.logical + ext_list_tmp->data.len)
+ != ext_list_tmp->next->data.logical) {
/* This extent and next extent are not continuous. */
ret++;
}


2014-07-18 22:52:35

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 02/24] debugfs: Only print the first 60 bytes from i_block on a fast symlink

If we have an inline_data fast symlink, i_size can be larger than the
size of i_block. In this case, debugfs prints off the end of the
buffer, so fix that.

Signed-off-by: Darrick J. Wong <[email protected]>
---
debugfs/debugfs.c | 15 ++++++++++-----
tests/d_special_files/expect | 2 +-
2 files changed, 11 insertions(+), 6 deletions(-)


diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 51f386b..5eecabe 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -783,11 +783,16 @@ void internal_dump_inode(FILE *out, const char *prefix,
}

if (LINUX_S_ISLNK(inode->i_mode) &&
- ext2fs_inode_data_blocks(current_fs,inode) == 0 &&
- !(inode->i_flags & EXT4_INLINE_DATA_FL))
- fprintf(out, "%sFast_link_dest: %.*s\n", prefix,
- (int) inode->i_size, (char *)inode->i_block);
- else if (LINUX_S_ISBLK(inode->i_mode) || LINUX_S_ISCHR(inode->i_mode)) {
+ ext2fs_inode_data_blocks(current_fs, inode) == 0 &&
+ !(inode->i_flags & EXT4_INLINE_DATA_FL)) {
+ int sz = EXT2_I_SIZE(inode);
+
+ if (sz > sizeof(inode->i_block))
+ sz = sizeof(inode->i_block);
+ fprintf(out, "%sFast link dest: \"%.*s\"\n", prefix, sz,
+ (char *)inode->i_block);
+ } else if (LINUX_S_ISBLK(inode->i_mode) ||
+ LINUX_S_ISCHR(inode->i_mode)) {
int major, minor;
const char *devnote;

diff --git a/tests/d_special_files/expect b/tests/d_special_files/expect
index 2b2dbfa..f729b0f 100644
--- a/tests/d_special_files/expect
+++ b/tests/d_special_files/expect
@@ -11,7 +11,7 @@ Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
atime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
mtime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
-Fast_link_dest: bar
+Fast link dest: "bar"
Exit status is 0
debugfs -R ''stat foo2'' -w test.img
Inode: 13 Type: symlink Mode: 0777 Flags: 0x0


2014-07-18 22:52:41

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 03/24] debugfs: create idump command to dump an inode in hex

Create a command that will dump an entire inode's space in hex.

Signed-off-by: Darrick J. Wong <[email protected]>
---
debugfs/debug_cmds.ct | 4 ++++
debugfs/debugfs.c | 33 +++++++++++++++++++++++++++++++++
debugfs/debugfs.h | 1 +
debugfs/zap.c | 33 +++++++++++++++++++--------------
4 files changed, 57 insertions(+), 14 deletions(-)


diff --git a/debugfs/debug_cmds.ct b/debugfs/debug_cmds.ct
index 814fd52..9a66494 100644
--- a/debugfs/debug_cmds.ct
+++ b/debugfs/debug_cmds.ct
@@ -208,5 +208,9 @@ request do_list_quota, "List quota",
request do_get_quota, "Get quota",
get_quota, gq;

+request do_idump, "Dump inode",
+ idump;
+
+
end;

diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index 5eecabe..8078a02 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -1888,6 +1888,39 @@ void do_imap(int argc, char *argv[])

}

+void do_idump(int argc, char *argv[])
+{
+ ext2_ino_t ino;
+ char *buf;
+ errcode_t err;
+ int isize;
+
+ if (common_args_process(argc, argv, 2, 2, argv[0],
+ "<file>", 0))
+ return;
+ ino = string_to_inode(argv[1]);
+ if (!ino)
+ return;
+
+ isize = EXT2_INODE_SIZE(current_fs->super);
+ err = ext2fs_get_mem(isize, &buf);
+ if (err) {
+ com_err(argv[0], err, "while allocating memory");
+ return;
+ }
+
+ err = ext2fs_read_inode_full(current_fs, ino,
+ (struct ext2_inode *)buf, isize);
+ if (err) {
+ com_err(argv[0], err, "while reading inode %d", ino);
+ goto err;
+ }
+
+ do_byte_hexdump(stdout, buf, isize);
+err:
+ ext2fs_free_mem(&buf);
+}
+
#ifndef READ_ONLY
void do_set_current_time(int argc, char *argv[])
{
diff --git a/debugfs/debugfs.h b/debugfs/debugfs.h
index df51aa0..6eb5732 100644
--- a/debugfs/debugfs.h
+++ b/debugfs/debugfs.h
@@ -191,3 +191,4 @@ void do_list_xattr(int argc, char **argv);
/* zap.c */
extern void do_zap_block(int argc, char **argv);
extern void do_block_dump(int argc, char **argv);
+extern void do_byte_hexdump(FILE *fp, unsigned char *buf, size_t bufsize);
diff --git a/debugfs/zap.c b/debugfs/zap.c
index 8109209..917bddf 100644
--- a/debugfs/zap.c
+++ b/debugfs/zap.c
@@ -176,7 +176,6 @@ void do_block_dump(int argc, char *argv[])
char *file = NULL;
unsigned int i, j;
int c, err;
- int suppress = -1;

if (check_fs_open(argv[0]))
return;
@@ -229,11 +228,21 @@ void do_block_dump(int argc, char *argv[])
goto errout;
}

- for (i=0; i < current_fs->blocksize; i += 16) {
+ do_byte_hexdump(stdout, buf, current_fs->blocksize);
+errout:
+ free(buf);
+}
+
+void do_byte_hexdump(FILE *fp, unsigned char *buf, size_t bufsize)
+{
+ size_t i, j;
+ int suppress = -1;
+
+ for (i = 0; i < bufsize; i += 16) {
if (suppress < 0) {
if (i && memcmp(buf + i, buf + i - 16, 16) == 0) {
suppress = i;
- printf("*\n");
+ fprintf(fp, "*\n");
continue;
}
} else {
@@ -241,20 +250,16 @@ void do_block_dump(int argc, char *argv[])
continue;
suppress = -1;
}
- printf("%04o ", i);
+ fprintf(fp, "%04o ", (unsigned int)i);
for (j = 0; j < 16; j++) {
- printf("%02x", buf[i+j]);
+ fprintf(fp, "%02x", buf[i+j]);
if ((j % 2) == 1)
- putchar(' ');
+ fprintf(fp, " ");
}
- putchar(' ');
+ fprintf(fp, " ");
for (j = 0; j < 16; j++)
- printf("%c", isprint(buf[i+j]) ? buf[i+j] : '.');
- putchar('\n');
+ fprintf(fp, "%c", isprint(buf[i+j]) ? buf[i+j] : '.');
+ fprintf(fp, "\n");
}
- putchar('\n');

2014-07-18 22:52:48

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 04/24] debugfs: allow bmap to allocate blocks

Allow set_inode_field's bmap command in debugfs to allocate blocks,
which enables us to allocate blocks for indirect blocks and internal
extent tree blocks. True, we could do this manually, but seems like
unnecessary bookkeeping activity for humans.

Signed-off-by: Darrick J. Wong <[email protected]>
---
debugfs/set_fields.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/debugfs/set_fields.c b/debugfs/set_fields.c
index b4ca546..c5f865e 100644
--- a/debugfs/set_fields.c
+++ b/debugfs/set_fields.c
@@ -556,8 +556,9 @@ static errcode_t parse_bmap(struct field_set_info *info,
}

retval = ext2fs_bmap2(current_fs, set_ino,
- (struct ext2_inode *) &set_inode,
- NULL, BMAP_SET, array_idx, NULL, &blk);
+ (struct ext2_inode *) &set_inode,
+ NULL, BMAP_ALLOC | BMAP_SET, array_idx, NULL,
+ &blk);
if (retval) {
com_err("set_inode", retval, "while setting block map");
}


2014-07-18 22:52:54

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 05/24] e2fsck: report correct inode number in pass1b

If there's a problem with the inode scan during pass 1b, report the
inode that we were trying to examine when the error happened, not the
inode that just went through the checker.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1b.c | 1 +
1 file changed, 1 insertion(+)


diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
index d7c5e55..b4041d1 100644
--- a/e2fsck/pass1b.c
+++ b/e2fsck/pass1b.c
@@ -301,6 +301,7 @@ static void pass1b(e2fsck_t ctx, char *block_buf)
if (pctx.errcode == EXT2_ET_BAD_BLOCK_IN_INODE_TABLE)
continue;
if (pctx.errcode) {
+ pctx.ino = ino;
fix_problem(ctx, PR_1B_ISCAN_ERROR, &pctx);
ctx->flags |= E2F_FLAG_ABORT;
return;


2014-07-18 22:53:03

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 06/24] e2fsck: don't offer to recreate the journal if fsck is aborting due to bad block bitmaps

If e2fsck knows the bitmaps are bad at the exit (probably because they
were bad at the start and have not been fixed), don't offer to
recreate the journal because doing so causes e2fsck to abort a second
time.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/unix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 7055a92..bb5141f 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1642,7 +1642,8 @@ print_unsupp_features:
run_result = e2fsck_run(ctx);
e2fsck_clear_progbar(ctx);

- if (ctx->flags & E2F_FLAG_JOURNAL_INODE) {
+ if (!ctx->invalid_bitmaps &&
+ (ctx->flags & E2F_FLAG_JOURNAL_INODE)) {
if (fix_problem(ctx, PR_6_RECREATE_JOURNAL, &pctx)) {
if (journal_size < 1024)
journal_size = ext2fs_default_journal_size(ext2fs_blocks_count(fs->super));


2014-07-18 22:53:10

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 07/24] e2fsck: skip clearing bad extents if bitmaps are unreadable

If the bitmaps are known to be unreadable, don't bother clearing them;
just mark fsck to restart itself after pass 5, by which time the
bitmaps should be fixed.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index ba60029..5c628a3 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1980,6 +1980,26 @@ report_problem:
pctx->blkcount = extent.e_lblk + extent.e_len;
if (fix_problem(ctx, problem, pctx)) {
fix_problem_now:
+ if (ctx->invalid_bitmaps) {
+ /*
+ * If fsck knows the bitmaps are bad,
+ * skip to the next extent and
+ * try to clear this extent again
+ * after fixing the bitmaps, by
+ * restarting fsck.
+ */
+ pctx->errcode = ext2fs_extent_get(
+ ehandle,
+ EXT2_EXTENT_NEXT_SIB,
+ &extent);
+ ctx->flags |= E2F_FLAG_RESTART_LATER;
+ if (pctx->errcode ==
+ EXT2_ET_NO_CURRENT_NODE) {
+ pctx->errcode = 0;
+ break;
+ }
+ continue;
+ }
e2fsck_read_bitmaps(ctx);
pctx->errcode =
ext2fs_extent_delete(ehandle, 0);


2014-07-18 22:53:18

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 08/24] e2fsck: fix inode coherency issue when iterating an inode's blocks

When we're about to iterate the blocks of a block-map file, we need to
write the inode out to disk if it's dirty because block_iterate3()
will re-read the inode from disk. (In practice this won't happen
because nothing dirties block-mapped inodes before the iterate call,
but we can program defensively).

More importantly, we need to re-read the inode after the iterate()
operation because it's possible that mappings were changed (or erased)
during the iteration. If we then dirty or clear the inode, we'll
mistakenly write the old inode values back out to disk!

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 5c628a3..b696d02 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -80,7 +80,8 @@ static void adjust_extattr_refcount(e2fsck_t ctx, ext2_refcount_t refcount,
struct process_block_struct {
ext2_ino_t ino;
unsigned is_dir:1, is_reg:1, clear:1, suppress:1,
- fragmented:1, compressed:1, bbcheck:1;
+ fragmented:1, compressed:1, bbcheck:1,
+ inode_modified:1;
blk64_t num_blocks;
blk64_t max_blocks;
e2_blkcnt_t last_block;
@@ -1968,8 +1969,10 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
pctx->num = extent.e_len;
problem = 0;
if (fix_problem(ctx, PR_1_EXTENT_ONLY_CSUM_INVALID,
- pctx))
+ pctx)) {
+ pb->inode_modified = 1;
ext2fs_extent_replace(ehandle, 0, &extent);
+ }
}

if (problem) {
@@ -2001,6 +2004,7 @@ fix_problem_now:
continue;
}
e2fsck_read_bitmaps(ctx);
+ pb->inode_modified = 1;
pctx->errcode =
ext2fs_extent_delete(ehandle, 0);
if (pctx->errcode) {
@@ -2046,6 +2050,7 @@ fix_problem_now:
pctx->num = e_info.curr_level - 1;
problem = PR_1_EXTENT_INDEX_START_INVALID;
if (fix_problem(ctx, problem, pctx)) {
+ pb->inode_modified = 1;
pctx->errcode =
ext2fs_extent_fix_parents(ehandle);
if (pctx->errcode)
@@ -2243,6 +2248,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
pb.inode = inode;
pb.pctx = pctx;
pb.ctx = ctx;
+ pb.inode_modified = 0;
pctx->ino = ino;
pctx->errcode = 0;

@@ -2274,6 +2280,16 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
if (extent_fs && (inode->i_flags & EXT4_EXTENTS_FL))
check_blocks_extents(ctx, pctx, &pb);
else {
+ /*
+ * If we've modified the inode, write it out before
+ * iterate() tries to use it.
+ */
+ if (dirty_inode) {
+ e2fsck_write_inode(ctx, ino, inode,
+ "check_blocks");
+ dirty_inode = 0;
+ }
+ fs->flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
pctx->errcode = ext2fs_block_iterate3(fs, ino,
pb.is_dir ? BLOCK_FLAG_HOLE : 0,
block_buf, process_block, &pb);
@@ -2282,6 +2298,16 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
* files.
*/
pb.last_init_lblock = pb.last_block;
+ /*
+ * If iterate() changed a block mapping, we have to
+ * re-read the inode. If we decide to clear the
+ * inode after clearing some stuff, we'll re-write the
+ * bad mappings into the inode!
+ */
+ if (pb.inode_modified)
+ e2fsck_read_inode(ctx, ino, inode,
+ "check_blocks");
+ fs->flags &= ~EXT2_FLAG_IGNORE_CSUM_ERRORS;
}
} else {
/* check inline data */
@@ -2581,6 +2607,7 @@ static int process_block(ext2_filsys fs,
if (fix_problem(ctx, problem, pctx)) {
blk = *block_nr = 0;
ret_code = BLOCK_CHANGED;
+ p->inode_modified = 1;
goto mark_dir;
} else
return 0;


2014-07-18 22:53:27

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 09/24] e2fsck: clear i_block if there are too many bad block mappings

If there are too many bad block mappings in a file and the user says
to zap it, erase i_block before clearing the inode.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 1 +
1 file changed, 1 insertion(+)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index b696d02..18980f1 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2329,6 +2329,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
}

if (pb.clear) {
+ memset(inode->i_block, 0, sizeof(inode->i_block));
e2fsck_clear_inode(ctx, ino, inode, E2F_FLAG_RESTART,
"check_blocks");
return;


2014-07-18 22:53:33

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 10/24] e2fsck: don't clobber critical metadata during check_blocks

If we encounter an inode with IND/DIND/TIND blocks or internal extent
tree blocks that point into critical FS metadata such as the
superblock, the group descriptors, the bitmaps, or the inode table,
it's quite possible that the validation code for those blocks is not
going to like what it finds, and it'll ask to try to fix the block.
Unfortunately, this happens before duplicate block processing (pass
1b), which means that we can end up doing stupid things like writing
extent blocks into the inode table, which multiplies e2fsck'
destructive effect and can render a filesystem unfixable.

To solve this, create a bitmap of all the critical FS metadata. If
before pass1b runs (basically check_blocks) we find a metadata block
that points into these critical regions, continue processing that
block, but avoid making any modifications, because we could be
misinterpreting inodes as block maps. Pass 1b will find the
multiply-owned blocks and fix that situation, which means that we can
then restart e2fsck from the beginning and actually fix whatever
problems we find.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/e2fsck.c | 4 +
e2fsck/e2fsck.h | 1
e2fsck/pass1.c | 86 ++++++++++++++++++++++++++++----
e2fsck/pass1b.c | 2 -
e2fsck/pass5.c | 2 +
e2fsck/problem.c | 8 +++
e2fsck/problem.h | 3 +
tests/f_itable_collision/expect.1 | 101 +++++++++++++++++++++++++++++++++++++
tests/f_itable_collision/expect.2 | 7 +++
tests/f_itable_collision/image.gz | Bin
tests/f_itable_collision/name | 1
tests/f_itable_collision/script | 38 ++++++++++++++
12 files changed, 242 insertions(+), 11 deletions(-)
create mode 100644 tests/f_itable_collision/expect.1
create mode 100644 tests/f_itable_collision/expect.2
create mode 100644 tests/f_itable_collision/image.gz
create mode 100644 tests/f_itable_collision/name
create mode 100755 tests/f_itable_collision/script


diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index 0ec1540..fcda7d7 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -108,6 +108,10 @@ errcode_t e2fsck_reset_context(e2fsck_t ctx)
ext2fs_free_block_bitmap(ctx->block_ea_map);
ctx->block_ea_map = 0;
}
+ if (ctx->block_metadata_map) {
+ ext2fs_free_block_bitmap(ctx->block_metadata_map);
+ ctx->block_metadata_map = 0;
+ }
if (ctx->inode_bb_map) {
ext2fs_free_inode_bitmap(ctx->inode_bb_map);
ctx->inode_bb_map = 0;
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 80ab18a..53f10b1 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -374,6 +374,7 @@ struct e2fsck_struct {
* e2fsck functions themselves.
*/
void *priv_data;
+ ext2fs_block_bitmap block_metadata_map; /* Metadata blocks */
};

/* Used by the region allocation code */
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 18980f1..37b05ff 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -93,6 +93,7 @@ struct process_block_struct {
struct problem_context *pctx;
ext2fs_block_bitmap fs_meta_blocks;
e2fsck_t ctx;
+ blk64_t bad_ref;
};

struct process_inode_block {
@@ -696,6 +697,15 @@ void e2fsck_pass1(e2fsck_t ctx)
ctx->flags |= E2F_FLAG_ABORT;
return;
}
+ pctx.errcode = e2fsck_allocate_block_bitmap(fs,
+ _("metadata block map"), EXT2FS_BMAP64_RBTREE,
+ "block_metadata_map", &ctx->block_metadata_map);
+ if (pctx.errcode) {
+ pctx.num = 1;
+ fix_problem(ctx, PR_1_ALLOCATE_BBITMAP_ERROR, &pctx);
+ ctx->flags |= E2F_FLAG_ABORT;
+ return;
+ }
e2fsck_setup_tdb_icount(ctx, 0, &ctx->inode_link_info);
if (!ctx->inode_link_info) {
e2fsck_set_bitmap_type(fs, EXT2FS_BMAP64_RBTREE,
@@ -1904,7 +1914,8 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
struct process_block_struct *pb,
blk64_t start_block, blk64_t end_block,
blk64_t eof_block,
- ext2_extent_handle_t ehandle)
+ ext2_extent_handle_t ehandle,
+ int try_repairs)
{
struct ext2fs_extent extent;
blk64_t blk, last_lblk;
@@ -1931,7 +1942,8 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,

problem = 0;
/* Ask to clear a corrupt extent block */
- if (pctx->errcode == EXT2_ET_EXTENT_CSUM_INVALID) {
+ if (try_repairs &&
+ pctx->errcode == EXT2_ET_EXTENT_CSUM_INVALID) {
pctx->blk = extent.e_pblk;
pctx->blk2 = extent.e_lblk;
pctx->num = extent.e_len;
@@ -1963,7 +1975,7 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
problem = PR_1_TOOBIG_DIR;

/* Corrupt but passes checks? Ask to fix checksum. */
- if (failed_csum) {
+ if (try_repairs && failed_csum) {
pctx->blk = extent.e_pblk;
pctx->blk2 = extent.e_lblk;
pctx->num = extent.e_len;
@@ -1975,7 +1987,7 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
}
}

- if (problem) {
+ if (try_repairs && problem) {
report_problem:
pctx->blk = extent.e_pblk;
pctx->blk2 = extent.e_lblk;
@@ -2026,13 +2038,32 @@ fix_problem_now:

if (!is_leaf) {
blk64_t lblk = extent.e_lblk;
+ int next_try_repairs = 1;

blk = extent.e_pblk;
+
+ /*
+ * If this lower extent block collides with critical
+ * metadata, don't try to repair the damage. Pass 1b
+ * will reallocate the block; then we can try again.
+ */
+ if (pb->ino != EXT2_RESIZE_INO &&
+ ext2fs_test_block_bitmap2(ctx->block_metadata_map,
+ extent.e_pblk)) {
+ next_try_repairs = 0;
+ pctx->blk = blk;
+ fix_problem(ctx,
+ PR_1_CRITICAL_METADATA_COLLISION,
+ pctx);
+ ctx->flags |= E2F_FLAG_RESTART_LATER;
+ }
pctx->errcode = ext2fs_extent_get(ehandle,
EXT2_EXTENT_DOWN, &extent);
if (pctx->errcode) {
pctx->str = "EXT2_EXTENT_DOWN";
problem = PR_1_EXTENT_HEADER_INVALID;
+ if (!next_try_repairs)
+ return;
if (pctx->errcode ==
EXT2_ET_EXTENT_HEADER_BAD ||
pctx->errcode ==
@@ -2058,7 +2089,8 @@ fix_problem_now:
}
}
scan_extent_node(ctx, pctx, pb, extent.e_lblk,
- last_lblk, eof_block, ehandle);
+ last_lblk, eof_block, ehandle,
+ next_try_repairs);
if (pctx->errcode)
return;
pctx->errcode = ext2fs_extent_get(ehandle,
@@ -2181,7 +2213,7 @@ static void check_blocks_extents(e2fsck_t ctx, struct problem_context *pctx,

eof_lblk = ((EXT2_I_SIZE(inode) + fs->blocksize - 1) >>
EXT2_BLOCK_SIZE_BITS(fs->super)) - 1;
- scan_extent_node(ctx, pctx, pb, 0, 0, eof_lblk, ehandle);
+ scan_extent_node(ctx, pctx, pb, 0, 0, eof_lblk, ehandle, 1);
if (pctx->errcode &&
fix_problem(ctx, PR_1_EXTENT_ITERATE_FAILURE, pctx)) {
pb->num_blocks = 0;
@@ -2249,6 +2281,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
pb.pctx = pctx;
pb.ctx = ctx;
pb.inode_modified = 0;
+ pb.bad_ref = 0;
pctx->ino = ino;
pctx->errcode = 0;

@@ -2590,8 +2623,35 @@ static int process_block(ext2_filsys fs,
blk >= ext2fs_blocks_count(fs->super))
problem = PR_1_ILLEGAL_BLOCK_NUM;

+ /*
+ * If this IND/DIND/TIND block is squatting atop some critical metadata
+ * (group descriptors, superblock, bitmap, inode table), any write to
+ * "fix" mapping problems will destroy the metadata. We'll let pass 1b
+ * fix that and restart fsck.
+ */
+ if (blockcnt < 0 &&
+ p->ino != EXT2_RESIZE_INO &&
+ ext2fs_test_block_bitmap2(ctx->block_metadata_map, blk)) {
+ p->bad_ref = blk;
+ pctx->blk = blk;
+ fix_problem(ctx, PR_1_CRITICAL_METADATA_COLLISION, pctx);
+ ctx->flags |= E2F_FLAG_RESTART_LATER;
+ }
+
if (problem) {
p->num_illegal_blocks++;
+ /*
+ * A bit of subterfuge here -- we're trying to fix a block
+ * mapping, but know that the IND/DIND/TIND block has collided
+ * with some critical metadata. So, fix the in-core mapping so
+ * iterate won't go insane, but return 0 instead of
+ * BLOCK_CHANGED so that it won't write the remapping out to
+ * our multiply linked block.
+ */
+ if (p->bad_ref && ref_block == p->bad_ref) {
+ *block_nr = 0;
+ return 0;
+ }
if (!p->suppress && (p->num_illegal_blocks % 12) == 0) {
if (fix_problem(ctx, PR_1_TOO_MANY_BAD_BLOCKS, pctx)) {
p->clear = 1;
@@ -2972,6 +3032,7 @@ static void mark_table_blocks(e2fsck_t ctx)
pctx.group = i;

ext2fs_reserve_super_and_bgd(fs, i, ctx->block_found_map);
+ ext2fs_reserve_super_and_bgd(fs, i, ctx->block_metadata_map);

/*
* Mark the blocks used for the inode table
@@ -2990,8 +3051,10 @@ static void mark_table_blocks(e2fsck_t ctx)
ctx->invalid_bitmaps++;
}
} else {
- ext2fs_mark_block_bitmap2(ctx->block_found_map,
- b);
+ ext2fs_mark_block_bitmap2(
+ ctx->block_found_map, b);
+ ext2fs_mark_block_bitmap2(
+ ctx->block_metadata_map, b);
}
}
}
@@ -3010,8 +3073,9 @@ static void mark_table_blocks(e2fsck_t ctx)
} else {
ext2fs_mark_block_bitmap2(ctx->block_found_map,
ext2fs_block_bitmap_loc(fs, i));
- }
-
+ ext2fs_mark_block_bitmap2(ctx->block_metadata_map,
+ ext2fs_block_bitmap_loc(fs, i));
+ }
}
/*
* Mark block used for the inode bitmap
@@ -3025,6 +3089,8 @@ static void mark_table_blocks(e2fsck_t ctx)
ctx->invalid_bitmaps++;
}
} else {
+ ext2fs_mark_block_bitmap2(ctx->block_metadata_map,
+ ext2fs_inode_bitmap_loc(fs, i));
ext2fs_mark_block_bitmap2(ctx->block_found_map,
ext2fs_inode_bitmap_loc(fs, i));
}
diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
index b4041d1..c2a3cf3 100644
--- a/e2fsck/pass1b.c
+++ b/e2fsck/pass1b.c
@@ -389,7 +389,7 @@ static int process_pass1b_block(ext2_filsys fs EXT2FS_ATTR((unused)),
p->dup_blocks++;
ext2fs_mark_inode_bitmap2(inode_dup_map, p->ino);

- if (lc != p->cur_cluster)
+ if (blockcnt < 0 || lc != p->cur_cluster)
add_dupe(ctx, p->ino, EXT2FS_B2C(fs, *block_nr), p->inode);

finish:
diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index e23680a..eeb16c3 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -75,6 +75,8 @@ void e2fsck_pass5(e2fsck_t ctx)
ctx->inode_dir_map = 0;
ext2fs_free_block_bitmap(ctx->block_found_map);
ctx->block_found_map = 0;
+ ext2fs_free_block_bitmap(ctx->block_metadata_map);
+ ctx->block_metadata_map = 0;

print_resource_track(ctx, _("Pass 5"), &rtrack, ctx->fs->io);
}
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index cb7c700..18d8025 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1030,6 +1030,14 @@ static struct e2fsck_problem problem_table[] = {
N_("@i %i has INLINE_DATA_FL flag on @f without inline data support.\n"),
PROMPT_CLEAR, 0 },

+ /*
+ * Inode block conflicts with critical metadata, skipping
+ * block checks
+ */
+ { PR_1_CRITICAL_METADATA_COLLISION,
+ N_("@i %i block %b conflicts with critical metadata, skipping block checks.\n"),
+ PROMPT_NONE, 0 },
+
/* Pass 1b errors */

/* Pass 1B: Rescan for duplicate/bad blocks */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index bc9fa9c..9001ef4 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -600,6 +600,9 @@ struct problem_context {
/* INLINE_DATA feature is set in a non-inline-data filesystem */
#define PR_1_INLINE_DATA_SET 0x010070

+/* file metadata collides with critical metadata */
+#define PR_1_CRITICAL_METADATA_COLLISION 0x010071
+
/*
* Pass 1b errors
*/
diff --git a/tests/f_itable_collision/expect.1 b/tests/f_itable_collision/expect.1
new file mode 100644
index 0000000..00cdced
--- /dev/null
+++ b/tests/f_itable_collision/expect.1
@@ -0,0 +1,101 @@
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 block 37 conflicts with critical metadata, skipping block checks.
+Illegal block number passed to ext2fs_test_block_bitmap #268435455 for in-use block map
+Illegal block number passed to ext2fs_mark_block_bitmap #268435455 for in-use block map
+Inode 12, i_blocks is 48, should be 56. Fix? yes
+
+Inode 13 has a bad extended attribute block 34. Clear? yes
+
+Deleted inode 33 has zero dtime. Fix? yes
+
+Inodes that were part of a corrupted orphan linked list found. Fix? yes
+
+Inode 49 was part of the orphaned inode list. FIXED.
+Inode 14 block 36 conflicts with critical metadata, skipping block checks.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Inode 14 has illegal block(s). Clear? yes
+
+Illegal indirect block (4294967295) in inode 14. CLEARED.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Illegal indirect block (4294967295) in inode 14. CLEARED.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Illegal indirect block (4294967295) in inode 14. CLEARED.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Illegal indirect block (4294967295) in inode 14. CLEARED.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Illegal indirect block (4294967295) in inode 14. CLEARED.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Illegal indirect block (4294967295) in inode 14. CLEARED.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Illegal indirect block (4294967295) in inode 14. CLEARED.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967295 for metadata block map
+Too many illegal blocks in inode 14.
+Clear inode? yes
+
+Restarting e2fsck from the beginning...
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 block 37 conflicts with critical metadata, skipping block checks.
+Illegal block number passed to ext2fs_test_block_bitmap #4294967294 for in-use block map
+Illegal block number passed to ext2fs_mark_block_bitmap #4294967294 for in-use block map
+Illegal block number passed to ext2fs_test_block_bitmap #268435455 for in-use block map
+Illegal block number passed to ext2fs_mark_block_bitmap #268435455 for in-use block map
+
+Running additional passes to resolve blocks claimed by more than one inode...
+Pass 1B: Rescanning for multiply-claimed blocks
+Illegal block number passed to ext2fs_test_block_bitmap #4294967294 for multiply claimed block map
+Illegal block number passed to ext2fs_test_block_bitmap #268435455 for multiply claimed block map
+Multiply-claimed block(s) in inode 12: 37
+Pass 1C: Scanning directories for inodes with multiply-claimed blocks
+Pass 1D: Reconciling multiply-claimed blocks
+(There are 1 inodes containing multiply-claimed blocks.)
+
+File /a (inode #12, mod time Fri Jun 27 18:34:44 2014)
+ has 1 multiply-claimed block(s), shared with 1 file(s):
+ <filesystem metadata>
+Clone multiply-claimed blocks? yes
+
+Illegal block number passed to ext2fs_test_block_bitmap #4294967294 for multiply claimed block map
+Illegal block number passed to ext2fs_test_block_bitmap #268435455 for multiply claimed block map
+Pass 2: Checking directory structure
+Setting filetype for entry 'bad1' in / (2) to 1.
+Setting filetype for entry 'bad2' in / (2) to 1.
+Restarting e2fsck from the beginning...
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 has an invalid extent
+ (logical block 0, invalid physical block 4294967294, len 1)
+Clear? yes
+
+Inode 12 has an invalid extent
+ (logical block 5, invalid physical block 268435455, len 1)
+Clear? yes
+
+Inode 12, i_blocks is 56, should be 40. Fix? yes
+
+Pass 2: Checking directory structure
+Entry 'b' in / (2) has deleted/unused inode 14. Clear? yes
+
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Block bitmap differences: -9 -13 -42
+Fix? yes
+
+Free blocks count wrong for group #0 (485, counted=488).
+Fix? yes
+
+Free blocks count wrong (485, counted=488).
+Fix? yes
+
+Inode bitmap differences: -14 +34 +50
+Fix? yes
+
+Free inodes count wrong for group #0 (114, counted=113).
+Fix? yes
+
+Free inodes count wrong (114, counted=113).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 15/128 files (6.7% non-contiguous), 24/512 blocks
+Exit status is 0
diff --git a/tests/f_itable_collision/expect.2 b/tests/f_itable_collision/expect.2
new file mode 100644
index 0000000..4abc600
--- /dev/null
+++ b/tests/f_itable_collision/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 15/128 files (6.7% non-contiguous), 24/512 blocks
+Exit status is 0
diff --git a/tests/f_itable_collision/image.gz b/tests/f_itable_collision/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..6218b1464a9fa9d6a714dfa64e1c67398734a9aa
GIT binary patch
literal 2817
zcmb2|=3tl|xjC4L`R!fbj2KrLh7Xmqdv#~~uT^s5Ul=ikD>KNo>JkU*#{j`?;`+R~
zb{n!b1t@bQd47Ge@KK<kjq5FMKS7l$gAVnN4?<ls`IlB{L<N0N+H>yBK9y{rq%E0t
z_4_~6f4*OvUsnE1b@OL4XCbBy;l9g!HCx=13T?R`))dyS?D)kvx%W}geV?xfxQ(ys
zJgNNpeVPD2KYy)hY;47vH5#|Oq>m?m^sSk&J^AOW7cbB6-(UFX+tTlR_LX0s_t(Y#
zv+C>X^}q9aTiwUoZ<caj_qY4|QN8$lP4(-${+kU1Zh80C{@v{T_J#)g<M3^V{yvpZ
zzrJGntC-H(yZ1`&pNJOr=RJ1z!3Aw)=k>hvq}TKIn_S;HP50`~X>0e(F)&nn&-!O4
z@Hn9MU%lNuJ^gpphF|_HY-eC#NO*qw^R*Ax_w8QE59H;Pzx@9Ht)lkVwqsEZ-vYG0
zelxrF!A?SJ>GkY<9~;A{>f#!Ux2xr6&+SZ2pE$oa)qNuW(N7V!`t6(3+kRi3)wFV5
zRoxkNU7$%DN^ebz{eS;mR?L59Aa}RVc0C})aQ#1!Xn0)@BonMxpV{6P?E7EeYN>md
zk?XIF1tCwSO6a=I`zMyL^MrM&GEh-Yw(Kq-#Zb1F8BBxN$l%@1jZteCi9>}F(oZu2
zjlcUoay~my=s;dND+{t}650P<|Gz0IRs0sTJKs*+{H|im%h@aa#4p~sEcEUA>A#8b
zcfy`+4#-<I{c8V?YWCzWbF8Zt%Wa>xf5Y`(@;9s|i$6}wOKAOQuYGL)|Mj^y-{wsH
znE&(lgw0H|ll9KlvxUw2Kh??K@IR;anfg<%IY<7aYuEo<e(A^epf^oF_NaG7NUZ(U
z^)0>oZM@V$KJ(tp$cx@fmDY<CAD*)P_>Tu0<?Ulu|Gis#Jbs<~`QP)GUY~gCY)-vZ
z|K<NP{-<?+zMnJoXa04o`Hvs2erzurb^edJTh+giyjMT}E4Hjp-d|pGv!hk4Z2iZ5
zU-RX93+pG=e}DePI(zQ_<L>p6o`&UBuZpT<SDF|B1CwEc;3u2=m*4KMEZk~4GoKGA
zQr3IwPwm5yZ{JtnQ%=l`4US^od~4UTqKW0lGIT%LoZnp*^g9+D>1e^dLGoDE)~+}-
l(NXSb2#kinXb6mkz-S1Jh5#ucaHH}a!^Um<yBQc17y!{Atk?hm

literal 0
HcmV?d00001

diff --git a/tests/f_itable_collision/name b/tests/f_itable_collision/name
new file mode 100644
index 0000000..dab3d45
--- /dev/null
+++ b/tests/f_itable_collision/name
@@ -0,0 +1 @@
+collision between IND/extent tree blocks and inode table
diff --git a/tests/f_itable_collision/script b/tests/f_itable_collision/script
new file mode 100755
index 0000000..7eedfd3
--- /dev/null
+++ b/tests/f_itable_collision/script
@@ -0,0 +1,38 @@
+#!/bin/bash
+
+# Run this test with a specific time, because we're crosslinking an extent tree
+# block with the inode table. When fsck sets dtime to now, we want "now" to be
+# our preprogrammed value.
+
+FSCK_OPT=-fy
+IMAGE=$test_dir/image.gz
+E2FSCK_TIME=4294967294
+export E2FSCK_TIME
+
+gzip -d < $IMAGE > $TMPFILE
+e2label $TMPFILE test_filesys
+
+# Run fsck to fix things?
+EXP1=$test_dir/expect.1
+OUT1=$test_name.1.log
+rm -rf $test_name.failed $test_name.ok
+
+$FSCK $FSCK_OPT $TMPFILE 2>&1 | tail -n +2 > $OUT1
+echo "Exit status is $?" >> $OUT1
+
+# Run a second time
+EXP2=$test_dir/expect.2
+OUT2=$test_name.2.log
+
+$FSCK $FSCK_OPT $TMPFILE 2>&1 | tail -n +2 > $OUT2
+echo "Exit status is $?" >> $OUT2
+
+# Figure out what happened
+if cmp -s $EXP1 $OUT1 && cmp -s $EXP2 $OUT2; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff -u $EXP1 $OUT1 >> $test_name.failed
+ diff -u $EXP2 $OUT2 >> $test_name.failed
+fi


2014-07-18 22:53:40

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 11/24] e2fsck: free ctx->fs, not fs, at the end of fsck

When we call ext2fs_close_free at the end of main(), we need to supply
the address of ctx->fs, because the subsequent e2fsck_free_context
call will try to access ctx->fs (which is now set to a freed block) to
see if it should free the directory block list. This is clearly not
desirable, so fix the problem.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/unix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index bb5141f..d883c9e 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1780,7 +1780,7 @@ no_journal:
io_channel_flush(ctx->fs->io);
print_resource_track(ctx, NULL, &ctx->global_rtrack, ctx->fs->io);

- ext2fs_close_free(&fs);
+ ext2fs_close_free(&ctx->fs);
free(ctx->journal_name);

e2fsck_free_context(ctx);


2014-07-18 22:53:47

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 12/24] e2fsck: force all block allocations to use block_found_map

During the later passes of efsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).

We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.

Therefore, any time fsck does something that could to allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/e2fsck.h | 1 +
e2fsck/pass1.c | 18 ++++++++----------
e2fsck/pass1b.c | 1 -
e2fsck/pass2.c | 3 ---
e2fsck/pass3.c | 37 -------------------------------------
e2fsck/rehash.c | 2 --
6 files changed, 9 insertions(+), 53 deletions(-)


diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 53f10b1..d6d0ba9 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -473,6 +473,7 @@ extern int e2fsck_pass1_check_symlink(ext2_filsys fs, ext2_ino_t ino,
extern void e2fsck_clear_inode(e2fsck_t ctx, ext2_ino_t ino,
struct ext2_inode *inode, int restart_flag,
const char *source);
+extern void e2fsck_intercept_block_allocations(e2fsck_t ctx);

/* pass2.c */
extern int e2fsck_process_bad_inode(e2fsck_t ctx, ext2_ino_t dir,
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 37b05ff..5ad7fe5 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -764,6 +764,7 @@ void e2fsck_pass1(e2fsck_t ctx)
"block interate buffer");
if (EXT2_INODE_SIZE(fs->super) == EXT2_GOOD_OLD_INODE_SIZE)
e2fsck_use_inode_shortcuts(ctx, 1);
+ e2fsck_intercept_block_allocations(ctx);
old_op = ehandler_operation(_("opening inode scan"));
pctx.errcode = ext2fs_open_inode_scan(fs, ctx->inode_buffer_blocks,
&scan);
@@ -1306,10 +1307,6 @@ void e2fsck_pass1(e2fsck_t ctx)
}

if (ctx->flags & E2F_FLAG_RESIZE_INODE) {
- ext2fs_block_bitmap save_bmap;
-
- save_bmap = fs->block_map;
- fs->block_map = ctx->block_found_map;
clear_problem_context(&pctx);
pctx.errcode = ext2fs_create_resize_inode(fs);
if (pctx.errcode) {
@@ -1327,7 +1324,6 @@ void e2fsck_pass1(e2fsck_t ctx)
e2fsck_write_inode(ctx, EXT2_RESIZE_INO, inode,
"recreate inode");
}
- fs->block_map = save_bmap;
ctx->flags &= ~E2F_FLAG_RESIZE_INODE;
}

@@ -3206,11 +3202,6 @@ void e2fsck_use_inode_shortcuts(e2fsck_t ctx, int use_shortcuts)
fs->read_inode = pass1_read_inode;
fs->write_inode = pass1_write_inode;
ctx->stashed_ino = 0;
- ext2fs_set_alloc_block_callback(fs, e2fsck_get_alloc_block,
- 0);
- ext2fs_set_block_alloc_stats_callback(fs,
- e2fsck_block_alloc_stats,
- 0);
} else {
fs->get_blocks = 0;
fs->check_directory = 0;
@@ -3218,3 +3209,10 @@ void e2fsck_use_inode_shortcuts(e2fsck_t ctx, int use_shortcuts)
fs->write_inode = 0;
}
}
+
+void e2fsck_intercept_block_allocations(e2fsck_t ctx)
+{
+ ext2fs_set_alloc_block_callback(ctx->fs, e2fsck_get_alloc_block, 0);
+ ext2fs_set_block_alloc_stats_callback(ctx->fs,
+ e2fsck_block_alloc_stats, 0);
+}
diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
index c2a3cf3..8d42d10 100644
--- a/e2fsck/pass1b.c
+++ b/e2fsck/pass1b.c
@@ -630,7 +630,6 @@ static int delete_file_block(ext2_filsys fs,
_("internal error: can't find dup_blk for %llu\n"),
*block_nr);
} else {
- ext2fs_unmark_block_bitmap2(ctx->block_found_map, *block_nr);
ext2fs_block_alloc_stats2(fs, *block_nr, -1);
pb->dup_blocks++;
}
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 5488c73..0ef9637 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1336,7 +1336,6 @@ static int deallocate_inode_block(ext2_filsys fs,
if ((*block_nr < fs->super->s_first_data_block) ||
(*block_nr >= ext2fs_blocks_count(fs->super)))
return 0;
- ext2fs_unmark_block_bitmap2(p->ctx->block_found_map, *block_nr);
ext2fs_block_alloc_stats2(fs, *block_nr, -1);
p->num++;
return 0;
@@ -1379,8 +1378,6 @@ static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf)
return;
}
if (count == 0) {
- ext2fs_unmark_block_bitmap2(ctx->block_found_map,
- ext2fs_file_acl_block(fs, &inode));
ext2fs_block_alloc_stats2(fs,
ext2fs_file_acl_block(fs, &inode), -1);
}
diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index 6f7f855..4fc390a 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -779,27 +779,6 @@ static int expand_dir_proc(ext2_filsys fs,
return BLOCK_CHANGED;
}

-/*
- * Ensure that all blocks are marked in the block_found_map, since it's
- * possible that the library allocated an extent node block or a block map
- * block during the directory rebuilding; these new allocations are not
- * captured in block_found_map. This is bad since we could later use
- * block_found_map to allocate more blocks.
- */
-static int find_new_blocks_proc(ext2_filsys fs,
- blk64_t *blocknr,
- e2_blkcnt_t blockcnt,
- blk64_t ref_block EXT2FS_ATTR((unused)),
- int ref_offset EXT2FS_ATTR((unused)),
- void *priv_data)
-{
- struct expand_dir_struct *es = (struct expand_dir_struct *) priv_data;
- e2fsck_t ctx = es->ctx;
-
- ext2fs_mark_block_bitmap2(ctx->block_found_map, *blocknr);
- return 0;
-}
-
errcode_t e2fsck_expand_directory(e2fsck_t ctx, ext2_ino_t dir,
int num, int guaranteed_size)
{
@@ -830,27 +809,11 @@ errcode_t e2fsck_expand_directory(e2fsck_t ctx, ext2_ino_t dir,
es.ctx = ctx;
es.dir = dir;

- before = ext2fs_free_blocks_count(fs->super);
retval = ext2fs_block_iterate3(fs, dir, BLOCK_FLAG_APPEND,
0, expand_dir_proc, &es);

if (es.err)
return es.err;
- after = ext2fs_free_blocks_count(fs->super);
-
- /*
- * If the free block count has dropped by more than the blocks we
- * allocated ourselves, then we must've allocated some extent/map
- * blocks. Therefore, we must iterate this dir's blocks again to
- * ensure that all newly allocated blocks are captured in
- * block_found_map.
- */
- if ((before - after) > es.newblocks) {
- retval = ext2fs_block_iterate3(fs, dir, BLOCK_FLAG_READ_ONLY,
- 0, find_new_blocks_proc, &es);
- if (es.err)
- return es.err;
- }

/*
* Update the size and block count fields in the inode.
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 3b05715..f75479a 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -725,8 +725,6 @@ static int write_dir_block(ext2_filsys fs,
* once.
*/
if (blk % EXT2FS_CLUSTER_RATIO(fs) == 0) {
- ext2fs_unmark_block_bitmap2(wd->ctx->block_found_map,
- blk);
ext2fs_block_alloc_stats2(fs, blk, -1);
wd->cleared++;
}


2014-07-18 22:54:05

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 14/24] libext2fs: fix bounds check of the bitmap test range in get_free_blocks2

In the loop in ext2fs_get_free_blocks2, we ask the bitmap if there's a
range of free blocks starting at "b" and ending at "b + num - 1".
That quantity is the number of the last block in the range. Since
ext2fs_blocks_count() returns the number of blocks and not the number
of the last block in the filesystem, the check is incorrect.

Put in a shortcut to exit the loop if finish > start, because in that
case it's obvious that we don't need to reset to the beginning of the
FS to continue the search for blocks. This is needed to terminate the
loop because the broken test meant that b could get large enough to
equal finish, which would end the while loop.

The attached testcase shows that with the off by one error, it is
possible to throw e2fsck into an infinite loop while it tries to
find space for the inode table even though there's no space for one.

Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/ext2fs/alloc.c | 5 ++++-
tests/f_boundscheck/expect.1 | 25 +++++++++++++++++++++++++
tests/f_boundscheck/expect.2 | 25 +++++++++++++++++++++++++
tests/f_boundscheck/image.bz2 | Bin
tests/f_boundscheck/name | 1 +
tests/f_boundscheck/script | 32 ++++++++++++++++++++++++++++++++
6 files changed, 87 insertions(+), 1 deletion(-)
create mode 100644 tests/f_boundscheck/expect.1
create mode 100644 tests/f_boundscheck/expect.2
create mode 100644 tests/f_boundscheck/image.bz2
create mode 100644 tests/f_boundscheck/name
create mode 100755 tests/f_boundscheck/script


diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index b36d288..578fd7f 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -255,8 +255,11 @@ errcode_t ext2fs_get_free_blocks2(ext2_filsys fs, blk64_t start, blk64_t finish,
b &= ~(c_ratio - 1);
finish &= ~(c_ratio -1);
do {
- if (b+num-1 > ext2fs_blocks_count(fs->super))
+ if (b + num - 1 >= ext2fs_blocks_count(fs->super)) {
+ if (finish > start)
+ return EXT2_ET_BLOCK_ALLOC_FAIL;
b = fs->super->s_first_data_block;
+ }
if (ext2fs_fast_test_block_bitmap_range2(map, b, num)) {
*ret = b;
return 0;
diff --git a/tests/f_boundscheck/expect.1 b/tests/f_boundscheck/expect.1
new file mode 100644
index 0000000..c2170b8
--- /dev/null
+++ b/tests/f_boundscheck/expect.1
@@ -0,0 +1,25 @@
+ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
+../e2fsck/e2fsck: Group descriptors look bad... trying backup blocks...
+../e2fsck/e2fsck: Bad magic number in super-block while using the backup blocks../e2fsck/e2fsck: going back to original superblock
+Note: if several inode or block bitmap blocks or part
+of the inode table require relocation, you may wish to try
+running e2fsck with the '-b 8193' option first. The problem
+may lie only with the primary block group descriptors, and
+the backup block group descriptors may be OK.
+
+Inode table for group 1 is not in group. (block 4294967295)
+WARNING: SEVERE DATA LOSS POSSIBLE.
+Relocate? yes
+
+One or more block group descriptor checksums are invalid. Fix? yes
+
+Group descriptor 1 checksum is 0x6ea2, should be 0x7edd. FIXED.
+Pass 1: Checking inodes, blocks, and sizes
+Error allocating 256 contiguous block(s) in block group 1 for inode table: Could not allocate block in ext2 filesystem
+e2fsck: aborted
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+
+test_filesys: ********** WARNING: Filesystem still has errors **********
+
+Exit status is 0
diff --git a/tests/f_boundscheck/expect.2 b/tests/f_boundscheck/expect.2
new file mode 100644
index 0000000..c2170b8
--- /dev/null
+++ b/tests/f_boundscheck/expect.2
@@ -0,0 +1,25 @@
+ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
+../e2fsck/e2fsck: Group descriptors look bad... trying backup blocks...
+../e2fsck/e2fsck: Bad magic number in super-block while using the backup blocks../e2fsck/e2fsck: going back to original superblock
+Note: if several inode or block bitmap blocks or part
+of the inode table require relocation, you may wish to try
+running e2fsck with the '-b 8193' option first. The problem
+may lie only with the primary block group descriptors, and
+the backup block group descriptors may be OK.
+
+Inode table for group 1 is not in group. (block 4294967295)
+WARNING: SEVERE DATA LOSS POSSIBLE.
+Relocate? yes
+
+One or more block group descriptor checksums are invalid. Fix? yes
+
+Group descriptor 1 checksum is 0x6ea2, should be 0x7edd. FIXED.
+Pass 1: Checking inodes, blocks, and sizes
+Error allocating 256 contiguous block(s) in block group 1 for inode table: Could not allocate block in ext2 filesystem
+e2fsck: aborted
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+
+test_filesys: ********** WARNING: Filesystem still has errors **********
+
+Exit status is 0
diff --git a/tests/f_boundscheck/image.bz2 b/tests/f_boundscheck/image.bz2
new file mode 100644
index 0000000000000000000000000000000000000000..098d02e9f51694b6c34273eff00d4646487bfe35
GIT binary patch
literal 986
zcmV<0110=IT4*^jL0KkKS!@T;Y5)L7fB*mg|NsC0|NsAxJrKWd-YIU-z`>Bx0gJ_8
z1O$Skj#<zHlgYD)b#`G2sMFMB02%-^Vrip8C#V3?kYvC^CV&UMs)#ZG8ZrhT!UhQ>
znhI)qC#H$&dYP%Fnqo3~jW(lbfCh$u&;T+3&;S`4G|*^fri{@DdZ{1=hJXzLplAR9
zXaL9n0000Q000000u3@Snl!=;LlaB^85&|3m;o|i0GS4i009{=nJ@&xFhr!O?M+Ro
zrXe0E+B5)0n1(}WWM~1P&>-{%kThZ*qfIox05mkv`7wMwxMQ;-MP8RQ2~iwbW5<=l
z+z`k#qfP2V(PUu|5lwW3*!@2NhD}kiUck1EBM30*IOQ@GZF`Ez-RhVaRoH+i7m*PG
zR@4v`RLqK~qN-OFKtST!qXfAXX&|aU4fpwb%VZI>HSqLcnTpW@iHPHrBF4577A;v^
zFpHL$GBx$v8x57y+|fab;MvWi*m<>L>ho;{?QJc-;`)_4{0+|PUhaWOx0~JcSDAT?
z4l^9w?EMBuqh`3wv|2T6-NTn!T6&bu;G0$H#&J71d7YxgGsH;pj@uPb{fYM#@J+or
zl5%KZkl*3c0~&*gE}(#@00o-LC1enITjG?7AjlUn3PwUrRXP<aJR>F*K>-jD5C9Ya
z000005EK9a5C9Ya0003{00000009sH6cIsXk(&&JVM?J9Qlv-H;GhyUS)$-U0Z;%7
zGDWC`>lDSPqP0OSiml>Qf?L3GA_HKEBYzZ6OjfvvDgXf_A#6iHOo%wJmTNCPg+^Gy
z_{uEZz-*t%Lp(y>CU!j?Co1H)EG?+oARM{v?fE{XSV@_#NXlUqZe)SYY^S*HTe@=z
zK3kJDAYHJ(BADcSk;^PPA{BZi=^Nq#TD3=fO!*v*6u^B(1z5{GZ(?QnsODukje?GL
zH0ip`YkGH7FfFgZ2Y>(=;dlT501H3>ISa^$h=`t%ZlHn)emG{(fMMiOL->Jvj8c+>
zaAGDD|K-yI)WyYtbU`ta6E;Dn@;7taWTB7x&@Lxq%TKMFm-}ce7UPX0EJTJGBwt~U
zfVYImC}ft3$~I+ZgzS3BGX8Yp%C*aCW_D-Xl^<lZRdO;+&v9Z|8Bd!<f$j>1Z}mb=
z=U>r3%HsB+p5&xqCV<W!L3gMviG#f%nU1`}@Kq7h#>C*&P*Y5PFh>f4MS~=XkZ8}+
z(2^N#pql{Is=8i6tX0Mm-elz1{OYB7jXV(mQ~(1?q>P9Vi233)rc@7fO1<LlNT&)C
I4S@O$K;###8UO$Q

literal 0
HcmV?d00001

diff --git a/tests/f_boundscheck/name b/tests/f_boundscheck/name
new file mode 100644
index 0000000..192d279
--- /dev/null
+++ b/tests/f_boundscheck/name
@@ -0,0 +1 @@
+infinite loop due to off by one error when finding free space for inode table relocation
diff --git a/tests/f_boundscheck/script b/tests/f_boundscheck/script
new file mode 100755
index 0000000..fbbce62
--- /dev/null
+++ b/tests/f_boundscheck/script
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+FSCK_OPT=-fy
+IMAGE=$test_dir/image.bz2
+
+bzip2 -d < $IMAGE > $TMPFILE
+#e2label $TMPFILE test_filesys
+
+# Run fsck to fix things?
+EXP1=$test_dir/expect.1
+OUT1=$test_name.1.log
+rm -rf $test_name.failed $test_name.ok
+
+$FSCK $FSCK_OPT $TMPFILE 2>&1 | head -n 1000 | tail -n +2 > $OUT1
+echo "Exit status is $?" >> $OUT1
+
+# Run a second time
+EXP2=$test_dir/expect.2
+OUT2=$test_name.2.log
+
+$FSCK $FSCK_OPT $TMPFILE 2>&1 | head -n 1000 | tail -n +2 > $OUT2
+echo "Exit status is $?" >> $OUT2
+
+# Figure out what happened
+if cmp -s $EXP1 $OUT1 && cmp -s $EXP2 $OUT2; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff -u $EXP1 $OUT1 >> $test_name.failed
+ diff -u $EXP2 $OUT2 >> $test_name.failed
+fi


2014-07-18 22:53:56

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 13/24] e2fsck: fix off-by-one bounds check on group number

Since fs->group_desc_count is the number of block groups, the number
of the last group is always one less than this count. Fix the bounds
check to reflect that.

This flaw shouldn't have any user-visible side effects, since the
block bitmap test based on last_grp later on can handle overbig block
numbers.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 5ad7fe5..eec93c3 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2916,8 +2916,8 @@ static void new_table_block(e2fsck_t ctx, blk64_t first_block, dgrp_t group,
first_block = ext2fs_group_first_block2(fs,
flexbg_size * flexbg);
last_grp = group | (flexbg_size - 1);
- if (last_grp > fs->group_desc_count)
- last_grp = fs->group_desc_count;
+ if (last_grp >= fs->group_desc_count)
+ last_grp = fs->group_desc_count - 1;
last_block = ext2fs_group_last_block2(fs, last_grp);
} else
last_block = ext2fs_group_last_block2(fs, group);


2014-07-18 22:54:13

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 15/24] misc: fix problems with strncat

The third argument to strncat is the maximum number of characters to
copy out of the second argument; it is not the maximum length of the
first argument.

Therefore, code in a check just in case we ever find a /sys/block/X
path long enough to hit the end of the buffer. FWIW the longest path
I could find on my machine was 133 bytes.

Fixes-Coverity-Bug: 1252003
Signed-off-by: Darrick J. Wong <[email protected]>
---
misc/mk_hugefiles.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/misc/mk_hugefiles.c b/misc/mk_hugefiles.c
index ea42b6c..7a565ca 100644
--- a/misc/mk_hugefiles.c
+++ b/misc/mk_hugefiles.c
@@ -188,7 +188,9 @@ static blk64_t get_partition_start(const char *device_name)
cp = search_sysfs_block(st.st_rdev, path);
if (!cp)
return 0;
- strncat(path, "/start", SYSFS_PATH_LEN);
+ if (strlen(path) > SYSFS_PATH_LEN - strlen("/start") - 1)
+ return 0;
+ strcat(path, "/start");
f = fopen(path, "r");
if (!f)
return 0;


2014-07-18 22:54:21

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 16/24] e2fsck: don't crash during rehash

If a user crafts a carefully constructed filesystem containing a
single directory entry block with an invalid checksum and fewer than
two entries, and then runs e2fsck to fix the filesystem, fsck will
crash when it tries to "compress" the short dir and passes a negative
dirent array length to qsort. Therefore, don't allow directory
"compression" in this situation.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/rehash.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index f75479a..5913ae7 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -848,7 +848,7 @@ retry_nohash:

/* Sort the list */
resort:
- if (fd.compress)
+ if (fd.compress && fd.num_array > 1)
qsort(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
hash_cmp);
else
@@ -867,7 +867,7 @@ resort:
}

/* Sort non-hashed directories by inode number */
- if (fd.compress)
+ if (fd.compress && fd.num_array > 1)
qsort(fd.harray+2, fd.num_array-2,
sizeof(struct hash_entry), ino_cmp);



2014-07-18 22:54:30

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 17/24] e2fsck: reserve blocks for root/lost+found directory repair

If we think we're going to need to repair either the root directory or
the lost+found directory, reserve a block at the end of pass 1 to
reduce the likelihood of an e2fsck abort while reconstructing
root/lost+found during pass 3.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/e2fsck.h | 3 +++
e2fsck/pass1.c | 39 +++++++++++++++++++++++++++++++++++++++
e2fsck/pass3.c | 23 +++++++++++++++++++++++
tests/f_holedir/expect.1 | 2 +-
tests/f_holedir/expect.2 | 2 +-
5 files changed, 67 insertions(+), 2 deletions(-)


diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index d6d0ba9..76d15c4 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -375,6 +375,9 @@ struct e2fsck_struct {
*/
void *priv_data;
ext2fs_block_bitmap block_metadata_map; /* Metadata blocks */
+
+ /* Reserve blocks for root and l+f re-creation */
+ blk64_t root_repair_block, lnf_repair_block;
};

/* Used by the region allocation code */
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index eec93c3..5d93feb 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -601,6 +601,42 @@ static errcode_t recheck_bad_inode_checksum(ext2_filsys fs, ext2_ino_t ino,
return 0;
}

+static void reserve_block_for_root_repair(e2fsck_t ctx)
+{
+ blk64_t blk = 0;
+ errcode_t err;
+ ext2_filsys fs = ctx->fs;
+
+ ctx->root_repair_block = 0;
+ if (ext2fs_test_inode_bitmap2(ctx->inode_used_map, EXT2_ROOT_INO))
+ return;
+
+ err = ext2fs_new_block2(fs, 0, ctx->block_found_map, &blk);
+ if (err)
+ return;
+ ext2fs_mark_block_bitmap2(ctx->block_found_map, blk);
+ ctx->root_repair_block = blk;
+}
+
+static void reserve_block_for_lnf_repair(e2fsck_t ctx)
+{
+ blk64_t blk = 0;
+ errcode_t err;
+ ext2_filsys fs = ctx->fs;
+ const char *name = "lost+found";
+ ext2_ino_t ino;
+
+ ctx->lnf_repair_block = 0;
+ if (!ext2fs_lookup(fs, EXT2_ROOT_INO, name, sizeof(name)-1, 0, &ino))
+ return;
+
+ err = ext2fs_new_block2(fs, 0, ctx->block_found_map, &blk);
+ if (err)
+ return;
+ ext2fs_mark_block_bitmap2(ctx->block_found_map, blk);
+ ctx->lnf_repair_block = blk;
+}
+
void e2fsck_pass1(e2fsck_t ctx)
{
int i;
@@ -1357,6 +1393,9 @@ endit:
if (inode)
ext2fs_free_mem(&inode);

+ reserve_block_for_root_repair(ctx);
+ reserve_block_for_lnf_repair(ctx);
+
/*
* The l+f inode may have been cleared, so zap it now and
* later passes will recalculate it if necessary
diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index 4fc390a..92e71e7 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -134,6 +134,17 @@ abort_exit:
inode_done_map = 0;
}

+ if (ctx->lnf_repair_block) {
+ ext2fs_unmark_block_bitmap2(ctx->block_found_map,
+ ctx->lnf_repair_block);
+ ctx->lnf_repair_block = 0;
+ }
+ if (ctx->root_repair_block) {
+ ext2fs_unmark_block_bitmap2(ctx->block_found_map,
+ ctx->root_repair_block);
+ ctx->root_repair_block = 0;
+ }
+
print_resource_track(ctx, _("Pass 3"), &rtrack, ctx->fs->io);
}

@@ -176,6 +187,11 @@ static void check_root(e2fsck_t ctx)
/*
* First, find a free block
*/
+ if (ctx->root_repair_block) {
+ blk = ctx->root_repair_block;
+ ctx->root_repair_block = 0;
+ goto skip_new_block;
+ }
pctx.errcode = ext2fs_new_block2(fs, 0, ctx->block_found_map, &blk);
if (pctx.errcode) {
pctx.str = "ext2fs_new_block";
@@ -184,6 +200,7 @@ static void check_root(e2fsck_t ctx)
return;
}
ext2fs_mark_block_bitmap2(ctx->block_found_map, blk);
+skip_new_block:
ext2fs_mark_block_bitmap2(fs->block_map, blk);
ext2fs_mark_bb_dirty(fs);

@@ -425,6 +442,11 @@ unlink:
/*
* First, find a free block
*/
+ if (ctx->lnf_repair_block) {
+ blk = ctx->lnf_repair_block;
+ ctx->lnf_repair_block = 0;
+ goto skip_new_block;
+ }
retval = ext2fs_new_block2(fs, 0, ctx->block_found_map, &blk);
if (retval) {
pctx.errcode = retval;
@@ -432,6 +454,7 @@ unlink:
return 0;
}
ext2fs_mark_block_bitmap2(ctx->block_found_map, blk);
+skip_new_block:
ext2fs_block_alloc_stats2(fs, blk, +1);

/*
diff --git a/tests/f_holedir/expect.1 b/tests/f_holedir/expect.1
index ad74fa6..e9cf590 100644
--- a/tests/f_holedir/expect.1
+++ b/tests/f_holedir/expect.1
@@ -18,7 +18,7 @@ Directory inode 11 has an unallocated block #6. Allocate? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
-Block bitmap differences: -21
+Block bitmap differences: -10
Fix? yes

Free blocks count wrong for group #0 (78, counted=79).
diff --git a/tests/f_holedir/expect.2 b/tests/f_holedir/expect.2
index 4c0b4f2..6ab6209 100644
--- a/tests/f_holedir/expect.2
+++ b/tests/f_holedir/expect.2
@@ -3,5 +3,5 @@ Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
-test_filesys: 11/32 files (0.0% non-contiguous), 21/100 blocks
+test_filesys: 11/32 files (9.1% non-contiguous), 21/100 blocks
Exit status is 0


2014-07-18 22:54:36

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 18/24] e2fsck: collapse holes in extent-based directories

If we notice a hole in the block map of an extent-based directory,
offer to collapse the hole by decreasing the logical block # of the
extent. This saves us from pass 3's inefficient strategy, which fills
the holes by mapping in a lot of empty directory blocks.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 39 +++++++++++++++++++++++++++++++++++++++
e2fsck/problem.c | 5 +++++
e2fsck/problem.h | 3 +++
tests/f_holedir2/expect.1 | 8 +++-----
tests/f_holedir3/expect.1 | 13 +++++++++++++
tests/f_holedir3/expect.2 | 7 +++++++
tests/f_holedir3/image.gz | Bin
tests/f_holedir3/name | 2 ++
8 files changed, 72 insertions(+), 5 deletions(-)
create mode 100644 tests/f_holedir3/expect.1
create mode 100644 tests/f_holedir3/expect.2
create mode 100644 tests/f_holedir3/image.gz
create mode 100644 tests/f_holedir3/name


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 5d93feb..756e3bb 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2160,6 +2160,45 @@ fix_problem_now:
}
pb->fragmented = 1;
}
+ /*
+ * If we notice a gap in the logical block mappings of an
+ * extent-mapped directory, offer to close the hole by
+ * moving the logical block down, otherwise we'll go mad in
+ * pass 3 allocating empty directory blocks to fill the hole.
+ */
+ if (try_repairs && is_dir &&
+ pb->last_block + 1 < (e2_blkcnt_t)extent.e_lblk) {
+ blk64_t new_lblk;
+
+ new_lblk = pb->last_block + 1;
+ if (EXT2FS_CLUSTER_RATIO(ctx->fs) > 1)
+ new_lblk = ((new_lblk +
+ EXT2FS_CLUSTER_RATIO(ctx->fs)) &
+ EXT2FS_CLUSTER_MASK(ctx->fs)) |
+ (extent.e_lblk &
+ EXT2FS_CLUSTER_MASK(ctx->fs));
+ pctx->blk = extent.e_lblk;
+ pctx->blk2 = new_lblk;
+ if (fix_problem(ctx, PR_1_COLLAPSE_DBLOCK, pctx)) {
+ extent.e_lblk = new_lblk;
+ pb->inode_modified = 1;
+ pctx->errcode = ext2fs_extent_replace(ehandle,
+ 0, &extent);
+ if (pctx->errcode) {
+ pctx->errcode = 0;
+ goto alloc_later;
+ }
+ pctx->errcode = ext2fs_extent_fix_parents(ehandle);
+ if (pctx->errcode)
+ goto failed_add_dir_block;
+ pctx->errcode = ext2fs_extent_goto(ehandle,
+ extent.e_lblk);
+ if (pctx->errcode)
+ goto failed_add_dir_block;
+ last_lblk = extent.e_lblk + extent.e_len - 1;
+ }
+ }
+alloc_later:
while (is_dir && (++pb->last_db_block <
(e2_blkcnt_t) extent.e_lblk)) {
pctx->errcode = ext2fs_add_dir_block2(ctx->fs->dblist,
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 18d8025..a1986c6 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1038,6 +1038,11 @@ static struct e2fsck_problem problem_table[] = {
N_("@i %i block %b conflicts with critical metadata, skipping block checks.\n"),
PROMPT_NONE, 0 },

+ /* Directory inode block <block> should be at block <otherblock> */
+ { PR_1_COLLAPSE_DBLOCK,
+ N_("@d @i %i @b %b should be at @b %c. "),
+ PROMPT_FIX, 0 },
+
/* Pass 1b errors */

/* Pass 1B: Rescan for duplicate/bad blocks */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 9001ef4..c349286 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -603,6 +603,9 @@ struct problem_context {
/* file metadata collides with critical metadata */
#define PR_1_CRITICAL_METADATA_COLLISION 0x010071

+/* Directory inode has a missing block (hole) */
+#define PR_1_COLLAPSE_DBLOCK 0x010072
+
/*
* Pass 1b errors
*/
diff --git a/tests/f_holedir2/expect.1 b/tests/f_holedir2/expect.1
index 5124f61..455f4b0 100644
--- a/tests/f_holedir2/expect.1
+++ b/tests/f_holedir2/expect.1
@@ -1,21 +1,19 @@
Pass 1: Checking inodes, blocks, and sizes
Inode 12, i_size is 0, should be 5120. Fix? yes

-Inode 13, i_size is 4096, should be 5120. Fix? yes
+Directory inode 13 block 2 should be at block 1. Fix? yes

Pass 2: Checking directory structure
Directory inode 12 has an unallocated block #3. Allocate? yes

-Directory inode 13 has an unallocated block #1. Allocate? yes
-
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information
-Free blocks count wrong for group #0 (79, counted=77).
+Free blocks count wrong for group #0 (78, counted=77).
Fix? yes

-Free blocks count wrong (79, counted=77).
+Free blocks count wrong (78, counted=77).
Fix? yes


diff --git a/tests/f_holedir3/expect.1 b/tests/f_holedir3/expect.1
new file mode 100644
index 0000000..074ca6c
--- /dev/null
+++ b/tests/f_holedir3/expect.1
@@ -0,0 +1,13 @@
+Pass 1: Checking inodes, blocks, and sizes
+Directory inode 12 block 5 should be at block 2. Fix? yes
+
+Inode 12, i_size is 6144, should be 3072. Fix? yes
+
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 17/128 files (5.9% non-contiguous), 1093/2048 blocks
+Exit status is 1
diff --git a/tests/f_holedir3/expect.2 b/tests/f_holedir3/expect.2
new file mode 100644
index 0000000..b675e6d
--- /dev/null
+++ b/tests/f_holedir3/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 17/128 files (5.9% non-contiguous), 1093/2048 blocks
+Exit status is 0
diff --git a/tests/f_holedir3/image.gz b/tests/f_holedir3/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..c5eeb37d2029fdf7d98d86548ebea0024f082b4e
GIT binary patch
literal 3700
zcmeH}{Wsfr7RPnEhPJ28bhj2;COylX&RBGo!2}hj3{I73tMO0|_0VFh#-L4u%(pWw
zJsM<<A=6R98nyA*2#t73?TlnAk|0PmByFiML`dX`<oj8>d-m+iIr|sPx<A}=@BQt*
z-tTkIMUpJdKANFkd~3gAUYKbowl~}GVw`Oo&2jGcbG&jve1w*f)ZS;9__Kc|ciewv
z_`Ky}#FueC9}o9zkbfu9*K4D_-jUZAYft1@dOSl%gan?;PrrV+=b(p;Pj|_}OYbA}
z^oH>krDCnV$I-EAwrz1%7+k31ua|4|9UAF?m%Gnz-WqQY5(VHag(I)|v6c?>vDm2R
zk%6(wsLF)^>d4>!wzV~0)<&1Y`hmC}Bfhl;^Db$`hhTX!*{~oSf}d~IA=kI(YcHd3
zsW6Q?2sPVQt3He4saWwys9<dpF@#^*Bq8EikR-8dZKnZ!N*YSonr|zrtJbKmZ4=a9
zxDvW6r%aDJ%)bntx(Gcv)IZhrT(B4F^pX@qJDHB+XoGf;2ije(I1bzzKKU!NpGV@*
zidC{g^&Q&#aJ+N8<PkcRaQc{-wSpidbXtoyX5ZOX?RN++|9q8ILA9T&kseCw!t!)}
zP&D^|)#H5Z4z<Nffn5G|$0rsq(DkPM|0^+za>;(G(Q}CLi>{9(SQ#pFnp4{)XOZzs
z7w<~?Olf@WAn}V!UKFA~!<m9ik4Zo?<`|4OnoiELI^+9Mx6`%!{3Cg!u6ZTZ4(Zfb
zv&~-hx&MKX_eUlD+Sj=a$s3ar8#~SX*!9`4uRJU~5~nyeT!cdURq_Vf#O+h4(;hHE
z<{bq(kEY?55Wm%9rpk+BdD!)Iy*M;K<o5k_5T>RM8<rqKZ$AngP>(^AJuf}GGhk7R
zNKMUwy*$%k$|-LkVg0bMBp!0t2%E;hh3~d6&azC<9|E!nurVG~cFZ8-0HUvI6`Zja
zg8OPHRw$2|8rLs&GV31d85o4q1T0!F{m49V*~reTqvq=Czw9#^sd(&D1G+gJ>;wzI
zK5tJr*Vs7$K$5u~@Y&H(I`X#li993p?B2`QMnJxdF)qclTR6D`%;4;jIW}uPSbyak
z-40ZyG|mQ_F-ulD|H*2_0D`Zh00`+N1R)MgrvbJJ_~UI&ghHFOl?M`#mL#$^&SPBZ
zpk~ni$6IjHi7^WF*>a;Ve7?U4U?jz1d^ci|pW<<%^+J%#U18<Z961J`Ydq!mLm>F=
z>fnn~H^4Ff&?5i%RT;l9d<Vk?>*?j@c$aMFZ2AI(8#HlJ?<-;LG{pxtv7-X?XWv(`
z=Sm+OT_2l#LJWqw6wMC7Iv9Dwm{MVc0FS{AsZe1w&2PfxAgPU6dQ%P&D~!zv5dJTv
zk&u&5EvSmd*(wB}@)H%(t^nlN(%{IO-hiAuc(vOH949TZb>k}~ZWm0nwLCaVz6B|G
zK=hs#`e1j`ef901d;X{0l_pxzcrP=fkCM?RUTKM#(|F8VKR(6zF=9@6BdoOl%xCO{
zFLrC2UgC`%6SS{ynq|M4^x<6-UFOnw>m&m2H7ez#$oS6P(${`v7GB*Aa2cacI3=?c
z<B#2uu_SC?_?SViUZg%L@hPara(jey({YjpiS8~4ZB{99fz{&Uw#{#RJH;eAJpICO
ziQ5|7c&`gUTNP2?hOyIV;97d2>JHAR>+C1yN0ex(#qJq8*{=5dnhM|NUB;(lIYL3T
z82M>Tf~-rh;KJ)aLnj@Lep@7(ZIk;mqh)i;IP&JTEXS$XNBbOJjU37k`Rj(3%_D;<
z17x%6(l~EOOmqY^v8?uehiU3)xEy+K=5kQK8l+YjEx1y4Tu=+Ts#X9R|1AOM_Xg$Z
zy!?L&qdAwC3djpkS?Z)#`KkM;QN+qls%M@T2++0e;*S3~p(@*mYr)!q%&me=HXK>#
zp@DSo4#^Ai*$25`5>;v)U`5WVOMh4s!=DTH6rhMst;yHF=lQ{G?`DuG)64%WWpVf#
zNVJ`C#cVff%|!f}$eRPvrD|pCrQC-|Ctm^Ph>aW{I4vhlk7f04&ht*HoR*$&>j%m>
z+pcS&?JrjtLc4;Gt25Bj;piDz!K=wkIwcDQbmdL3pkeb_rpzwk2B-PZPaU#X?-AG|
z@XH{O$AE)&ADn7JDJ`~6ygE`4gR^XY-5L`&q}w<x7wvFy(5sGbxFgx0Pj8N*j?Q2`
znUzT=mb8)Bk-1_@yQ!15CJUVq?4BD&?Y)Cc_2TWD(xaDT36bv1SC3tnw1EjDb6T6P
znG?~nA|lPom#BS^g5bK9Eg(2uu=Efmfi&2LvHWGQ+}mQ0!2b(@+u?d7(JsZ$%<P2O
FKLDc*@#_Ep

literal 0
HcmV?d00001

diff --git a/tests/f_holedir3/name b/tests/f_holedir3/name
new file mode 100644
index 0000000..a526787
--- /dev/null
+++ b/tests/f_holedir3/name
@@ -0,0 +1,2 @@
+real directories with holes and zero i_size
+


2014-07-18 22:54:46

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 19/24] e2fsck: always submit logical block 0 of a directory for pass 2

Always iterate logical block 0 in a directory, even if no physical
block has been allocated. Pass 2 will notice the lack of mapping and
offer to allocate a new directory block; this enables us to link the
directory into lost+found.

Previously, if there were no logical blocks mapped, we would fail to
pick up even block 0 of the directory for processing in pass 2. This
meant that e2fsck never allocated a block 0 and therefore wouldn't fix
the missing . and .. entries for the directory; subsequent e2fsck runs
would complain about (yet never fix) the problem.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 15 +++++++++++++++
tests/f_emptydir/expect.1 | 19 +++++++++++++++++++
tests/f_emptydir/expect.2 | 7 +++++++
tests/f_emptydir/image.gz | Bin
tests/f_emptydir/name | 2 ++
5 files changed, 43 insertions(+)
create mode 100644 tests/f_emptydir/expect.1
create mode 100644 tests/f_emptydir/expect.2
create mode 100644 tests/f_emptydir/image.gz
create mode 100644 tests/f_emptydir/name


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 756e3bb..7a16780 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2652,6 +2652,21 @@ static int process_block(ext2_filsys fs,
return 0;
}

+ /*
+ * For a directory, add logical block zero for processing even if it's
+ * not mapped or we'll be perennially stuck with broken "." and ".."
+ * entries.
+ */
+ if (p->is_dir && blockcnt == 0 && blk == 0) {
+ pctx->errcode = ext2fs_add_dir_block2(fs->dblist, p->ino, 0, 0);
+ if (pctx->errcode) {
+ pctx->blk = blk;
+ pctx->num = blockcnt;
+ goto failed_add_dir_block;
+ }
+ p->last_db_block++;
+ }
+
if (blk == 0)
return 0;

diff --git a/tests/f_emptydir/expect.1 b/tests/f_emptydir/expect.1
new file mode 100644
index 0000000..c612621
--- /dev/null
+++ b/tests/f_emptydir/expect.1
@@ -0,0 +1,19 @@
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12, i_size is 1024, should be 0. Fix? yes
+
+Pass 2: Checking directory structure
+Directory inode 12 has an unallocated block #0. Allocate? yes
+
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+Free blocks count wrong for group #0 (957, counted=956).
+Fix? yes
+
+Free blocks count wrong (957, counted=956).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/256 files (8.3% non-contiguous), 1092/2048 blocks
+Exit status is 1
diff --git a/tests/f_emptydir/expect.2 b/tests/f_emptydir/expect.2
new file mode 100644
index 0000000..75c0515
--- /dev/null
+++ b/tests/f_emptydir/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/256 files (16.7% non-contiguous), 1092/2048 blocks
+Exit status is 0
diff --git a/tests/f_emptydir/image.gz b/tests/f_emptydir/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..1b6f8f8350312013782acd2d2136db9f77c896dd
GIT binary patch
literal 2710
zcmb2|=3q#<KR=j>`R(2P*;fK&7(PtDv+`@y*0o$s-fO#+L;_h{*-MK89Q_M7b7*zh
zYyFeBqqSrAuCMk51x12Ku7<u)Vht2zUHORDZ=U{i>*{Uop|AS0KmR=NeUi76+{vfX
zaYZMq6&YHm9(lp}utvs4uQeds?^oUm$-+mgS9ZSoE#bN1`s&M`$vaQ}*jpPq^Pa?G
z!*53`X0;Vq^S#ftEO~kQ+TqvoHL|wbUS2smy*@qp+L@V`E}QN9{bjN6{5W~ZeY^iX
zS?B%UZl3p~zYF!xK2N<J_%nv<>9(hrpDkN{)-=D*P$1dr-SVvCpMG1vyLWSwk=_*9
zsz3G3hHc4jGM?1yZb^60-6OlTodFd*c)z;5^V0vDe?D4?L|ohE8}D&x;)LzRm&GrY
zq^RT{PLEx&Wcu;EJ9o<J-dwx7`Q{#hdDEvY4?h=UUHx%avARy#-p!x(hn@N?ufdUf
zvGEW46MKdKj6eBL)I0oR{v<!)e*;KpgNDt`-OvA>kz8;`>g=wy^Zt6pEnfbH?{9_r
zHzl1{fBa|twEZ*L_3|}Kg<Chx>@1Aq>~5IVy5IhBOG30-dY5<YS)0#|OTGU&C-RzU
z|BX!jTewn8|5th*+qFZt15H&DPTu=(Az;6)-N4c`FxT~U+3~hx8<u~J{5j{(2XoJB
zVwZN=2mEVi{;7N2eB+(4EvNivKUJ4Kb$*%bpQHbC<9znoyqdmj_UrGr{x{dJ*U$LR
z-IbU3_xt^ftM%&7uhgIZe69ZaubUAWtRMW!{)X$FxBWNSV9SA0`TtixIoH1ksjGW=
zbJstezXgBNcN$)iN!$Lx{J2Dip{(|${gV&e`rpcQ|Np7^|1;~irN>?ET=`vE-=;d_
zJtHIv9Lhibk#`OG^8A0-y5INzM@KMc-?VPJ_n#RxhR}FFY?oem(PX95z0u}a*M#p?
zdyU*w-SSWT-W?~JTzcZS^n3ow`p0N`M!BOQFd71*Aut*OqaiRF0s|fb8>ZAS-!GLj
JV_;BV0054XfYJZ}

literal 0
HcmV?d00001

diff --git a/tests/f_emptydir/name b/tests/f_emptydir/name
new file mode 100644
index 0000000..d5bc660
--- /dev/null
+++ b/tests/f_emptydir/name
@@ -0,0 +1,2 @@
+always iterate dir block 0 or e2fsck goes into infinite loop
+


2014-07-18 22:55:16

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 20/24] e2fsck: pass2 should not process directory blocks that are impossibly large

Currently, directories cannot be fallocated, which means that the only
way they get bigger is for the kernel to append blocks one by one.
Therefore, if we encounter a logical block offset that is too big, we
needn't bother adding it to the dblist for pass2 processing, because
it's unlikely to contain a valid directory block. The code that
handles extent based directories also does not add toobig blocks to
the dblist.

Note that we can easily cause e2fsck to fail with ENOMEM if we start
feeding it really large logical block offsets, as the dblist
implementation will try to realloc() an array big enough to hold it.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 11 +++++++++++
tests/f_hugedir_blocks/expect.1 | 10 ++++++++++
tests/f_hugedir_blocks/expect.2 | 7 +++++++
tests/f_hugedir_blocks/image.gz | Bin
tests/f_hugedir_blocks/name | 1 +
5 files changed, 29 insertions(+)
create mode 100644 tests/f_hugedir_blocks/expect.1
create mode 100644 tests/f_hugedir_blocks/expect.2
create mode 100644 tests/f_hugedir_blocks/image.gz
create mode 100644 tests/f_hugedir_blocks/name


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 7a16780..b478bca 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2758,6 +2758,17 @@ static int process_block(ext2_filsys fs,
blk = *block_nr = 0;
ret_code = BLOCK_CHANGED;
p->inode_modified = 1;
+ /*
+ * If the directory block is too big and is beyond the
+ * end of the FS, don't bother trying to add it for
+ * processing -- the kernel would never have created a
+ * directory this large, and we risk an ENOMEM abort.
+ * In any case, the toobig handler for extent-based
+ * directories also doesn't feed toobig blocks to
+ * pass 2.
+ */
+ if (problem == PR_1_TOOBIG_DIR)
+ return ret_code;
goto mark_dir;
} else
return 0;
diff --git a/tests/f_hugedir_blocks/expect.1 b/tests/f_hugedir_blocks/expect.1
new file mode 100644
index 0000000..798a7ac
--- /dev/null
+++ b/tests/f_hugedir_blocks/expect.1
@@ -0,0 +1,10 @@
+Pass 1: Checking inodes, blocks, and sizes
+Inode 12 is too big. Truncate? yes
+
+Block #1074791435 (13) causes directory to be too big. CLEARED.
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (8.3% non-contiguous), 22/512 blocks
+Exit status is 0
diff --git a/tests/f_hugedir_blocks/expect.2 b/tests/f_hugedir_blocks/expect.2
new file mode 100644
index 0000000..ac5f4c1
--- /dev/null
+++ b/tests/f_hugedir_blocks/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 12/128 files (8.3% non-contiguous), 22/512 blocks
+Exit status is 0
diff --git a/tests/f_hugedir_blocks/image.gz b/tests/f_hugedir_blocks/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..1d54de806e596575a2a081931c587d77a695b79d
GIT binary patch
literal 2497
zcmb2|=3uxyeNiwI^V>Uy*}{P`3?Is;hhz(93W_gGC@csx?s^bZx=}%o@nERSk`B?T
zq-@zBbMwNN_YU$ucr@$%0)-tMeNDSwEOM%u?w!>sn6uV%_P@G^^~X=|z4LVM?md6r
z7pbr$?9X~K*`y`8@S7s1;%l4vWs71L2A+AhcJ1}~$32`@hyI__5@CDhcu{$*XL{UA
zk#kR9cW6g;e)fOAU$1RucmBMXox5r>Yo7ixxV7ucn`2M2?>~>*yQlWq(VyGZkAM0e
z?c1F?`}W4*Wsj9#ey*`RHpiKtp+WJ)<@?Hpep^2O*O_wbW~L%{zyE7R1_p*3zoU+y
ze(3&EK8_#AdsFrE{r{^ImTV2U!yUkz8MRvU`v*B*;g@q;kG{FGw)J;;{<G6g6}FLH
z)2GSrDqsTYsP<U;{LlZ>DTV()`g>+*#sVpZbN?BE<n#X^HV_=xAajWoNE>`wCH?R}
zSPm@Cz>u)rY7IA(jZ7sdL`}9$1&Sm**}(*|hfHuF?9BSL*LQE6f9vl@r#U{NZ_ir=
z>S(rq^qZ^V_R>DL-s;%pXU&@{OTITR7L;f?mNmyD{ddp*lgEB+kDWTN=u`X0_1~*c
z#5Eqwc=q#r&i+68yvDO%O-=4OY9a9W0pD}`!+-vif8PIQ`=9sWzdsjdC(V9+{`iHf
z<@e?O`u>~lzVC0I9tW8YAJsM*0;3@?8UmvsFd71*AwVbu4qW@oGI8xHe+C8x1^`B?
B32*=a

literal 0
HcmV?d00001

diff --git a/tests/f_hugedir_blocks/name b/tests/f_hugedir_blocks/name
new file mode 100644
index 0000000..d74761b
--- /dev/null
+++ b/tests/f_hugedir_blocks/name
@@ -0,0 +1 @@
+crash e2fsck with a dir with an impossibly high logical blk offset


2014-07-18 22:55:31

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 21/24] e2fsck: clear uninit flag on directory extents

Directories can't have uninitialized extents, so offer to clear the
uninit flag when we find this situation. The actual directory blocks
will be checked in pass 2 and 3 regardless of the uninit flag.

Signed-off-by: Darrick J. Wong <[email protected]>
---
e2fsck/pass1.c | 16 ++++++++++++++++
e2fsck/problem.c | 5 +++++
e2fsck/problem.h | 3 +++
tests/f_uninit_dir/expect.1 | 27 +++++++++++++++++++++++++++
tests/f_uninit_dir/expect.2 | 7 +++++++
tests/f_uninit_dir/image.gz | Bin
tests/f_uninit_dir/name | 1 +
7 files changed, 59 insertions(+)
create mode 100644 tests/f_uninit_dir/expect.1
create mode 100644 tests/f_uninit_dir/expect.2
create mode 100644 tests/f_uninit_dir/image.gz
create mode 100644 tests/f_uninit_dir/name


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index b478bca..1895ee4 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2009,6 +2009,22 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
(1 << (21 - ctx->fs->super->s_log_block_size))))
problem = PR_1_TOOBIG_DIR;

+ /*
+ * Uninitialized blocks in a directory? Clear the flag and
+ * we'll interpret the blocks later.
+ */
+ if (try_repairs && is_dir && problem == 0 &&
+ (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT) &&
+ fix_problem(ctx, PR_1_UNINIT_DBLOCK, pctx)) {
+ extent.e_flags &= ~EXT2_EXTENT_FLAGS_UNINIT;
+ pb->inode_modified = 1;
+ pctx->errcode = ext2fs_extent_replace(ehandle, 0,
+ &extent);
+ if (pctx->errcode)
+ return;
+ failed_csum = 0;
+ }
+
/* Corrupt but passes checks? Ask to fix checksum. */
if (try_repairs && failed_csum) {
pctx->blk = extent.e_pblk;
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index a1986c6..60c02af 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1043,6 +1043,11 @@ static struct e2fsck_problem problem_table[] = {
N_("@d @i %i @b %b should be at @b %c. "),
PROMPT_FIX, 0 },

+ /* Extents/inlinedata flag set on a device or socket inode */
+ { PR_1_UNINIT_DBLOCK,
+ N_("@d @i %i has @x marked uninitialized at @b %c. "),
+ PROMPT_FIX, PR_PREEN_OK },
+
/* Pass 1b errors */

/* Pass 1B: Rescan for duplicate/bad blocks */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index c349286..6cd3d50 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -606,6 +606,9 @@ struct problem_context {
/* Directory inode has a missing block (hole) */
#define PR_1_COLLAPSE_DBLOCK 0x010072

+/* uninit directory block */
+#define PR_1_UNINIT_DBLOCK 0x010073
+
/*
* Pass 1b errors
*/
diff --git a/tests/f_uninit_dir/expect.1 b/tests/f_uninit_dir/expect.1
new file mode 100644
index 0000000..f0065f1
--- /dev/null
+++ b/tests/f_uninit_dir/expect.1
@@ -0,0 +1,27 @@
+Pass 1: Checking inodes, blocks, and sizes
+Directory inode 12 has extent marked uninitialized at block 0. Fix? yes
+
+Directory inode 14 has extent marked uninitialized at block 0. Fix? yes
+
+Pass 2: Checking directory structure
+Directory inode 14, block #0, offset 0: directory corrupted
+Salvage? yes
+
+Missing '.' in directory inode 14.
+Fix? yes
+
+Setting filetype for entry '.' in ??? (14) to 2.
+Missing '..' in directory inode 14.
+Fix? yes
+
+Setting filetype for entry '..' in ??? (14) to 2.
+Pass 3: Checking directory connectivity
+'..' in /abc (14) is <The NULL inode> (0), should be / (2).
+Fix? yes
+
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 14/128 files (0.0% non-contiguous), 20/512 blocks
+Exit status is 1
diff --git a/tests/f_uninit_dir/expect.2 b/tests/f_uninit_dir/expect.2
new file mode 100644
index 0000000..7b28f43
--- /dev/null
+++ b/tests/f_uninit_dir/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 14/128 files (0.0% non-contiguous), 20/512 blocks
+Exit status is 0
diff --git a/tests/f_uninit_dir/image.gz b/tests/f_uninit_dir/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..ac9131d26d9da4943e912c5ea493110727f1fbc7
GIT binary patch
literal 2604
zcmb2|=3uzrvLu*^`R(2P*}{P`3?Is;b1m&%mLYhw>5)Q50K1?p%VVXkZi$$fBl0_b
zEDjL8!q>aGxv4u_v1xMi0zRvb$qE*ndjA-ex|$bt3*3KNq-ChP@!F5Y^ZkF^+Z})M
z^y$;F`}8?uI1VI*o>$A{Se^V)jWco6Ub|fP?Dnd-H#76D@E!4R-W*vyy=6!BjG0+i
zx{iC@dRkDOSgj*>oPB23(~Q#7r=nMXv)%pc(YKTRQ%_Aj_k82eFEfkx?_PWMy7}ts
z%gXb2m)9R|o_Xx^Y|j;KMYHmM?~~<D)wwrG?=^eUo3GcH86NPTxOD%i;kE6b{-2yN
zo3G}snPl9$uUw1_3<vhFkJM{FzchY_Jp%*7hwG>7{_D4^$nyWJXFR3od;Iw4{~!9=
z7Jj*M<Yw_~Kc6}F#WuI6xoYk$_!tx+_I!Sb*UO2EkJofE0rl4JeAxH#f4i66KXxGR
zpRsZrNYnt#>-i6~3I-bPYwllmzyJFGXsPG%p<bP@e<Ml46d}_aBuvW7beMrc|7Gtt
zAj^=*{$QN8Olqr=Sn%%q@46;W-1GPMvd<^B>u@%&EizbAX!XU8lllH#{m#IshF`~%
z?%ddzRxjpiUt!MlH(<YvdjG1}bN@Ym|1qyWEPMK-&h!1}&MWR^`ukDibp5KkpZ1%q
z$$oCX(EFe9g}&Ak_5YrHjDJ#Jp1ha;cHox!bAQ&$UH=+!{P^;AwP!j1uUUsj6>ccA
zt(J?tx&E8E-t(u2<!{B$S<4<>?EL=u{rmH)a_f7-{@$#a|9k(5zx$v5zyHgA=F<P+
zU#fa!<?i0_{x{z>tlm(JM9+^Z9}R)g5Eu=C(GVC7fzc2kIs~3Q=gz$9$Hu^*zyJU<
C_B(|D

literal 0
HcmV?d00001

diff --git a/tests/f_uninit_dir/name b/tests/f_uninit_dir/name
new file mode 100644
index 0000000..d7f5bee
--- /dev/null
+++ b/tests/f_uninit_dir/name
@@ -0,0 +1 @@
+fix uninit flag on directory extents and check the dir blocks


2014-07-18 22:55:37

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 22/24] e2fsck: check return value of ext2fs_extent_fix_parents()

When fixing parent nodes in an extent tree, check the return value for
errors and bail out if problems happen.

Signed-off-by: Darrick J. Wong <[email protected]>
Fixes-Coverity-Bug: 1193379
---
e2fsck/pass1.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)


diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 1895ee4..c52b141 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2074,7 +2074,10 @@ fix_problem_now:
pctx->str = "ext2fs_extent_delete";
return;
}
- ext2fs_extent_fix_parents(ehandle);
+ pctx->errcode = ext2fs_extent_fix_parents(ehandle);
+ if (pctx->errcode &&
+ pctx->errcode != EXT2_ET_NO_CURRENT_NODE)
+ break;
pctx->errcode = ext2fs_extent_get(ehandle,
EXT2_EXTENT_CURRENT,
&extent);


2014-07-18 22:55:42

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 23/24] mke2fs: set error behavior at initialization time

Port tune2fs' -e flag to mke2fs so that we can set error behavior at
format time, and introduce the equivalent errors= setting into
mke2fs.conf.

Signed-off-by: Darrick J. Wong <[email protected]>
---
misc/mke2fs.8.in | 23 +++++++++
misc/mke2fs.c | 57 +++++++++++++++++++++-
misc/mke2fs.conf.5.in | 19 +++++++
tests/t_mke2fs_errors/expect | 24 +++++++++
tests/t_mke2fs_errors/script | 110 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 231 insertions(+), 2 deletions(-)
create mode 100644 tests/t_mke2fs_errors/expect
create mode 100755 tests/t_mke2fs_errors/script


diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index bf17eae..bad76bb 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -113,6 +113,10 @@ mke2fs \- create an ext2/ext3/ext4 filesystem
[
.B \-V
]
+[
+.B \-e
+.I errors-behavior
+]
.I device
[
.I blocks-count
@@ -206,6 +210,25 @@ lot of buffer cache memory, which may impact other applications running
on a busy server. This option will cause mke2fs to run much more
slowly, however, so there is a tradeoff to using direct I/O.
.TP
+.BI \-e " error-behavior"
+Change the behavior of the kernel code when errors are detected.
+In all cases, a filesystem error will cause
+.BR e2fsck (8)
+to check the filesystem on the next boot.
+.I error-behavior
+can be one of the following:
+.RS 1.2i
+.TP 1.2i
+.B continue
+Continue normal execution.
+.TP
+.B remount-ro
+Remount filesystem read-only.
+.TP
+.B panic
+Cause a kernel panic.
+.RE
+.TP
.BI \-E " extended-options"
Set extended options for the filesystem. Extended options are comma
separated, and may take an argument using the equals ('=') sign. The
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index e26408c..792c754 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -113,6 +113,8 @@ static profile_t profile;

static int sys_page_size = 4096;

+static int errors_behavior = 0;
+
static void usage(void)
{
fprintf(stderr, _("Usage: %s [-c|-l filename] [-b block-size] "
@@ -124,7 +126,7 @@ static void usage(void)
"\t[-g blocks-per-group] [-L volume-label] "
"[-M last-mounted-directory]\n\t[-O feature[,...]] "
"[-r fs-revision] [-E extended-option[,...]]\n"
- "\t[-t fs-type] [-T usage-type ] [-U UUID] "
+ "\t[-t fs-type] [-T usage-type ] [-U UUID] [-e errors_behavior]"
"[-jnqvDFKSV] device [blocks-count]\n"),
program_name);
exit(1);
@@ -1547,7 +1549,7 @@ profile_error:
}

while ((c = getopt (argc, argv,
- "b:cg:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
+ "b:ce:g:i:jl:m:no:qr:s:t:d:vC:DE:FG:I:J:KL:M:N:O:R:ST:U:V")) != EOF) {
switch (c) {
case 'b':
blocksize = parse_num_blocks2(optarg, -1);
@@ -1590,6 +1592,20 @@ profile_error:
case 'E':
extended_opts = optarg;
break;
+ case 'e':
+ if (strcmp(optarg, "continue") == 0)
+ errors_behavior = EXT2_ERRORS_CONTINUE;
+ else if (strcmp(optarg, "remount-ro") == 0)
+ errors_behavior = EXT2_ERRORS_RO;
+ else if (strcmp(optarg, "panic") == 0)
+ errors_behavior = EXT2_ERRORS_PANIC;
+ else {
+ com_err(program_name, 0,
+ _("bad error behavior - %s"),
+ optarg);
+ usage();
+ }
+ break;
case 'F':
force++;
break;
@@ -2622,6 +2638,38 @@ static int create_quota_inodes(ext2_filsys fs)
return 0;
}

+static errcode_t set_error_behavior(ext2_filsys fs)
+{
+ char *arg = NULL;
+ short errors = fs->super->s_errors;
+
+ arg = get_string_from_profile(fs_types, "errors", NULL);
+ if (arg == NULL)
+ goto try_user;
+
+ if (strcmp(arg, "continue") == 0)
+ errors = EXT2_ERRORS_CONTINUE;
+ else if (strcmp(arg, "remount-ro") == 0)
+ errors = EXT2_ERRORS_RO;
+ else if (strcmp(arg, "panic") == 0)
+ errors = EXT2_ERRORS_PANIC;
+ else {
+ com_err(program_name, 0,
+ _("bad error behavior in profile - %s"),
+ arg);
+ free(arg);
+ return EXT2_ET_INVALID_ARGUMENT;
+ }
+ free(arg);
+
+try_user:
+ if (errors_behavior)
+ errors = errors_behavior;
+
+ fs->super->s_errors = errors;
+ return 0;
+}
+
int main (int argc, char *argv[])
{
errcode_t retval = 0;
@@ -2686,6 +2734,11 @@ int main (int argc, char *argv[])
}
fs->progress_ops = &ext2fs_numeric_progress_ops;

+ /* Set the error behavior */
+ retval = set_error_behavior(fs);
+ if (retval)
+ usage();
+
/* Check the user's mkfs options for metadata checksumming */
if (!quiet &&
EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
diff --git a/misc/mke2fs.conf.5.in b/misc/mke2fs.conf.5.in
index 19458ac..ad6c11b 100644
--- a/misc/mke2fs.conf.5.in
+++ b/misc/mke2fs.conf.5.in
@@ -317,6 +317,25 @@ whose subsections define the
relation, only the last will be used by
.BR mke2fs (8).
.TP
+.I errors
+Change the behavior of the kernel code when errors are detected.
+In all cases, a filesystem error will cause
+.BR e2fsck (8)
+to check the filesystem on the next boot.
+.I errors
+can be one of the following:
+.RS 1.2i
+.TP 1.2i
+.B continue
+Continue normal execution.
+.TP
+.B remount-ro
+Remount filesystem read-only.
+.TP
+.B panic
+Cause a kernel panic.
+.RE
+.TP
.I features
This relation specifies a comma-separated list of features edit
requests which modify the feature set
diff --git a/tests/t_mke2fs_errors/expect b/tests/t_mke2fs_errors/expect
new file mode 100644
index 0000000..78514bd
--- /dev/null
+++ b/tests/t_mke2fs_errors/expect
@@ -0,0 +1,24 @@
+error default
+Errors behavior: Continue
+error continue
+Errors behavior: Continue
+error panic
+Errors behavior: Panic
+error remount-ro
+Errors behavior: Remount read-only
+error garbage
+error default profile continue
+Errors behavior: Continue
+error default profile panic
+Errors behavior: Panic
+error default profile remount-ro
+Errors behavior: Remount read-only
+error default profile broken
+error fs_types profile continue
+Errors behavior: Continue
+error fs_types profile panic
+Errors behavior: Panic
+error fs_types profile remount-ro
+Errors behavior: Remount read-only
+error fs_types profile remount-ro
+Errors behavior: Panic
diff --git a/tests/t_mke2fs_errors/script b/tests/t_mke2fs_errors/script
new file mode 100755
index 0000000..d09e926
--- /dev/null
+++ b/tests/t_mke2fs_errors/script
@@ -0,0 +1,110 @@
+test_description="mke2fs with error behavior"
+
+conf=$TMPFILE.conf
+write_defaults_conf()
+{
+ errors="$1"
+ cat > $conf << ENDL
+[defaults]
+ errors = $errors
+ENDL
+}
+
+write_section_conf()
+{
+ errors="$1"
+ cat > $conf << ENDL
+[defaults]
+ errors = broken
+
+[fs_types]
+ test_suite = {
+ errors = $errors
+ }
+ENDL
+}
+
+trap "rm -rf $TMPFILE $TMPFILE.conf" EXIT INT QUIT
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+OUT=$test_name.log
+EXP=$test_dir/expect
+rm -rf $OUT
+
+# Test command line option
+echo "error default" >> $OUT
+$MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error continue" >> $OUT
+$MKE2FS -e continue -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error panic" >> $OUT
+$MKE2FS -e panic -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error remount-ro" >> $OUT
+$MKE2FS -e remount-ro -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error garbage" >> $OUT
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+$MKE2FS -e broken -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+# Test errors= in default
+echo "error default profile continue" >> $OUT
+write_defaults_conf continue
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error default profile panic" >> $OUT
+write_defaults_conf panic
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error default profile remount-ro" >> $OUT
+write_defaults_conf remount-ro
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error default profile broken" >> $OUT
+write_defaults_conf broken
+dd if=/dev/zero of=$TMPFILE bs=1k count=512 > /dev/null 2>&1
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+# Test errors= in a fs type
+echo "error fs_types profile continue" >> $OUT
+write_section_conf continue
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error fs_types profile panic" >> $OUT
+write_section_conf panic
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+echo "error fs_types profile remount-ro" >> $OUT
+write_section_conf remount-ro
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+# Test command line override
+echo "error fs_types profile remount-ro" >> $OUT
+write_section_conf remount-ro
+MKE2FS_CONFIG=$conf $MKE2FS -F $TMPFILE -T test_suite -e panic > /dev/null 2>&1
+$DUMPE2FS $TMPFILE 2>&1 | grep 'Errors behavior' >> $OUT
+
+cmp -s $OUT $EXP
+status=$?
+
+if [ "$status" = 0 ] ; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff $DIFF_OPTS $EXP $OUT > $test_name.failed
+ rm -f $test_name.tmp
+fi
+


2014-07-18 22:55:55

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH 24/24] e2fuzz: Create a tool to fuzz ext* filesystems

Creates a program that fuzzes only the metadata blocks (or optionally
all in-use blocks) of an ext* filesystem. There's also a script to
automate fuzz testing of the kernel and e2fsck in a loop.

Signed-off-by: Darrick J. Wong <[email protected]>
---
misc/Makefile.in | 17 ++-
misc/e2fuzz.c | 352 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
misc/e2fuzz.sh | 262 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 630 insertions(+), 1 deletion(-)
create mode 100644 misc/e2fuzz.c
create mode 100755 misc/e2fuzz.sh


diff --git a/misc/Makefile.in b/misc/Makefile.in
index 94fac22..d178ff0 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -57,6 +57,7 @@ FILEFRAG_OBJS= filefrag.o
E2UNDO_OBJS= e2undo.o
E4DEFRAG_OBJS= e4defrag.o
E2FREEFRAG_OBJS= e2freefrag.o
+E2FUZZ_OBJS= e2fuzz.o

PROFILED_TUNE2FS_OBJS= profiled/tune2fs.o profiled/util.o
PROFILED_MKLPF_OBJS= profiled/mklost+found.o
@@ -107,7 +108,7 @@ COMPILE_ET=$(top_builddir)/lib/et/compile_et --build-tree
@PROFILE_CMT@ $(Q) $(CC) $(ALL_CFLAGS) -g -pg -o profiled/$*.o -c $<

all:: profiled $(SPROGS) $(UPROGS) $(USPROGS) $(SMANPAGES) $(UMANPAGES) \
- $(FMANPAGES) $(LPROGS) $(E4DEFRAG_PROG)
+ $(FMANPAGES) $(LPROGS) $(E4DEFRAG_PROG) e2fuzz

@PROFILE_CMT@all:: tune2fs.profiled blkid.profiled e2image.profiled \
e2undo.profiled mke2fs.profiled dumpe2fs.profiled fsck.profiled \
@@ -344,6 +345,12 @@ e2freefrag.profiled: $(E2FREEFRAG_OBJS) $(PROFILED_DEPLIBS)
$(Q) $(CC) $(ALL_LDFLAGS) -g -pg -o e2freefrag.profiled \
$(PROFILED_E2FREEFRAG_OBJS) $(PROFILED_LIBS) $(SYSLIBS)

+e2fuzz: $(E2FUZZ_OBJS) $(DEPLIBS) $(DEPLIBBLKID) $(DEPLIBUUID) \
+ $(DEPLIBQUOTA) $(LIBEXT2FS)
+ $(E) " LD $@"
+ $(Q) $(CC) $(ALL_LDFLAGS) -o e2fuzz $(E2FUZZ_OBJS) $(LIBS) \
+ $(LIBBLKID) $(LIBUUID) $(LIBEXT2FS)
+
filefrag: $(FILEFRAG_OBJS)
$(E) " LD $@"
$(Q) $(CC) $(ALL_LDFLAGS) -o filefrag $(FILEFRAG_OBJS) $(SYSLIBS)
@@ -738,6 +745,14 @@ e2freefrag.o: $(srcdir)/e2freefrag.c $(top_builddir)/lib/config.h \
$(top_srcdir)/lib/ext2fs/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
$(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
$(srcdir)/e2freefrag.h
+e2fuzz.o: $(srcdir)/e2fuzz.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/ext2fs/tdb.h \
+ $(top_srcdir)/lib/ext2fs/ext2fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
+ $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_srcdir)/lib/ext2fs/ext3_extents.h \
+ $(top_srcdir)/lib/et/com_err.h $(top_srcdir)/lib/ext2fs/ext2_io.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/bitops.h \
+ $(srcdir)/nls-enable.h
create_inode.o: $(srcdir)/create_inode.c $(srcdir)/create_inode.h \
$(top_srcdir)/lib/et/com_err.h $(top_srcdir)/lib/e2p/e2p.h \
$(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
diff --git a/misc/e2fuzz.c b/misc/e2fuzz.c
new file mode 100644
index 0000000..644c9c5
--- /dev/null
+++ b/misc/e2fuzz.c
@@ -0,0 +1,352 @@
+/*
+ * e2fuzz.c -- Fuzz an ext4 image, for testing purposes.
+ *
+ * Copyright (C) 2014 Oracle.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+#define _XOPEN_SOURCE 600
+#define _FILE_OFFSET_BITS 64
+#define _LARGEFILE64_SOURCE 1
+#define _GNU_SOURCE 1
+
+#include "config.h"
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdint.h>
+#ifdef HAVE_GETOPT_H
+#include <getopt.h>
+#endif
+
+#include "ext2fs/ext2_fs.h"
+#include "ext2fs/ext2fs.h"
+
+static int dryrun = 0;
+static int verbose = 0;
+static int metadata_only = 1;
+static unsigned long long user_corrupt_bytes = 0;
+static double user_corrupt_pct = 0.0;
+
+int getseed(void)
+{
+ int r;
+ int fd;
+
+ fd = open("/dev/urandom", O_RDONLY);
+ if (fd < 0) {
+ perror("open");
+ exit(0);
+ }
+ read(fd, &r, sizeof(r));
+ close(fd);
+ return r;
+}
+
+struct find_block {
+ ext2_ino_t ino;
+ ext2fs_block_bitmap bmap;
+ struct ext2_inode *inode;
+ blk64_t corrupt_blocks;
+};
+
+int find_block_helper(ext2_filsys fs, blk64_t *blocknr, e2_blkcnt_t blockcnt,
+ blk64_t ref_blk, int ref_offset, void *priv_data)
+{
+ struct find_block *fb = (struct find_block *)priv_data;
+
+ if (S_ISDIR(fb->inode->i_mode) || !metadata_only || blockcnt < 0) {
+ ext2fs_mark_block_bitmap2(fb->bmap, *blocknr);
+ fb->corrupt_blocks++;
+ }
+
+ return 0;
+}
+
+errcode_t find_metadata_blocks(ext2_filsys fs, ext2fs_block_bitmap bmap,
+ off_t *corrupt_bytes)
+{
+ dgrp_t i;
+ blk64_t b, c, d, j, old_desc_blocks;
+ ext2_inode_scan scan;
+ ext2_ino_t ino;
+ struct ext2_inode inode;
+ struct find_block fb;
+ errcode_t retval;
+
+ *corrupt_bytes = 0;
+ fb.corrupt_blocks = 0;
+ if (EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+ EXT2_FEATURE_INCOMPAT_META_BG))
+ old_desc_blocks = fs->super->s_first_meta_bg;
+ else
+ old_desc_blocks = fs->desc_blocks +
+ fs->super->s_reserved_gdt_blocks;
+
+ /* Construct bitmaps of super/descriptor blocks */
+ for (i = 0; i < fs->group_desc_count; i++) {
+ ext2fs_reserve_super_and_bgd(fs, i, bmap);
+
+ /* bitmaps and inode table */
+ b = ext2fs_block_bitmap_loc(fs, i);
+ ext2fs_mark_block_bitmap2(bmap, b);
+ fb.corrupt_blocks++;
+
+ b = ext2fs_inode_bitmap_loc(fs, i);
+ ext2fs_mark_block_bitmap2(bmap, b);
+ fb.corrupt_blocks++;
+
+ c = ext2fs_inode_table_loc(fs, i);
+ ext2fs_mark_block_bitmap_range2(bmap, c,
+ fs->inode_blocks_per_group);
+ fb.corrupt_blocks += fs->inode_blocks_per_group;
+ }
+
+ /* Scan inodes */
+ fb.bmap = bmap;
+ fb.inode = &inode;
+ memset(&inode, 0, sizeof(inode));
+ retval = ext2fs_open_inode_scan(fs, 0, &scan);
+ if (retval)
+ goto out;
+
+ retval = ext2fs_get_next_inode_full(scan, &ino, &inode, sizeof(inode));
+ if (retval)
+ goto out2;
+ while (ino) {
+ if (inode.i_links_count == 0)
+ goto next_loop;
+
+ b = ext2fs_file_acl_block(fs, &inode);
+ if (b) {
+ ext2fs_mark_block_bitmap2(bmap, b);
+ fb.corrupt_blocks++;
+ }
+
+ /*
+ * Inline data, sockets, devices, and symlinks have
+ * no blocks to iterate.
+ */
+ if ((inode.i_flags & EXT4_INLINE_DATA_FL) ||
+ S_ISLNK(inode.i_mode) || S_ISFIFO(inode.i_mode) ||
+ S_ISCHR(inode.i_mode) || S_ISBLK(inode.i_mode) ||
+ S_ISSOCK(inode.i_mode))
+ goto next_loop;
+ fb.ino = ino;
+ retval = ext2fs_block_iterate3(fs, ino, BLOCK_FLAG_READ_ONLY,
+ NULL, find_block_helper, &fb);
+ if (retval)
+ goto out2;
+next_loop:
+ retval = ext2fs_get_next_inode_full(scan, &ino, &inode,
+ sizeof(inode));
+ if (retval)
+ goto out2;
+ }
+out2:
+ ext2fs_close_inode_scan(scan);
+out:
+ if (!retval)
+ *corrupt_bytes = fb.corrupt_blocks * fs->blocksize;
+ return retval;
+}
+
+uint64_t rand_num(uint64_t min, uint64_t max)
+{
+ uint64_t x;
+ int i;
+ uint8_t *px = (uint8_t *)&x;
+
+ for (i = 0; i < sizeof(x); i++)
+ px[i] = random();
+
+ return min + (uint64_t)((double)(max - min) * (x / (UINT64_MAX + 1.0)));
+}
+
+int process_fs(const char *fsname)
+{
+ errcode_t ret;
+ int flags, fd;
+ ext2_filsys fs = NULL;
+ ext2fs_block_bitmap corrupt_map;
+ off_t hsize, count, off, offset, corrupt_bytes;
+ unsigned char c;
+ unsigned long i;
+
+ /* If mounted rw, force dryrun mode */
+ ret = ext2fs_check_if_mounted(fsname, &flags);
+ if (ret) {
+ fprintf(stderr, "%s: failed to determine filesystem mount "
+ "state.\n", fsname);
+ return 1;
+ }
+
+ if (!dryrun && (flags & EXT2_MF_MOUNTED) &&
+ !(flags & EXT2_MF_READONLY)) {
+ fprintf(stderr, "%s: is mounted rw, performing dry run.\n",
+ fsname);
+ dryrun = 1;
+ }
+
+ /* Ensure the fs is clean and does not have errors */
+ ret = ext2fs_open(fsname, EXT2_FLAG_64BITS, 0, 0, unix_io_manager,
+ &fs);
+ if (ret) {
+ fprintf(stderr, "%s: failed to open filesystem.\n",
+ fsname);
+ return 1;
+ }
+
+ if ((fs->super->s_state & EXT2_ERROR_FS)) {
+ fprintf(stderr, "%s: errors detected, run fsck.\n",
+ fsname);
+ goto fail;
+ }
+
+ if (!dryrun && (fs->super->s_state & EXT2_VALID_FS) == 0) {
+ fprintf(stderr, "%s: unclean shutdown, performing dry run.\n",
+ fsname);
+ dryrun = 1;
+ }
+
+ /* Construct a bitmap of whatever we're corrupting */
+ if (!metadata_only) {
+ /* Load block bitmap */
+ ret = ext2fs_read_block_bitmap(fs);
+ if (ret) {
+ fprintf(stderr, "%s: error while reading block bitmap\n",
+ fsname);
+ goto fail;
+ }
+ corrupt_map = fs->block_map;
+ corrupt_bytes = (ext2fs_blocks_count(fs->super) -
+ ext2fs_free_blocks_count(fs->super)) *
+ fs->blocksize;
+ } else {
+ ret = ext2fs_allocate_block_bitmap(fs, "metadata block map",
+ &corrupt_map);
+ if (ret) {
+ fprintf(stderr, "%s: unable to create block bitmap\n",
+ fsname);
+ goto fail;
+ }
+
+ /* Iterate everything... */
+ ret = find_metadata_blocks(fs, corrupt_map, &corrupt_bytes);
+ if (ret) {
+ fprintf(stderr, "%s: while finding metadata\n",
+ fsname);
+ goto fail;
+ }
+ }
+
+ /* Run around corrupting things */
+ fd = open(fsname, O_RDWR);
+ if (fd < 0) {
+ perror(fsname);
+ goto fail;
+ }
+ srandom(getseed());
+ hsize = fs->blocksize * ext2fs_blocks_count(fs->super);
+ if (user_corrupt_bytes > 0)
+ count = user_corrupt_bytes;
+ else if (user_corrupt_pct > 0.0)
+ count = user_corrupt_pct * corrupt_bytes / 100;
+ else
+ count = rand_num(0, corrupt_bytes / 100);
+ offset = 4096; /* never corrupt superblock */
+ for (i = 0; i < count; i++) {
+ do
+ off = rand_num(offset, hsize);
+ while (!ext2fs_test_block_bitmap2(corrupt_map,
+ off / fs->blocksize));
+ c = rand() % 256;
+ if ((rand() % 2) && c < 128)
+ c |= 0x80;
+ if (verbose)
+ printf("Corrupting byte %jd in block %jd to 0x%x\n",
+ off % fs->blocksize, off / fs->blocksize, c);
+ if (dryrun)
+ continue;
+ if (pwrite64(fd, &c, sizeof(c), off) != sizeof(c)) {
+ perror(fsname);
+ goto fail3;
+ }
+ }
+ close(fd);
+
+ /* Clean up */
+ ret = ext2fs_close(fs);
+ if (ret) {
+ fs = NULL;
+ fprintf(stderr, "%s: error while closing filesystem\n",
+ fsname);
+ goto fail2;
+ }
+
+ return 0;
+fail3:
+ close(fd);
+fail2:
+ if (corrupt_map != fs->block_map)
+ ext2fs_free_block_bitmap(corrupt_map);
+fail:
+ if (fs)
+ ext2fs_close(fs);
+ return 1;
+}
+
+void print_help(const char *progname)
+{
+ printf("Usage: %s OPTIONS device\n", progname);
+ printf("-b: Corrupt this many bytes.\n");
+ printf("-d: Fuzz data blocks too.\n");
+ printf("-n: Dry run only.\n");
+ printf("-v: Verbose output.\n");
+ exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+ int c;
+
+ while ((c = getopt(argc, argv, "b:dnv")) != -1) {
+ switch (c) {
+ case 'b':
+ if (optarg[strlen(optarg) - 1] == '%') {
+ user_corrupt_pct = strtod(optarg, NULL);
+ if (user_corrupt_pct > 100 ||
+ user_corrupt_pct < 0) {
+ fprintf(stderr, "%s: Invalid percentage.\n",
+ optarg);
+ return 1;
+ }
+ } else
+ user_corrupt_bytes = strtoull(optarg, NULL, 0);
+ if (errno) {
+ perror(optarg);
+ return 1;
+ }
+ break;
+ case 'd':
+ metadata_only = 0;
+ break;
+ case 'n':
+ dryrun = 1;
+ break;
+ case 'v':
+ verbose = 1;
+ break;
+ default:
+ print_help(argv[0]);
+ }
+ }
+
+ for (c = optind; c < argc; c++)
+ if (process_fs(argv[c]))
+ return 1;
+ return 0;
+}
diff --git a/misc/e2fuzz.sh b/misc/e2fuzz.sh
new file mode 100755
index 0000000..a261d5a
--- /dev/null
+++ b/misc/e2fuzz.sh
@@ -0,0 +1,262 @@
+#!/bin/bash
+
+# Test harness to fuzz a filesystem over and over...
+# Copyright (C) 2014 Oracle.
+
+DIR=/tmp
+PASSES=10000
+SZ=32m
+SCRIPT_DIR="$(dirname "$0")"
+FEATURES="has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,64bit,metadata_csum,bigalloc,sparse_super2,inline_data"
+BLK_SZ=4096
+INODE_SZ=256
+EXTENDED_OPTS="discard"
+EXTENDED_FSCK_OPTIONS=""
+RUN_FSCK=1
+OVERRIDE_PATH=1
+HAS_FUSE2FS=0
+USE_FUSE2FS=0
+MAX_FSCK=10
+SRCDIR=/etc
+test -x "${SCRIPT_DIR}/fuse2fs" && HAS_FUSE2FS=1
+
+print_help() {
+ echo "Usage: $0 OPTIONS"
+ echo "-b: FS block size is this. (${BLK_SZ})"
+ echo "-B: Corrupt this many bytes per run."
+ echo "-d: Create test files in this directory. (${DIR})"
+ echo "-E: Extended mke2fs options."
+ echo "-f: Do not run e2fsck after each pass."
+ echo "-F: Extended e2fsck options."
+ echo "-I: Create inodes of this size. (${INODE_SZ})"
+ echo "-n: Run this many passes. (${PASSES})"
+ echo "-O: Create FS with these features."
+ echo "-p: Use system's mke2fs/e2fsck/tune2fs tools."
+ echo "-s: Create FS images of this size. (${SZ})"
+ echo "-S: Copy files from this dir. (${SRCDIR})"
+ echo "-x: Run e2fck at most this many times. (${MAX_FSCK})"
+ test "${HAS_FUSE2FS}" -gt 0 && echo "-u: Use fuse2fs instead of the kernel."
+ exit 0
+}
+
+GETOPT="d:n:s:O:I:b:B:E:F:fpx:S:"
+test "${HAS_FUSE2FS}" && GETOPT="${GETOPT}u"
+
+while getopts "${GETOPT}" opt; do
+ case "${opt}" in
+ "B")
+ E2FUZZ_ARGS="${E2FUZZ_ARGS} -b ${OPTARG}"
+ ;;
+ "d")
+ DIR="${OPTARG}"
+ ;;
+ "n")
+ PASSES="${OPTARG}"
+ ;;
+ "s")
+ SZ="${OPTARG}"
+ ;;
+ "O")
+ FEATURES="${FEATURES},${OPTARG}"
+ ;;
+ "I")
+ INODE_SZ="${OPTARG}"
+ ;;
+ "b")
+ BLK_SZ="${OPTARG}"
+ ;;
+ "E")
+ EXTENDED_OPTS="${OPTARG}"
+ ;;
+ "F")
+ EXTENDED_FSCK_OPTS="-E ${OPTARG}"
+ ;;
+ "f")
+ RUN_FSCK=0
+ ;;
+ "p")
+ OVERRIDE_PATH=0
+ ;;
+ "u")
+ USE_FUSE2FS=1
+ ;;
+ "x")
+ MAX_FSCK="${OPTARG}"
+ ;;
+ "S")
+ SRCDIR="${OPTARG}"
+ ;;
+ *)
+ print_help
+ ;;
+ esac
+done
+
+if [ "${OVERRIDE_PATH}" -gt 0 ]; then
+ PATH="${SCRIPT_DIR}:${SCRIPT_DIR}/../e2fsck/:${PATH}"
+ export PATH
+fi
+
+TESTDIR="${DIR}/tests/"
+TESTMNT="${DIR}/mnt/"
+BASE_IMG="${DIR}/e2fuzz.img"
+
+cat > /tmp/mke2fs.conf << ENDL
+[defaults]
+ base_features = ${FEATURES}
+ default_mntopts = acl,user_xattr,block_validity
+ enable_periodic_fsck = 0
+ blocksize = ${BLK_SZ}
+ inode_size = ${INODE_SZ}
+ inode_ratio = 4096
+ cluster_size = $((BLK_SZ * 2))
+ options = ${EXTENDED_OPTS}
+ENDL
+MKE2FS_CONFIG=/tmp/mke2fs.conf
+export MKE2FS_CONFIG
+
+# Set up FS image
+echo "+ create fs image"
+umount "${TESTDIR}"
+umount "${TESTMNT}"
+rm -rf "${TESTDIR}"
+rm -rf "${TESTMNT}"
+mkdir -p "${TESTDIR}"
+mkdir -p "${TESTMNT}"
+rm -rf "${BASE_IMG}"
+truncate -s "${SZ}" "${BASE_IMG}"
+mke2fs -F -v "${BASE_IMG}"
+if [ $? -ne 0 ]; then
+ exit $?
+fi
+
+# Populate FS image
+echo "+ populate fs image"
+modprobe loop
+mount "${BASE_IMG}" "${TESTMNT}" -o loop
+if [ $? -ne 0 ]; then
+ exit $?
+fi
+SRC_SZ="$(du -ks "${SRCDIR}" | awk '{print $1}')"
+FS_SZ="$(( $(stat -f "${TESTMNT}" -c '%a * %S') / 1024 ))"
+NR="$(( (FS_SZ * 6 / 10) / SRC_SZ ))"
+if [ "${NR}" -lt 1 ]; then
+ NR=1
+fi
+echo "+ make ${NR} copies"
+seq 1 "${NR}" | while read nr; do
+ cp -pRdu "${SRCDIR}" "${TESTMNT}/test.${nr}" 2> /dev/null
+done
+umount "${TESTMNT}"
+e2fsck -fn "${BASE_IMG}"
+if [ $? -ne 0 ]; then
+ echo "fsck failed??"
+ exit 1
+fi
+
+# Run tests
+echo "+ run test"
+ret=0
+seq 1 "${PASSES}" | while read pass; do
+ echo "+ pass ${pass}"
+ PASS_IMG="${TESTDIR}/e2fuzz-${pass}.img"
+ FSCK_IMG="${TESTDIR}/e2fuzz-${pass}.fsck"
+ FUZZ_LOG="${TESTDIR}/e2fuzz-${pass}.fuzz.log"
+ OPS_LOG="${TESTDIR}/e2fuzz-${pass}.ops.log"
+
+ echo "++ corrupt image"
+ cp "${BASE_IMG}" "${PASS_IMG}"
+ if [ $? -ne 0 ]; then
+ exit $?
+ fi
+ tune2fs -L "e2fuzz-${pass}" "${PASS_IMG}"
+ e2fuzz -v "${PASS_IMG}" ${E2FUZZ_ARGS} > "${FUZZ_LOG}"
+ if [ $? -ne 0 ]; then
+ exit $?
+ fi
+
+ echo "++ mount image"
+ if [ "${USE_FUSE2FS}" -gt 0 ]; then
+ "${SCRIPT_DIR}/fuse2fs" "${PASS_IMG}" "${TESTMNT}"
+ res=$?
+ else
+ mount "${PASS_IMG}" "${TESTMNT}" -o loop
+ res=$?
+ fi
+
+ if [ "${res}" -eq 0 ]; then
+ echo "+++ ls -laR"
+ ls -laR "${TESTMNT}/test.1/" > /dev/null 2> "${OPS_LOG}"
+
+ echo "+++ cat files"
+ find "${TESTMNT}/test.1/" -type f -size -1048576k -print0 | xargs -0 cat > /dev/null 2>> "${OPS_LOG}"
+
+ echo "+++ expand"
+ find "${TESTMNT}/test.1/" -type f 2> /dev/null | while read f; do
+ attr -l "$f" > /dev/null 2>> "${OPS_LOG}"
+ mv "$f" "$f.longer" > /dev/null 2>> "${OPS_LOG}"
+ if [ -f "$f" -a -w "$f" ]; then
+ dd if=/dev/zero bs="${BLK_SZ}" count=1 >> "$f" 2>> "${OPS_LOG}"
+ fi
+ done
+ sync
+
+ echo "+++ create files"
+ cp -pRdu "${SRCDIR}" "${TESTMNT}/test.moo" 2>> "${OPS_LOG}"
+ sync
+
+ echo "+++ remove files"
+ rm -rf "${TESTMNT}/test.moo" 2>> "${OPS_LOG}"
+
+ umount "${TESTMNT}"
+ res=$?
+ if [ "${res}" -ne 0 ]; then
+ ret=1
+ break
+ fi
+ sync
+ test "${USE_FUSE2FS}" -gt 0 && sleep 2
+ fi
+ if [ "${RUN_FSCK}" -gt 0 ]; then
+ cp "${PASS_IMG}" "${FSCK_IMG}"
+
+ seq 1 "${MAX_FSCK}" | while read fsck_pass; do
+ echo "++ fsck pass ${fsck_pass}: $(which e2fsck) -fy ${FSCK_IMG} ${EXTENDED_FSCK_OPTS}"
+ FSCK_LOG="${TESTDIR}/e2fuzz-${pass}-${fsck_pass}.log"
+ e2fsck -fy "${FSCK_IMG}" ${EXTENDED_FSCK_OPTS} > "${FSCK_LOG}" 2>&1
+ res=$?
+ echo "++ fsck returns ${res}"
+ if [ "${res}" -eq 0 ]; then
+ exit 0
+ elif [ "${fsck_pass}" -eq "${MAX_FSCK}" ]; then
+ echo "++ fsck did not fix in ${MAX_FSCK} passes."
+ exit 1
+ fi
+ if [ "${res}" -gt 0 -a \
+ "$(grep 'Memory allocation failed' "${FSCK_LOG}" | wc -l)" -gt 0 ]; then
+ echo "++ Ran out of memory, get more RAM"
+ exit 0
+ fi
+ if [ "${res}" -gt 0 -a \
+ "$(grep 'Could not allocate block' "${FSCK_LOG}" | wc -l)" -gt 0 -a \
+ "$(dumpe2fs -h "${FSCK_IMG}" | grep '^Free blocks:' | awk '{print $3}')0" -eq 0 ]; then
+ echo "++ Ran out of space, get a bigger image"
+ exit 0
+ fi
+ if [ "${fsck_pass}" -gt 1 ]; then
+ diff -u "${TESTDIR}/e2fuzz-${pass}-$((fsck_pass - 1)).log" "${FSCK_LOG}"
+ if [ $? -eq 0 ]; then
+ echo "++ fsck makes no progress"
+ exit 2
+ fi
+ fi
+ done
+ fsck_loop_ret=$?
+ if [ "${fsck_loop_ret}" -gt 0 ]; then
+ break;
+ fi
+ fi
+ rm -rf "${FSCK_IMG}" "${PASS_IMG}" "${FUZZ_LOG}" "${TESTDIR}"/e2fuzz*.log
+done
+
+exit $ret


2014-07-19 16:49:42

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 02/24] debugfs: Only print the first 60 bytes from i_block on a fast symlink

Wouldn't it be better to decode and print the whole symlink from the inline data? Depending on the use of debugfs, printing only the first 60 bytes may cause people/scripts to think the link is corrupted.

Cheers, Andreas

> On Jul 18, 2014, at 16:52, "Darrick J. Wong" <[email protected]> wrote:
>
> If we have an inline_data fast symlink, i_size can be larger than the
> size of i_block. In this case, debugfs prints off the end of the
> buffer, so fix that.
>
> Signed-off-by: Darrick J. Wong <[email protected]>
> ---
> debugfs/debugfs.c | 15 ++++++++++-----
> tests/d_special_files/expect | 2 +-
> 2 files changed, 11 insertions(+), 6 deletions(-)
>
>
> diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> index 51f386b..5eecabe 100644
> --- a/debugfs/debugfs.c
> +++ b/debugfs/debugfs.c
> @@ -783,11 +783,16 @@ void internal_dump_inode(FILE *out, const char *prefix,
> }
>
> if (LINUX_S_ISLNK(inode->i_mode) &&
> - ext2fs_inode_data_blocks(current_fs,inode) == 0 &&
> - !(inode->i_flags & EXT4_INLINE_DATA_FL))
> - fprintf(out, "%sFast_link_dest: %.*s\n", prefix,
> - (int) inode->i_size, (char *)inode->i_block);
> - else if (LINUX_S_ISBLK(inode->i_mode) || LINUX_S_ISCHR(inode->i_mode)) {
> + ext2fs_inode_data_blocks(current_fs, inode) == 0 &&
> + !(inode->i_flags & EXT4_INLINE_DATA_FL)) {
> + int sz = EXT2_I_SIZE(inode);
> +
> + if (sz > sizeof(inode->i_block))
> + sz = sizeof(inode->i_block);
> + fprintf(out, "%sFast link dest: \"%.*s\"\n", prefix, sz,
> + (char *)inode->i_block);
> + } else if (LINUX_S_ISBLK(inode->i_mode) ||
> + LINUX_S_ISCHR(inode->i_mode)) {
> int major, minor;
> const char *devnote;
>
> diff --git a/tests/d_special_files/expect b/tests/d_special_files/expect
> index 2b2dbfa..f729b0f 100644
> --- a/tests/d_special_files/expect
> +++ b/tests/d_special_files/expect
> @@ -11,7 +11,7 @@ Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
> atime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
> mtime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
> -Fast_link_dest: bar
> +Fast link dest: "bar"
> Exit status is 0
> debugfs -R ''stat foo2'' -w test.img
> Inode: 13 Type: symlink Mode: 0777 Flags: 0x0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-21 23:51:57

by Akira Fujita

[permalink] [raw]
Subject: RE: [PATCH 01/24] e4defrag: backwards-allocated files should be defragmented too

>-----Original Message-----
>From: [email protected] [mailto:[email protected]] On Behalf Of
>Darrick J. Wong
>Sent: Saturday, July 19, 2014 7:52 AM
>To: [email protected]; [email protected]
>Cc: Xiaoguang Wang; [email protected]
>Subject: [PATCH 01/24] e4defrag: backwards-allocated files should be defragmented too
>
>Currently, e4defrag avoids increasing file fragmentation by comparing
>the number of runs of physical extents of both the original and the
>donor files. Unfortunately, there is a bug in the routine that counts
>physical extents, since it doesn't look at the logical block offsets
>of the extents. Therefore, a file whose blocks were allocated in
>reverse order will be seen as only having one big physical extent, and
>therefore will not be defragmented.
>
>Fix the counting routine to consider logical extent offset so that we
>defragment backwards-allocated files. This could be problematic if we
>ever gain the ability to lay out logically sparse extents in a
>physically contiguous manner, but presumably one wouldn't call defrag
>on such a file.
>
>Reported-by: Xiaoguang Wang <[email protected]>
>Signed-off-by: Darrick J. Wong <[email protected]>
>---
> misc/e4defrag.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>
>diff --git a/misc/e4defrag.c b/misc/e4defrag.c
>index a204793..d0eac60 100644
>--- a/misc/e4defrag.c
>+++ b/misc/e4defrag.c
>@@ -888,7 +888,9 @@ static int get_physical_count(struct fiemap_extent_list *physical_list_head)
>
> do {
> if ((ext_list_tmp->data.physical + ext_list_tmp->data.len)
>- != ext_list_tmp->next->data.physical) {
>+ != ext_list_tmp->next->data.physical ||
>+ (ext_list_tmp->data.logical + ext_list_tmp->data.len)
>+ != ext_list_tmp->next->data.logical) {
> /* This extent and next extent are not continuous. */
> ret++;
> }

It looks good to me.

Reviewed-by: Akira Fujita <[email protected]>

Thanks,
Akira Fujita

2014-07-22 20:34:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 08/24] e2fsck: fix inode coherency issue when iterating an inode's blocks

On Fri, Jul 18, 2014 at 03:53:11PM -0700, Darrick J. Wong wrote:
> When we're about to iterate the blocks of a block-map file, we need to
> write the inode out to disk if it's dirty because block_iterate3()
> will re-read the inode from disk. (In practice this won't happen
> because nothing dirties block-mapped inodes before the iterate call,
> but we can program defensively).
>
> More importantly, we need to re-read the inode after the iterate()
> operation because it's possible that mappings were changed (or erased)
> during the iteration. If we then dirty or clear the inode, we'll
> mistakenly write the old inode values back out to disk!
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-22 20:34:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 06/24] e2fsck: don't offer to recreate the journal if fsck is aborting due to bad block bitmaps

On Fri, Jul 18, 2014 at 03:52:56PM -0700, Darrick J. Wong wrote:
> If e2fsck knows the bitmaps are bad at the exit (probably because they
> were bad at the start and have not been fixed), don't offer to
> recreate the journal because doing so causes e2fsck to abort a second
> time.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-22 20:34:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 07/24] e2fsck: skip clearing bad extents if bitmaps are unreadable

On Fri, Jul 18, 2014 at 03:53:04PM -0700, Darrick J. Wong wrote:
> If the bitmaps are known to be unreadable, don't bother clearing them;
> just mark fsck to restart itself after pass 5, by which time the
> bitmaps should be fixed.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-22 20:34:45

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 02/24] debugfs: Only print the first 60 bytes from i_block on a fast symlink

On Sat, Jul 19, 2014 at 10:49:35AM -0600, Andreas Dilger wrote:
> Wouldn't it be better to decode and print the whole symlink from the inline
> data? Depending on the use of debugfs, printing only the first 60 bytes may
> cause people/scripts to think the link is corrupted.

Hmm, you're right, this /does/ expose some weird behavior w.r.t. fast link
printing. I think I should go further and inhibit printing of system.data
entirely during xattr dump since dump_inode doesn't print file contents, and
the inline data could be fairly long.

Ok, I retract this patch and I'll repost it as part of the inline data cleanup
later this week.

--D
>
> Cheers, Andreas
>
> > On Jul 18, 2014, at 16:52, "Darrick J. Wong" <[email protected]> wrote:
> >
> > If we have an inline_data fast symlink, i_size can be larger than the
> > size of i_block. In this case, debugfs prints off the end of the
> > buffer, so fix that.
> >
> > Signed-off-by: Darrick J. Wong <[email protected]>
> > ---
> > debugfs/debugfs.c | 15 ++++++++++-----
> > tests/d_special_files/expect | 2 +-
> > 2 files changed, 11 insertions(+), 6 deletions(-)
> >
> >
> > diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> > index 51f386b..5eecabe 100644
> > --- a/debugfs/debugfs.c
> > +++ b/debugfs/debugfs.c
> > @@ -783,11 +783,16 @@ void internal_dump_inode(FILE *out, const char *prefix,
> > }
> >
> > if (LINUX_S_ISLNK(inode->i_mode) &&
> > - ext2fs_inode_data_blocks(current_fs,inode) == 0 &&
> > - !(inode->i_flags & EXT4_INLINE_DATA_FL))
> > - fprintf(out, "%sFast_link_dest: %.*s\n", prefix,
> > - (int) inode->i_size, (char *)inode->i_block);
> > - else if (LINUX_S_ISBLK(inode->i_mode) || LINUX_S_ISCHR(inode->i_mode)) {
> > + ext2fs_inode_data_blocks(current_fs, inode) == 0 &&
> > + !(inode->i_flags & EXT4_INLINE_DATA_FL)) {
> > + int sz = EXT2_I_SIZE(inode);
> > +
> > + if (sz > sizeof(inode->i_block))
> > + sz = sizeof(inode->i_block);
> > + fprintf(out, "%sFast link dest: \"%.*s\"\n", prefix, sz,
> > + (char *)inode->i_block);
> > + } else if (LINUX_S_ISBLK(inode->i_mode) ||
> > + LINUX_S_ISCHR(inode->i_mode)) {
> > int major, minor;
> > const char *devnote;
> >
> > diff --git a/tests/d_special_files/expect b/tests/d_special_files/expect
> > index 2b2dbfa..f729b0f 100644
> > --- a/tests/d_special_files/expect
> > +++ b/tests/d_special_files/expect
> > @@ -11,7 +11,7 @@ Fragment: Address: 0 Number: 0 Size: 0
> > ctime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
> > atime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
> > mtime: 0x50f560e0 -- Tue Jan 15 14:00:00 2013
> > -Fast_link_dest: bar
> > +Fast link dest: "bar"
> > Exit status is 0
> > debugfs -R ''stat foo2'' -w test.img
> > Inode: 13 Type: symlink Mode: 0777 Flags: 0x0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-22 20:35:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 03/24] debugfs: create idump command to dump an inode in hex

On Fri, Jul 18, 2014 at 03:52:36PM -0700, Darrick J. Wong wrote:
> Create a command that will dump an entire inode's space in hex.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Applied with the following addition which (a) adds a formal longer
command name ("inode_dump") to be analogous with the "block_dump"
command, and (b) adds an entry to the debugfs man page.

- Ted

diff --git a/debugfs/debug_cmds.ct b/debugfs/debug_cmds.ct
index ef4c5c4..ed3728f 100644
--- a/debugfs/debug_cmds.ct
+++ b/debugfs/debug_cmds.ct
@@ -188,7 +188,7 @@ request do_zap_block, "Zap block: fill with 0, pattern, flip bits etc.",
zap_block, zap;

request do_block_dump, "Dump contents of a block",
- block_dump, bd;
+ block_dump, bdump, bd;

request do_list_quota, "List quota",
list_quota, lq;
@@ -196,8 +196,8 @@ request do_list_quota, "List quota",
request do_get_quota, "Get quota",
get_quota, gq;

-request do_idump, "Dump inode",
- idump;
+request do_idump, "Dump the inode structure in hex",
+ inode_dump, idump, id;


end;
diff --git a/debugfs/debugfs.8.in b/debugfs/debugfs.8.in
index 73254d3..9a125f6 100644
--- a/debugfs/debugfs.8.in
+++ b/debugfs/debugfs.8.in
@@ -346,6 +346,9 @@ showing its tree structure.
Print a listing of the inodes which use the one or more blocks specified
on the command line.
.TP
+.BI inode_dump " filespec"
+Print the contents of the inode data structure in hex and ASCII format.
+.TP
.BI imap " filespec"
Print the location of the inode data structure (in the inode table)
of the inode

2014-07-22 20:35:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 05/24] e2fsck: report correct inode number in pass1b

On Fri, Jul 18, 2014 at 03:52:49PM -0700, Darrick J. Wong wrote:
> If there's a problem with the inode scan during pass 1b, report the
> inode that we were trying to examine when the error happened, not the
> inode that just went through the checker.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-22 20:35:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 09/24] e2fsck: clear i_block if there are too many bad block mappings

On Fri, Jul 18, 2014 at 03:53:21PM -0700, Darrick J. Wong wrote:
> If there are too many bad block mappings in a file and the user says
> to zap it, erase i_block before clearing the inode.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Why is this necessary? E2fsck will clear i_links_count and set dtime,
so the contents of i_block shouldn't matter, yes?

- Ted

>
> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> index b696d02..18980f1 100644
> --- a/e2fsck/pass1.c
> +++ b/e2fsck/pass1.c
> @@ -2329,6 +2329,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
> }
>
> if (pb.clear) {
> + memset(inode->i_block, 0, sizeof(inode->i_block));
> e2fsck_clear_inode(ctx, ino, inode, E2F_FLAG_RESTART,
> "check_blocks");
> return;
>

2014-07-22 20:35:05

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 02/24] debugfs: Only print the first 60 bytes from i_block on a fast symlink

On Sat, Jul 19, 2014 at 10:49:35AM -0600, Andreas Dilger wrote:
> Wouldn't it be better to decode and print the whole symlink from the
> inline data? Depending on the use of debugfs, printing only the
> first 60 bytes may cause people/scripts to think the link is
> corrupted.

I agree in principle, although for now, this fixes a bug, so I'll
apply it. Humans shouldn't get too confused, since we now do print
the extended attributes:

debugfs: stat symlinkfile
Inode: 131075 Type: symlink Mode: 0777 Flags: 0x10000000
Generation: 3080286240 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 62
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x53ce9340:c78386a4 -- Tue Jul 22 16:37:20 2014
atime: 0x53ce9346:bd2d7348 -- Tue Jul 22 16:37:26 2014
mtime: 0x53ce9340:c78386a4 -- Tue Jul 22 16:37:20 2014
crtime: 0x53ce9340:c78386a4 -- Tue Jul 22 16:37:20 2014
Size of extra inode fields: 28
Extended attributes:
system.data = "89" (2)
Fast_link_dest: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567"


But I would agree it owuld be better if we displayed the symlink
target completely.

- Ted

2014-07-22 20:39:09

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 02/24] debugfs: Only print the first 60 bytes from i_block on a fast symlink

On Tue, Jul 22, 2014 at 12:43:56PM -0400, Theodore Ts'o wrote:
> On Sat, Jul 19, 2014 at 10:49:35AM -0600, Andreas Dilger wrote:
> > Wouldn't it be better to decode and print the whole symlink from the
> > inline data? Depending on the use of debugfs, printing only the
> > first 60 bytes may cause people/scripts to think the link is
> > corrupted.
>
> I agree in principle, although for now, this fixes a bug, so I'll
> apply it. Humans shouldn't get too confused, since we now do print
> the extended attributes:

Nah, ignore/revert it, I'll have a better patch out in a few minutes.

> debugfs: stat symlinkfile
> Inode: 131075 Type: symlink Mode: 0777 Flags: 0x10000000
> Generation: 3080286240 Version: 0x00000000:00000001
> User: 0 Group: 0 Size: 62
> File ACL: 0 Directory ACL: 0
> Links: 1 Blockcount: 0
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x53ce9340:c78386a4 -- Tue Jul 22 16:37:20 2014
> atime: 0x53ce9346:bd2d7348 -- Tue Jul 22 16:37:26 2014
> mtime: 0x53ce9340:c78386a4 -- Tue Jul 22 16:37:20 2014
> crtime: 0x53ce9340:c78386a4 -- Tue Jul 22 16:37:20 2014
> Size of extra inode fields: 28
> Extended attributes:
> system.data = "89" (2)
> Fast_link_dest: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567"
>
>
> But I would agree it owuld be better if we displayed the symlink
> target completely.

Yes, I'll do that.

--D
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-22 20:40:04

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 04/24] debugfs: allow bmap to allocate blocks

On Fri, Jul 18, 2014 at 03:52:43PM -0700, Darrick J. Wong wrote:
> Allow set_inode_field's bmap command in debugfs to allocate blocks,
> which enables us to allocate blocks for indirect blocks and internal
> extent tree blocks. True, we could do this manually, but seems like
> unnecessary bookkeeping activity for humans.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-22 22:14:37

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 09/24] e2fsck: clear i_block if there are too many bad block mappings

On Tue, Jul 22, 2014 at 02:59:33PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 18, 2014 at 03:53:21PM -0700, Darrick J. Wong wrote:
> > If there are too many bad block mappings in a file and the user says
> > to zap it, erase i_block before clearing the inode.
> >
> > Signed-off-by: Darrick J. Wong <[email protected]>
>
> Why is this necessary? E2fsck will clear i_links_count and set dtime,
> so the contents of i_block shouldn't matter, yes?

Hmm... oh, I think I remember where this patch came from. When I wrote the
patch "e2fsck: fix inode coherency issue when iterating an inode's blocks", I
looked down a few lines to see what pb.clear == 1 did. I figured that if the
block mappings were really bad (i.e. there are more than 12 bad mappings), why
not just wipe i_block entirely?

I didn't have a specific failure case in mind when I wrote this patch.

--D
>
> - Ted
>
> >
> > diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> > index b696d02..18980f1 100644
> > --- a/e2fsck/pass1.c
> > +++ b/e2fsck/pass1.c
> > @@ -2329,6 +2329,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
> > }
> >
> > if (pb.clear) {
> > + memset(inode->i_block, 0, sizeof(inode->i_block));
> > e2fsck_clear_inode(ctx, ino, inode, E2F_FLAG_RESTART,
> > "check_blocks");
> > return;
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-22 22:50:38

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 09/24] e2fsck: clear i_block if there are too many bad block mappings

On Tue, Jul 22, 2014 at 03:14:26PM -0700, Darrick J. Wong wrote:
> On Tue, Jul 22, 2014 at 02:59:33PM -0400, Theodore Ts'o wrote:
> > On Fri, Jul 18, 2014 at 03:53:21PM -0700, Darrick J. Wong wrote:
> > > If there are too many bad block mappings in a file and the user says
> > > to zap it, erase i_block before clearing the inode.
> > >
> > > Signed-off-by: Darrick J. Wong <[email protected]>
> >
> > Why is this necessary? E2fsck will clear i_links_count and set dtime,
> > so the contents of i_block shouldn't matter, yes?
>
> Hmm... oh, I think I remember where this patch came from. When I wrote the
> patch "e2fsck: fix inode coherency issue when iterating an inode's blocks", I
> looked down a few lines to see what pb.clear == 1 did. I figured that if the
> block mappings were really bad (i.e. there are more than 12 bad mappings), why
> not just wipe i_block entirely?
>
> I didn't have a specific failure case in mind when I wrote this patch.

Ok, then I'll drop this. In general I try to avoid unnecessary
zero'ing when possible. It might be useful in doing a post morten
after the user runs e2fsck -y, and so long as inode is skipped, I'd
rather not zap more of the inode structure than is strictly necessary.

Cheers,

- Ted

2014-07-25 01:03:27

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 10/24] e2fsck: don't clobber critical metadata during check_blocks

On Fri, Jul 18, 2014 at 03:53:28PM -0700, Darrick J. Wong wrote:
> If we encounter an inode with IND/DIND/TIND blocks or internal extent
> tree blocks that point into critical FS metadata such as the
> superblock, the group descriptors, the bitmaps, or the inode table,
> it's quite possible that the validation code for those blocks is not
> going to like what it finds, and it'll ask to try to fix the block.
> Unfortunately, this happens before duplicate block processing (pass
> 1b), which means that we can end up doing stupid things like writing
> extent blocks into the inode table, which multiplies e2fsck'
> destructive effect and can render a filesystem unfixable.
>
> To solve this, create a bitmap of all the critical FS metadata. If
> before pass1b runs (basically check_blocks) we find a metadata block
> that points into these critical regions, continue processing that
> block, but avoid making any modifications, because we could be
> misinterpreting inodes as block maps. Pass 1b will find the
> multiply-owned blocks and fix that situation, which means that we can
> then restart e2fsck from the beginning and actually fix whatever
> problems we find.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 01:14:12

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 11/24] e2fsck: free ctx->fs, not fs, at the end of fsck

On Fri, Jul 18, 2014 at 03:53:35PM -0700, Darrick J. Wong wrote:
> When we call ext2fs_close_free at the end of main(), we need to supply
> the address of ctx->fs, because the subsequent e2fsck_free_context
> call will try to access ctx->fs (which is now set to a freed block) to
> see if it should free the directory block list. This is clearly not
> desirable, so fix the problem.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 02:18:57

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 12/24] e2fsck: force all block allocations to use block_found_map

On Fri, Jul 18, 2014 at 03:53:41PM -0700, Darrick J. Wong wrote:
> During the later passes of efsck, we sometimes need to allocate and
> map blocks into a file. This can happen either by fsck directly
> calling new_block() or indirectly by the library calling new_block
> because it needs to allocate a block for lower level metadata (bmap2()
> with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
>
> We need to force new_block to allocate blocks from the found block
> map, because the FS block map could be inaccurate for various reasons:
> the map is wrong, there are missing blocks, the checksum failed, etc.
>
> Therefore, any time fsck does something that could to allocate blocks,
> we need to intercept allocation requests so that they're sourced from
> the found block map. Remove the previous code that swapped bitmap
> pointers as this is now unneeded.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 02:20:13

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 13/24] e2fsck: fix off-by-one bounds check on group number

On Fri, Jul 18, 2014 at 03:53:48PM -0700, Darrick J. Wong wrote:
> Since fs->group_desc_count is the number of block groups, the number
> of the last group is always one less than this count. Fix the bounds
> check to reflect that.
>
> This flaw shouldn't have any user-visible side effects, since the
> block bitmap test based on last_grp later on can handle overbig block
> numbers.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 11:13:48

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 14/24] libext2fs: fix bounds check of the bitmap test range in get_free_blocks2

On Fri, Jul 18, 2014 at 03:53:59PM -0700, Darrick J. Wong wrote:
> In the loop in ext2fs_get_free_blocks2, we ask the bitmap if there's a
> range of free blocks starting at "b" and ending at "b + num - 1".
> That quantity is the number of the last block in the range. Since
> ext2fs_blocks_count() returns the number of blocks and not the number
> of the last block in the filesystem, the check is incorrect.
>
> Put in a shortcut to exit the loop if finish > start, because in that
> case it's obvious that we don't need to reset to the beginning of the
> FS to continue the search for blocks. This is needed to terminate the
> loop because the broken test meant that b could get large enough to
> equal finish, which would end the while loop.
>
> The attached testcase shows that with the off by one error, it is
> possible to throw e2fsck into an infinite loop while it tries to
> find space for the inode table even though there's no space for one.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 11:22:14

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 15/24] misc: fix problems with strncat

On Fri, Jul 18, 2014 at 03:54:07PM -0700, Darrick J. Wong wrote:
> The third argument to strncat is the maximum number of characters to
> copy out of the second argument; it is not the maximum length of the
> first argument.
>
> Therefore, code in a check just in case we ever find a /sys/block/X
> path long enough to hit the end of the buffer. FWIW the longest path
> I could find on my machine was 133 bytes.
>
> Fixes-Coverity-Bug: 1252003
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 11:22:18

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 16/24] e2fsck: don't crash during rehash

On Fri, Jul 18, 2014 at 03:54:15PM -0700, Darrick J. Wong wrote:
> If a user crafts a carefully constructed filesystem containing a
> single directory entry block with an invalid checksum and fewer than
> two entries, and then runs e2fsck to fix the filesystem, fsck will
> crash when it tries to "compress" the short dir and passes a negative
> dirent array length to qsort. Therefore, don't allow directory
> "compression" in this situation.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 12:12:56

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 17/24] e2fsck: reserve blocks for root/lost+found directory repair

On Fri, Jul 18, 2014 at 03:54:22PM -0700, Darrick J. Wong wrote:
> If we think we're going to need to repair either the root directory or
> the lost+found directory, reserve a block at the end of pass 1 to
> reduce the likelihood of an e2fsck abort while reconstructing
> root/lost+found during pass 3.

Can you say more about when this situation arises? The only thing I
can think of is if there are a large number of blocks that need to be
cloned during passes 1b/c/d?

- Ted

2014-07-25 12:33:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 18/24] e2fsck: collapse holes in extent-based directories

On Fri, Jul 18, 2014 at 03:54:30PM -0700, Darrick J. Wong wrote:
> If we notice a hole in the block map of an extent-based directory,
> offer to collapse the hole by decreasing the logical block # of the
> extent. This saves us from pass 3's inefficient strategy, which fills
> the holes by mapping in a lot of empty directory blocks.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 12:41:01

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 19/24] e2fsck: always submit logical block 0 of a directory for pass 2

On Fri, Jul 18, 2014 at 03:54:38PM -0700, Darrick J. Wong wrote:
> Always iterate logical block 0 in a directory, even if no physical
> block has been allocated. Pass 2 will notice the lack of mapping and
> offer to allocate a new directory block; this enables us to link the
> directory into lost+found.
>
> Previously, if there were no logical blocks mapped, we would fail to
> pick up even block 0 of the directory for processing in pass 2. This
> meant that e2fsck never allocated a block 0 and therefore wouldn't fix
> the missing . and .. entries for the directory; subsequent e2fsck runs
> would complain about (yet never fix) the problem.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 12:42:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 20/24] e2fsck: pass2 should not process directory blocks that are impossibly large

On Fri, Jul 18, 2014 at 03:54:46PM -0700, Darrick J. Wong wrote:
> Currently, directories cannot be fallocated, which means that the only
> way they get bigger is for the kernel to append blocks one by one.
> Therefore, if we encounter a logical block offset that is too big, we
> needn't bother adding it to the dblist for pass2 processing, because
> it's unlikely to contain a valid directory block. The code that
> handles extent based directories also does not add toobig blocks to
> the dblist.
>
> Note that we can easily cause e2fsck to fail with ENOMEM if we start
> feeding it really large logical block offsets, as the dblist
> implementation will try to realloc() an array big enough to hold it.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 12:51:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 22/24] e2fsck: check return value of ext2fs_extent_fix_parents()

On Fri, Jul 18, 2014 at 03:55:28PM -0700, Darrick J. Wong wrote:
> When fixing parent nodes in an extent tree, check the return value for
> errors and bail out if problems happen.
>
> Signed-off-by: Darrick J. Wong <[email protected]>
> Fixes-Coverity-Bug: 1193379

Already fixed in my tree, thanks.

- Ted

2014-07-25 13:00:04

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 21/24] e2fsck: clear uninit flag on directory extents

On Fri, Jul 18, 2014 at 03:55:21PM -0700, Darrick J. Wong wrote:
> Directories can't have uninitialized extents, so offer to clear the
> uninit flag when we find this situation. The actual directory blocks
> will be checked in pass 2 and 3 regardless of the uninit flag.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 13:00:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 23/24] mke2fs: set error behavior at initialization time

On Fri, Jul 18, 2014 at 03:55:36PM -0700, Darrick J. Wong wrote:
> Port tune2fs' -e flag to mke2fs so that we can set error behavior at
> format time, and introduce the equivalent errors= setting into
> mke2fs.conf.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Thanks, applied.

- Ted

2014-07-25 13:16:04

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 24/24] e2fuzz: Create a tool to fuzz ext* filesystems

On Fri, Jul 18, 2014 at 03:55:44PM -0700, Darrick J. Wong wrote:
> Creates a program that fuzzes only the metadata blocks (or optionally
> all in-use blocks) of an ext* filesystem. There's also a script to
> automate fuzz testing of the kernel and e2fsck in a loop.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Applied, with a minor change to add e2fuzz to the clean makefile rule.
Thanks for all of your hard work!

- Ted

2014-07-25 20:19:17

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 17/24] e2fsck: reserve blocks for root/lost+found directory repair

On Fri, Jul 25, 2014 at 08:12:53AM -0400, Theodore Ts'o wrote:
> On Fri, Jul 18, 2014 at 03:54:22PM -0700, Darrick J. Wong wrote:
> > If we think we're going to need to repair either the root directory or
> > the lost+found directory, reserve a block at the end of pass 1 to
> > reduce the likelihood of an e2fsck abort while reconstructing
> > root/lost+found during pass 3.
>
> Can you say more about when this situation arises? The only thing I
> can think of is if there are a large number of blocks that need to be
> cloned during passes 1b/c/d?

Yep, that's one of the scenarios that this patch fixes -- if / is corrupt and
duplicate processing allocates all the free blocks in the FS, we end up with an
unfixable FS -- there's a big file somewhere, but no directory structure; the
only way to repair would be to run debugfs -w to find and delete files, and
re-run e2fsck. Granted, in this situation, you'd end up in a nasty loop of
running fsck, mounting the FS long enough to delete a big file or two,
unmounting, and rerunning fsck, but that's less troublesome than mucking with
debugfs.

This wasn't actually how I found the bug. What really happened was that I
corrupted an extent in a directory such that the lblk was a really huge number.
The directory processing code would then try to "fill in" the "hole" by
allocating tons of blocks, typically until there weren't any free blocks left.
Then, any attempt to repair / or /lost+found would abort because there weren't
any free blocks. Of course, this behavior is fixed by patch #18, so it's no
longer a good reproducer.

--D
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html