2021-01-22 05:52:34

by harshad shirwadkar

Subject: [PATCH v4 0/8] e2fsck: add fast commit replay path

From: Harshad Shirwadkar <[email protected]>

This series contains the e2fsck fast commit replay patches, in modified
form, from the series "[PATCH v3 00/15] Fast commit changes for
e2fsprogs" sent on Jan 20, 2021
(https://patchwork.ozlabs.org/project/linux-ext4/list/?series=225577&state=*).
All patches from that series except the fast commit recovery path were
merged upstream, so this series contains only the fast commit replay
changes.

Verified that all the regression tests pass:
367 tests succeeded 0 tests failed

New fast commit recovery test:
j_recover_fast_commit: ok

Changes Since V3:
----------------

- All the patches except the replay path were merged upstream. Thus,
this version of the series is shorter than V3 and contains only the
recovery path changes.

- Added errcode_to_errno() function to translate libext2fs errcode_t
values to standard Linux error codes. As of now, we simply translate
any errcode_t > 256 to -EFAULT and any errcode_t <= 256 to the negated
value (i.e. -errno). We also log the actual ext2fs error message to
stderr along with the function name and line number for debugging
purposes.

- Consistent naming: renamed e2fsck replay path functions to follow
this convention: all the ext4_* functions in the e2fsck fast commit
replay path return standard Linux-style error codes, while the
functions starting with ext2fs_* return errcode_t error codes.
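
To make the translation rule above concrete, here is a minimal
standalone sketch. It is not the code from the series: errcode_t is
replaced by a local stand-in typedef, and the sketch_* names are
hypothetical. It only mirrors the stated rule (0 stays 0, values <= 256
are negated, anything larger becomes -EFAULT).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for the library's errcode_t; an assumption for this sketch. */
typedef long sketch_errcode_t;

/*
 * Translate a library-style error code into a negative errno value.
 * The caller's function name and line number are logged to stderr.
 */
static int sketch_errcode_to_errno(sketch_errcode_t err, const char *func,
				   int line)
{
	assert(func != NULL);
	if (err == 0)
		return 0;
	fprintf(stderr, "Error %ld encountered in function %s at line %d\n",
		err, func, line);
	if (err <= 256)
		return -(int)err;	/* small codes are treated as errno */
	return -EFAULT;			/* everything else collapses to EFAULT */
}

/* Convenience wrapper that records the call site, as the series does. */
#define SKETCH_ERRCODE_TO_ERRNO(err) \
	sketch_errcode_to_errno((err), __func__, __LINE__)
```

With this shape, callers that must return Linux-style codes can wrap
any call that returns a library error code at the call site.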

Harshad Shirwadkar (8):
ext2fs: add new APIs needed for fast commits
e2fsck: add function to rewrite extent tree
e2fsck: add fast commit setup code
e2fsck: add fast commit scan pass
e2fsck: add fast commit replay skeleton
e2fsck: add fc replay for link, unlink, creat tags
e2fsck: add replay for add_range, del_range, and inode tags
tests: add fast commit recovery tests

e2fsck/e2fsck.h | 32 ++
e2fsck/extents.c | 175 ++++---
e2fsck/journal.c | 654 +++++++++++++++++++++++++++
lib/ext2fs/ext2_fs.h | 1 +
lib/ext2fs/ext2fs.h | 8 +
lib/ext2fs/extent.c | 64 +++
lib/ext2fs/unlink.c | 6 +-
tests/j_recover_fast_commit/commands | 4 +
tests/j_recover_fast_commit/expect | 22 +
tests/j_recover_fast_commit/image.gz | Bin 0 -> 3595 bytes
tests/j_recover_fast_commit/script | 26 ++
11 files changed, 929 insertions(+), 63 deletions(-)
create mode 100644 tests/j_recover_fast_commit/commands
create mode 100644 tests/j_recover_fast_commit/expect
create mode 100644 tests/j_recover_fast_commit/image.gz
create mode 100755 tests/j_recover_fast_commit/script

--
2.30.0.280.ga3ce27912f-goog


2021-01-22 05:52:34

by harshad shirwadkar

Subject: [PATCH v4 1/8] ext2fs: add new APIs needed for fast commits

From: Harshad Shirwadkar <[email protected]>

This patch adds the following new APIs:

Count the total number of blocks occupied by an inode, including
intermediate extent tree nodes.
extern errcode_t ext2fs_count_blocks(ext2_filsys fs, ext2_ino_t ino,
struct ext2_inode *inode,
blk64_t *ret_count);

Decode an on-disk ext3_extent into an ext2fs_extent.
extern errcode_t ext2fs_decode_extent(struct ext2fs_extent *to, void *from,
int len);

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
lib/ext2fs/ext2fs.h | 4 +++
lib/ext2fs/extent.c | 64 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 68 insertions(+)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 7218fde9..7a25e0e5 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1341,6 +1341,10 @@ extern errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle);
extern size_t ext2fs_max_extent_depth(ext2_extent_handle_t handle);
extern errcode_t ext2fs_fix_extents_checksums(ext2_filsys fs, ext2_ino_t ino,
struct ext2_inode *inode);
+extern errcode_t ext2fs_count_blocks(ext2_filsys fs, ext2_ino_t ino,
+ struct ext2_inode *inode, blk64_t *ret_count);
+extern errcode_t ext2fs_decode_extent(struct ext2fs_extent *to, void *from,
+ int len);

/* fallocate.c */
#define EXT2_FALLOCATE_ZERO_BLOCKS (0x1)
diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index ac3dbfec..bde6b0f3 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -1785,6 +1785,70 @@ out:
return errcode;
}

+errcode_t ext2fs_decode_extent(struct ext2fs_extent *to, void *addr, int len)
+{
+ struct ext3_extent *from = (struct ext3_extent *)addr;
+
+ if (len != sizeof(struct ext3_extent))
+ return EXT2_ET_INVALID_ARGUMENT;
+
+ to->e_pblk = ext2fs_le32_to_cpu(from->ee_start) +
+ ((__u64) ext2fs_le16_to_cpu(from->ee_start_hi)
+ << 32);
+ to->e_lblk = ext2fs_le32_to_cpu(from->ee_block);
+ to->e_len = ext2fs_le16_to_cpu(from->ee_len);
+ to->e_flags |= EXT2_EXTENT_FLAGS_LEAF;
+ if (to->e_len > EXT_INIT_MAX_LEN) {
+ to->e_len -= EXT_INIT_MAX_LEN;
+ to->e_flags |= EXT2_EXTENT_FLAGS_UNINIT;
+ }
+
+ return 0;
+}
+
+errcode_t ext2fs_count_blocks(ext2_filsys fs, ext2_ino_t ino,
+ struct ext2_inode *inode, blk64_t *ret_count)
+{
+ ext2_extent_handle_t handle;
+ struct ext2fs_extent extent;
+ errcode_t errcode;
+ int i;
+ blk64_t blkcount = 0;
+ blk64_t *intermediate_nodes;
+
+ errcode = ext2fs_extent_open2(fs, ino, inode, &handle);
+ if (errcode)
+ goto out;
+
+ errcode = ext2fs_extent_get(handle, EXT2_EXTENT_ROOT, &extent);
+ if (errcode)
+ goto out;
+
+ ext2fs_get_array(handle->max_depth, sizeof(blk64_t),
+ &intermediate_nodes);
+ blkcount = handle->level;
+ while (!errcode) {
+ if (extent.e_flags & EXT2_EXTENT_FLAGS_LEAF) {
+ blkcount += extent.e_len;
+ for (i = 0; i < handle->level; i++) {
+ if (intermediate_nodes[i] !=
+ handle->path[i].end_blk) {
+ blkcount++;
+ intermediate_nodes[i] =
+ handle->path[i].end_blk;
+ }
+ }
+ }
+ errcode = ext2fs_extent_get(handle, EXT2_EXTENT_NEXT, &extent);
+ }
+ ext2fs_free_mem(&intermediate_nodes);
+out:
+ *ret_count = blkcount;
+ ext2fs_extent_free(handle);
+
+ return 0;
+}
+
#ifdef DEBUG
/*
* Override debugfs's prompt
--
2.30.0.280.ga3ce27912f-goog
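
The decoding step performed by ext2fs_decode_extent() in the patch
above can be illustrated with a standalone sketch. It assumes the usual
12-byte little-endian on-disk extent layout (logical block, length,
high 16 bits of the start block, low 32 bits of the start block) and
that lengths above 32768 mark an uninitialized extent. All sketch_*
and SKETCH_* names are hypothetical and not part of libext2fs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INIT_MAX_LEN	32768u	/* assumed value of EXT_INIT_MAX_LEN */
#define SKETCH_FLAG_UNINIT	0x1u	/* hypothetical flag bit */
#define SKETCH_RAW_EXTENT_LEN	12u	/* assumed on-disk extent size */

struct sketch_extent {
	uint64_t pblk;	/* first physical block */
	uint64_t lblk;	/* first logical block */
	uint32_t len;	/* number of blocks */
	uint32_t flags;
};

static uint16_t sketch_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t sketch_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer length is not one raw extent. */
static int sketch_decode_extent(struct sketch_extent *to,
				const unsigned char *raw, size_t len)
{
	assert(to != NULL && raw != NULL);
	if (len != SKETCH_RAW_EXTENT_LEN)
		return -1;

	to->lblk = sketch_le32(raw);
	to->len = sketch_le16(raw + 4);
	to->pblk = sketch_le32(raw + 8) |
		   ((uint64_t)sketch_le16(raw + 6) << 32);
	to->flags = 0;
	/* Lengths above the limit encode an uninitialized extent. */
	if (to->len > SKETCH_INIT_MAX_LEN) {
		to->len -= SKETCH_INIT_MAX_LEN;
		to->flags |= SKETCH_FLAG_UNINIT;
	}
	return 0;
}
```

Unlike the patch, the sketch reads the fields byte by byte instead of
casting the buffer to a struct, and it clears the flags before setting
any bit so the caller does not have to zero the output first.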

2021-01-22 05:52:34

by harshad shirwadkar

Subject: [PATCH v4 3/8] e2fsck: add fast commit setup code

From: Harshad Shirwadkar <[email protected]>

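Introduce the "e2fsck_fc_replay_state" structure, which holds the state
needed for ext4 fast commit replay, and register a stub ext4_fc_replay()
callback on the journal when the fast commit feature is enabled.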

Signed-off-by: Harshad Shirwadkar <[email protected]>
Reviewed-by: Theodore Ts'o <[email protected]>
---
e2fsck/e2fsck.h | 16 ++++++++++++++++
e2fsck/journal.c | 15 +++++++++++++++
lib/ext2fs/ext2_fs.h | 1 +
3 files changed, 32 insertions(+)

diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 3b9c1874..f75cc343 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -68,6 +68,7 @@
#endif

#include "support/quotaio.h"
+#include "ext2fs/fast_commit.h"

/*
* Exit codes used by fsck-type programs
@@ -239,6 +240,18 @@ struct extent_list {
errcode_t retval;
ext2_ino_t ino;
};
+
+/* State structure for fast commit replay */
+struct e2fsck_fc_replay_state {
+ struct extent_list fc_extent_list;
+ int fc_replay_num_tags;
+ int fc_replay_expected_off;
+ int fc_current_pass;
+ int fc_cur_tag;
+ int fc_crc;
+ __u16 fc_super_state;
+};
+
struct e2fsck_struct {
ext2_filsys fs;
const char *program_name;
@@ -431,6 +444,9 @@ struct e2fsck_struct {

/* Undo file */
char *undo_file;
+
+ /* Fast commit replay state */
+ struct e2fsck_fc_replay_state fc_replay_state;
};

/* Data structures to evaluate whether an extent tree needs rebuilding. */
diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index 75fefcde..2c8e3441 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -278,6 +278,17 @@ static int process_journal_block(ext2_filsys fs,
return 0;
}

+/*
+ * Main recovery path entry point. This function returns JBD2_FC_REPLAY_CONTINUE
+ * to indicate that it is expecting more fast commit blocks. It returns
+ * JBD2_FC_REPLAY_STOP to indicate that replay is done.
+ */
+static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
+ enum passtype pass, int off, tid_t expected_tid)
+{
+ return JBD2_FC_REPLAY_STOP;
+}
+
static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)
{
struct process_block_struct pb;
@@ -514,6 +525,10 @@ static errcode_t e2fsck_get_journal(e2fsck_t ctx, journal_t **ret_journal)

journal->j_sb_buffer = bh;
journal->j_superblock = (journal_superblock_t *)bh->b_data;
+ if (ext2fs_has_feature_fast_commit(ctx->fs->super))
+ journal->j_fc_replay_callback = ext4_fc_replay;
+ else
+ journal->j_fc_replay_callback = NULL;

#ifdef USE_INODE_IO
if (j_inode)
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index bfc30c29..b1e4329c 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -543,6 +543,7 @@ struct ext2_inode *EXT2_INODE(struct ext2_inode_large *large_inode)
#define EXT2_VALID_FS 0x0001 /* Unmounted cleanly */
#define EXT2_ERROR_FS 0x0002 /* Errors detected */
#define EXT3_ORPHAN_FS 0x0004 /* Orphans being recovered */
+#define EXT4_FC_REPLAY 0x0020 /* Ext4 fast commit replay ongoing */

/*
* Misc. filesystem flags
--
2.30.0.280.ga3ce27912f-goog

2021-01-22 05:52:44

by harshad shirwadkar

Subject: [PATCH v4 8/8] tests: add fast commit recovery tests

From: Harshad Shirwadkar <[email protected]>

Add the j_recover_fast_commit test, which ensures that e2fsck is able to
recover a file system from a fast commit log.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
tests/j_recover_fast_commit/commands | 4 ++++
tests/j_recover_fast_commit/expect | 22 ++++++++++++++++++++++
tests/j_recover_fast_commit/image.gz | Bin 0 -> 3595 bytes
tests/j_recover_fast_commit/script | 26 ++++++++++++++++++++++++++
4 files changed, 52 insertions(+)
create mode 100644 tests/j_recover_fast_commit/commands
create mode 100644 tests/j_recover_fast_commit/expect
create mode 100644 tests/j_recover_fast_commit/image.gz
create mode 100755 tests/j_recover_fast_commit/script

diff --git a/tests/j_recover_fast_commit/commands b/tests/j_recover_fast_commit/commands
new file mode 100644
index 00000000..74e20e4e
--- /dev/null
+++ b/tests/j_recover_fast_commit/commands
@@ -0,0 +1,4 @@
+ls
+ls a/
+ex a/new
+ex a/data
diff --git a/tests/j_recover_fast_commit/expect b/tests/j_recover_fast_commit/expect
new file mode 100644
index 00000000..18e2fe06
--- /dev/null
+++ b/tests/j_recover_fast_commit/expect
@@ -0,0 +1,22 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 14/256 files (14.3% non-contiguous), 1365/2048 blocks
+Exit status is 0
+debugfs: ls
+ 2 (12) . 2 (12) .. 11 (20) lost+found 12 (968) a
+debugfs: ls a/
+ 12 (12) . 2 (12) .. 13 (12) data 14 (976) new
+debugfs: ex a/new
+Level Entries Logical Physical Length Flags
+ 0/ 0 1/ 1 0 - 0 1107 - 1107 1
+debugfs: ex a/data
+Level Entries Logical Physical Length Flags
+ 0/ 1 1/ 1 0 - 255 1618 256
+ 1/ 1 1/ 5 0 - 15 1619 - 1634 16
+ 1/ 1 2/ 5 16 - 31 1601 - 1616 16
+ 1/ 1 3/ 5 32 - 63 1985 - 2016 32
+ 1/ 1 4/ 5 64 - 127 1537 - 1600 64
+ 1/ 1 5/ 5 128 - 255 1793 - 1920 128
diff --git a/tests/j_recover_fast_commit/image.gz b/tests/j_recover_fast_commit/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..b7357afc46bec8a0154d5746fa2fce86c2341efc
GIT binary patch
literal 3595
zcmeH}eNfT~8ppY9Z*$KxYh$%EC0B221!CQ{-F$;y_Nv9SE-STSSvB{>*GvmjkZCXT
z?f_j=!B-kB-%S%0H3Y6v!7^5Yd<&I|MMB;Z6jb2%yKeTccIN)Sx$&8K{`k&!zW;ol
z`OZ8rs*@8`&c;JF5K?3RngG$yQc})Dc|qP}Dsm5ehAgEclfGm;JbtkgO84-68YdB4
zsDm<_xB0<`_Mfe5Ehc<(^ZmG2x5R<(yD|IUh}iq9FWN?(5XZU@N2~<_h=?GsN5^Ot
zwZ0W$-7K4dx71_oJoDV-Z!~|)54Ez0bAf6;KQnhu>)7(1gnesUs!s~_(bLOL?&h|z
z2j;(5SZe|~f%H?`Jhc>LGeZuYOm4*@|JfNVe2#S*tlw;)X@adgK;_GgnB@`IN$()O
zeJdVTF+(3Gx@-!k#510$y2lYkS(vN9#GrQLCa;-mV?qZU{;kwK$Rv{L6Jf{#Z20Qb
z@<IdCR2elC?GcaQM>yz}Ms3R|)&tr)^yDbq#d`(A0X*cavzL<Ih>NvxIgvD1VK$rO
z1ND7TxV7GM&y8Z#1w(&&^>=;7g{6-m?@l{8KJ!Tb_9_8)`n-#{^gDTjuU|duC!D&$
zw-g&1I=zp?$hSx>UPRYfcB-iNgN=ul&jpT^l}H&s&`y6Z9O@)lFlKin4*F>!XjY}z
zH%zu?*}+&9T@cR1r$2t?naFsn7j8!KH0(v!d&4UQ6AAXi+Ao0GE89WI0~#d}C&YB`
zQNqqK-`b;Z?K3p61O`q$?5kAp0>k|^;cYVc*)iv>>G+ztPx_vc2X@W83hHPAj!b1r
zY6XV1#4Z~|Tv`Oi4Jc0kf>D_H1)flpYK!H<%0qTJsVSQBvck24`sm8=TEiXV>bV)u
zl-(02z{HCOM=U<$%Q$!e#;Zh2T~=hF><M#4mMpSnp{MtWj`3a(FmBaZh3M6Zgi3PW
z7oSt-lI@3EvAcAdyO^4COpLdS%D&a^wF?Lh^2l3Lj_Td#_T$hdf%q$kaMo^{=YS<j
zpX%d+p4Tq1vWoaB>WDWopLklvSs1fCy}Odu<<E1IvI!iDX>l`9%z?r4c5`xf7S%aU
z&E+*8v}xS`<mn%a<)r5>Qt75B{U2YB@TN`IA6TdzSLAF|!jZ$}ahX?GtJAfpeUB+=
zijU8Z&<GZLe6<#QIL-jPlCvCwpZn~m`m1$QXbCOj!zXo1AXuZmiAPRo)q<`L(hnq{
zieo$JJ$<Jv%&bTOVlW^tuTpP}2@?OIK~qHs26H~<-52!hiR8w9Jva~=0g;Vicx70J
z4fElwtc@}#58*Fs1|0nM50oel-tjty1RzdigN^Zbf=@7qqPO^xgBxY9fzK?)NeUd8
z=K`qiTd&px<Cwb{?p?@Fk1F}fxbmz6c%pZJnctU=y#o~r*YII2swB1ZuvtnL8MY-W
zHzn^koHn$58IpGC$6uS%`rn%qHQSg^3U1&P>02?n0K&<(kM_cNQj~1IX?b$W(*0xP
z-_i?z5rOsTi%Tnl`>$~7v^=gl6@7L7z-8Ol{1CZg!>|2lrzyd9v<TeSOBR9Qoi7j>
zfMUoDdQ$?Zb&fbNiXZ`wSA*6!T&fxL@5%ub|6)+N9;(~X%%0oc*_U@JfYDyOz~PA0
zNh(zxZAqoz5<M~({cibwqa-<xeYm}uIk5TF0Jj^u_DSdXZc#LpZNm|T(L2~SIx%0D
zr~FN5U9J;YC-DDBK+OKU`;2ULe9F^Jv^2u=c|z4(-z)mcy|^smP;(_`xPJ5Vqa`Ps
zo8e+@hdA3gnIT}e`~{e1%5omGIGXNt?QqL4?5m!4O><ZVNIiRnh58MIv0yW+V5;$E
zC@S1=?Q-)t{aOdt6zz^LH2gH&<$UP1bBtdA<@?cL96g6`B*u&d4`$;WW|28byT_Cy
z-~D#|O;^>$h-3JF6oyry?<XXVURb>+E>j#lpG`UEw?qP#1vezN7g2eO#qvYh3)d0)
zpTpb;=`8TsO9V3HWC%cFD7c@5S<xo-E@_tE<oVK3852`dW}C5sOW&lD|Ge4};ZLCu
z;<zQVm}Zhc5<ifpabBC<2S;Br8Dm00dkSOE?PqDVP4U;>J^Fl0VbXtImzuDtE!cV%
zOQ(}EAFD2kEZ66+ZZ6Nrn7OSF`P#l9ncx;YubSQdZg^d-6IdtkKS!WE7X1D1l(1q5
HWGCd`W!a1I

literal 0
HcmV?d00001

diff --git a/tests/j_recover_fast_commit/script b/tests/j_recover_fast_commit/script
new file mode 100755
index 00000000..22ef6325
--- /dev/null
+++ b/tests/j_recover_fast_commit/script
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+FSCK_OPT=-fy
+IMAGE=$test_dir/image.gz
+CMDS=$test_dir/commands
+
+gunzip < $IMAGE > $TMPFILE
+
+# Run fsck to fix things?
+EXP=$test_dir/expect
+OUT=$test_name.log
+
+cp $TMPFILE /tmp/debugthis
+$FSCK $FSCK_OPT -E journal_only -N test_filesys $TMPFILE 2>/dev/null | head -n 1000 | tail -n +2 > $OUT
+echo "Exit status is $?" >> $OUT
+
+$DEBUGFS -f $CMDS $TMPFILE >> $OUT 2>/dev/null
+
+# Figure out what happened
+if cmp -s $EXP $OUT; then
+ echo "$test_name: $test_description: ok"
+ touch $test_name.ok
+else
+ echo "$test_name: $test_description: failed"
+ diff -u $EXP $OUT >> $test_name.failed
+fi
--
2.30.0.280.ga3ce27912f-goog

2021-01-22 05:52:51

by harshad shirwadkar

Subject: [PATCH v4 6/8] e2fsck: add fc replay for link, unlink, creat tags

From: Harshad Shirwadkar <[email protected]>

Add fast commit replay for directory entry updates.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
e2fsck/journal.c | 112 ++++++++++++++++++++++++++++++++++++++++++++
lib/ext2fs/ext2fs.h | 4 ++
lib/ext2fs/unlink.c | 6 +--
3 files changed, 119 insertions(+), 3 deletions(-)

diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index 007c32c6..2afe0929 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -380,6 +380,114 @@ static int ext4_fc_replay_scan(journal_t *j, struct buffer_head *bh,
out_err:
return ret;
}
+
+static int __errcode_to_errno(errcode_t err, const char *func, int line)
+{
+ if (err == 0)
+ return 0;
+ fprintf(stderr, "Error \"%s\" encountered in function %s at line %d\n",
+ error_message(err), func, line);
+ if (err <= 256)
+ return -err;
+ return -EFAULT;
+}
+
+#define errcode_to_errno(err) __errcode_to_errno(err, __func__, __LINE__)
+
+/* Helper struct for dentry replay routines */
+struct dentry_info_args {
+ int parent_ino, dname_len, ino, inode_len;
+ char *dname;
+};
+
+static inline void tl_to_darg(struct dentry_info_args *darg,
+ struct ext4_fc_tl *tl)
+{
+ struct ext4_fc_dentry_info *fcd;
+ int tag = le16_to_cpu(tl->fc_tag);
+
+ fcd = (struct ext4_fc_dentry_info *)ext4_fc_tag_val(tl);
+
+ darg->parent_ino = le32_to_cpu(fcd->fc_parent_ino);
+ darg->ino = le32_to_cpu(fcd->fc_ino);
+ darg->dname = fcd->fc_dname;
+ darg->dname_len = ext4_fc_tag_len(tl) -
+ sizeof(struct ext4_fc_dentry_info);
+ darg->dname = malloc(darg->dname_len + 1);
+ memcpy(darg->dname, fcd->fc_dname, darg->dname_len);
+ darg->dname[darg->dname_len] = 0;
+ jbd_debug(1, "%s: %s, ino %d, parent %d\n",
+ tag == EXT4_FC_TAG_CREAT ? "create" :
+ (tag == EXT4_FC_TAG_LINK ? "link" :
+ (tag == EXT4_FC_TAG_UNLINK ? "unlink" : "error")),
+ darg->dname, darg->ino, darg->parent_ino);
+}
+
+static int ext4_fc_handle_unlink(e2fsck_t ctx, struct ext4_fc_tl *tl)
+{
+ struct ext2_inode inode;
+ struct dentry_info_args darg;
+ ext2_filsys fs = ctx->fs;
+ int ret;
+
+ tl_to_darg(&darg, tl);
+ ret = errcode_to_errno(
+ ext2fs_unlink(ctx->fs, darg.parent_ino,
+ darg.dname, darg.ino, 0));
+ /* It's okay if the above call fails */
+ free(darg.dname);
+ return ret;
+}
+
+static int ext4_fc_handle_link_and_create(e2fsck_t ctx, struct ext4_fc_tl *tl)
+{
+ struct dentry_info_args darg;
+ ext2_filsys fs = ctx->fs;
+ struct ext2_inode_large inode_large;
+ int ret, filetype, mode;
+
+ tl_to_darg(&darg, tl);
+ ret = errcode_to_errno(ext2fs_read_inode(fs, darg.ino,
+ (struct ext2_inode *)&inode_large));
+ if (ret)
+ goto out;
+
+ mode = inode_large.i_mode;
+
+ if (LINUX_S_ISREG(mode))
+ filetype = EXT2_FT_REG_FILE;
+ else if (LINUX_S_ISDIR(mode))
+ filetype = EXT2_FT_DIR;
+ else if (LINUX_S_ISCHR(mode))
+ filetype = EXT2_FT_CHRDEV;
+ else if (LINUX_S_ISBLK(mode))
+ filetype = EXT2_FT_BLKDEV;
+ else if (LINUX_S_ISLNK(mode))
+ filetype = EXT2_FT_SYMLINK;
+ else if (LINUX_S_ISFIFO(mode))
+ filetype = EXT2_FT_FIFO;
+ else if (LINUX_S_ISSOCK(mode))
+ filetype = EXT2_FT_SOCK;
+ else {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * Forcefully unlink if the same name is present and ignore the error
+ * if any, since this dirent might not exist
+ */
+ ext2fs_unlink(fs, darg.parent_ino, darg.dname, darg.ino,
+ EXT2FS_UNLINK_FORCE);
+
+ ret = errcode_to_errno(
+ ext2fs_link(fs, darg.parent_ino, darg.dname, darg.ino,
+ filetype));
+out:
+ free(darg.dname);
+ return ret;
+
+}
/*
* Main recovery path entry point. This function returns JBD2_FC_REPLAY_CONTINUE
* to indicate that it is expecting more fast commit blocks. It returns
@@ -437,7 +545,11 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
switch (le16_to_cpu(tl->fc_tag)) {
case EXT4_FC_TAG_CREAT:
case EXT4_FC_TAG_LINK:
+ ret = ext4_fc_handle_link_and_create(ctx, tl);
+ break;
case EXT4_FC_TAG_UNLINK:
+ ret = ext4_fc_handle_unlink(ctx, tl);
+ break;
case EXT4_FC_TAG_ADD_RANGE:
case EXT4_FC_TAG_DEL_RANGE:
case EXT4_FC_TAG_INODE:
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 7a25e0e5..b1752ac7 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1693,6 +1693,10 @@ extern errcode_t ext2fs_get_pathname(ext2_filsys fs, ext2_ino_t dir, ext2_ino_t
char **name);

/* link.c */
+#define EXT2FS_UNLINK_FORCE 0x1 /* Forcefully unlink even if
+ * the inode number doesn't
+ * match the dirent
+ */
errcode_t ext2fs_link(ext2_filsys fs, ext2_ino_t dir, const char *name,
ext2_ino_t ino, int flags);
errcode_t ext2fs_unlink(ext2_filsys fs, ext2_ino_t dir, const char *name,
diff --git a/lib/ext2fs/unlink.c b/lib/ext2fs/unlink.c
index 8ab27ee2..3ec04cfb 100644
--- a/lib/ext2fs/unlink.c
+++ b/lib/ext2fs/unlink.c
@@ -49,7 +49,7 @@ static int unlink_proc(struct ext2_dir_entry *dirent,
if (strncmp(ls->name, dirent->name, ext2fs_dirent_name_len(dirent)))
return 0;
}
- if (ls->inode) {
+ if (!(ls->flags & EXT2FS_UNLINK_FORCE) && ls->inode) {
if (dirent->inode != ls->inode)
return 0;
} else {
@@ -70,7 +70,7 @@ static int unlink_proc(struct ext2_dir_entry *dirent,
#endif
errcode_t ext2fs_unlink(ext2_filsys fs, ext2_ino_t dir,
const char *name, ext2_ino_t ino,
- int flags EXT2FS_ATTR((unused)))
+ int flags)
{
errcode_t retval;
struct link_struct ls;
@@ -86,7 +86,7 @@ errcode_t ext2fs_unlink(ext2_filsys fs, ext2_ino_t dir,
ls.name = name;
ls.namelen = name ? strlen(name) : 0;
ls.inode = ino;
- ls.flags = 0;
+ ls.flags = flags;
ls.done = 0;
ls.prev = 0;

--
2.30.0.280.ga3ce27912f-goog
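
The name handling in tl_to_darg() above boils down to "tag length minus
the fixed dentry header gives the name length; copy it and
NUL-terminate". A minimal standalone sketch of that step follows. It
assumes the fixed part of the dentry record is two 32-bit inode numbers
(8 bytes); the sketch_* names are hypothetical. Unlike the patch, it
checks both the record length and the malloc() result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed size of the fixed part: parent inode + inode, 4 bytes each. */
#define SKETCH_DENTRY_HDR_LEN	8u

/*
 * Copy the name that follows the fixed header of a dentry record into a
 * freshly allocated, NUL-terminated string. Returns NULL if the record is
 * too short or allocation fails. The caller must free() the result.
 */
static char *sketch_copy_dname(const unsigned char *val, size_t val_len,
			       size_t *name_len)
{
	size_t n;
	char *name;

	assert(val != NULL && name_len != NULL);
	if (val_len < SKETCH_DENTRY_HDR_LEN)
		return NULL;

	n = val_len - SKETCH_DENTRY_HDR_LEN;
	name = malloc(n + 1);
	if (name == NULL)
		return NULL;

	memcpy(name, val + SKETCH_DENTRY_HDR_LEN, n);
	name[n] = '\0';
	*name_len = n;
	return name;
}
```

The copy is needed because the name in the fast commit block is not
NUL-terminated, while the link and unlink helpers take a C string.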

2021-01-22 05:54:11

by harshad shirwadkar

Subject: [PATCH v4 7/8] e2fsck: add replay for add_range, del_range, and inode tags

From: Harshad Shirwadkar <[email protected]>

Add replay for an inode's extent tree and for the inode itself.

Signed-off-by: Harshad Shirwadkar <[email protected]>
---
e2fsck/journal.c | 348 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 347 insertions(+), 1 deletion(-)

diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index 2afe0929..922c252d 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -394,6 +394,217 @@ static int __errcode_to_errno(errcode_t err, const char *func, int line)

#define errcode_to_errno(err) __errcode_to_errno(err, __func__, __LINE__)

+#define ex_end(__ex) ((__ex)->e_lblk + (__ex)->e_len - 1)
+#define ex_pend(__ex) ((__ex)->e_pblk + (__ex)->e_len - 1)
+
+static int make_room(struct extent_list *list, int i)
+{
+ int ret;
+
+ if (list->count == list->size) {
+ unsigned int new_size = (list->size + 341) *
+ sizeof(struct ext2fs_extent);
+ ret = errcode_to_errno(ext2fs_resize_mem(0, new_size, &list->extents));
+ if (ret)
+ return ret;
+ list->size += 341;
+ }
+
+ memmove(&list->extents[i + 1], &list->extents[i],
+ sizeof(list->extents[0]) * (list->count - i));
+ list->count++;
+ return 0;
+}
+
+static int ex_compar(const void *arg1, const void *arg2)
+{
+ struct ext2fs_extent *ex1 = (struct ext2fs_extent *)arg1;
+ struct ext2fs_extent *ex2 = (struct ext2fs_extent *)arg2;
+
+ if (ex1->e_lblk < ex2->e_lblk)
+ return -1;
+ if (ex1->e_lblk > ex2->e_lblk)
+ return 1;
+ return ex1->e_len - ex2->e_len;
+}
+
+static int ex_len_compar(const void *arg1, const void *arg2)
+{
+ struct ext2fs_extent *ex1 = (struct ext2fs_extent *)arg1;
+ struct ext2fs_extent *ex2 = (struct ext2fs_extent *)arg2;
+
+ if (ex1->e_len < ex2->e_len)
+ return 1;
+
+ if (ex1->e_lblk > ex2->e_lblk)
+ return -1;
+
+ return 0;
+}
+
+static void ex_sort_and_merge(e2fsck_t ctx, struct extent_list *list)
+{
+ blk64_t ex_end;
+ int i, j;
+
+ if (list->count < 2)
+ return;
+
+ /*
+ * Reverse sort by length, that way we strip off all the 0 length
+ * extents
+ */
+ qsort(list->extents, list->count, sizeof(struct ext2fs_extent),
+ ex_len_compar);
+
+ for (i = 0; i < list->count; i++) {
+ if (list->extents[i].e_len == 0) {
+ list->count = i;
+ break;
+ }
+ }
+
+ /* Now sort by logical offset */
+ qsort(list->extents, list->count, sizeof(list->extents[0]),
+ ex_compar);
+
+ /* Merge adjacent extents if they are logically and physically contiguous */
+ i = 0;
+ while (i < list->count - 1) {
+ if (ex_end(&list->extents[i]) + 1 != list->extents[i + 1].e_lblk ||
+ ex_pend(&list->extents[i]) + 1 != list->extents[i + 1].e_pblk ||
+ (list->extents[i].e_flags & EXT2_EXTENT_FLAGS_UNINIT) !=
+ (list->extents[i + 1].e_flags & EXT2_EXTENT_FLAGS_UNINIT)) {
+ i++;
+ continue;
+ }
+
+ list->extents[i].e_len += list->extents[i + 1].e_len;
+ for (j = i + 1; j < list->count - 1; j++)
+ list->extents[j] = list->extents[j + 1];
+ list->count--;
+ }
+}
+
+/* must free blocks that are released */
+static int ext4_modify_extent_list(e2fsck_t ctx, struct extent_list *list,
+ struct ext2fs_extent *ex, int del)
+{
+ int ret;
+ int i, offset;
+ struct ext2fs_extent add_ex = *ex, add_ex2;
+
+ /* First let's create a hole from ex->e_lblk of length ex->e_len */
+ for (i = 0; i < list->count; i++) {
+ if (ex_end(&list->extents[i]) < add_ex.e_lblk)
+ continue;
+
+ /* Case 1: No overlap */
+ if (list->extents[i].e_lblk > ex_end(&add_ex))
+ break;
+ /*
+ * Unmark all the blocks in bb now. All the blocks get marked
+ * before we exit this function.
+ */
+ ext2fs_unmark_block_bitmap_range2(ctx->fs->block_map,
+ list->extents[i].e_pblk, list->extents[i].e_len);
+ /* Case 2: Split */
+ if (list->extents[i].e_lblk < add_ex.e_lblk &&
+ ex_end(&list->extents[i]) > ex_end(&add_ex)) {
+ ret = make_room(list, i + 1);
+ if (ret)
+ return ret;
+ list->extents[i + 1] = list->extents[i];
+ offset = ex_end(&add_ex) + 1 - list->extents[i].e_lblk;
+ list->extents[i + 1].e_lblk += offset;
+ list->extents[i + 1].e_pblk += offset;
+ list->extents[i + 1].e_len -= offset;
+ list->extents[i].e_len =
+ add_ex.e_lblk - list->extents[i].e_lblk;
+ break;
+ }
+
+ /* Case 3: Exact overlap */
+ if (add_ex.e_lblk <= list->extents[i].e_lblk &&
+ ex_end(&list->extents[i]) <= ex_end(&add_ex)) {
+
+ list->extents[i].e_len = 0;
+ continue;
+ }
+
+ /* Case 4: Partial overlap */
+ if (ex_end(&list->extents[i]) > ex_end(&add_ex)) {
+ offset = ex_end(&add_ex) + 1 - list->extents[i].e_lblk;
+ list->extents[i].e_lblk += offset;
+ list->extents[i].e_pblk += offset;
+ list->extents[i].e_len -= offset;
+ break;
+ }
+
+ if (ex_end(&add_ex) >= ex_end(&list->extents[i]))
+ list->extents[i].e_len =
+ add_ex.e_lblk > list->extents[i].e_lblk ?
+ add_ex.e_lblk - list->extents[i].e_lblk : 0;
+ }
+
+ if (add_ex.e_len && !del) {
+ make_room(list, list->count);
+ list->extents[list->count - 1] = add_ex;
+ }
+
+ ex_sort_and_merge(ctx, list);
+
+ /* Mark all occupied blocks allocated */
+ for (i = 0; i < list->count; i++)
+ ext2fs_mark_block_bitmap_range2(ctx->fs->block_map,
+ list->extents[i].e_pblk, list->extents[i].e_len);
+ ext2fs_mark_bb_dirty(ctx->fs);
+
+ return 0;
+}
+
+static int ext4_add_extent_to_list(e2fsck_t ctx, struct extent_list *list,
+ struct ext2fs_extent *ex)
+{
+ return ext4_modify_extent_list(ctx, list, ex, 0 /* add */);
+}
+
+static int ext4_del_extent_from_list(e2fsck_t ctx, struct extent_list *list,
+ struct ext2fs_extent *ex)
+{
+ return ext4_modify_extent_list(ctx, list, ex, 1 /* delete */);
+}
+
+static int ext4_fc_read_extents(e2fsck_t ctx, int ino)
+{
+ struct extent_list *extent_list = &ctx->fc_replay_state.fc_extent_list;
+
+ if (extent_list->ino == ino)
+ return 0;
+
+ extent_list->ino = ino;
+ return errcode_to_errno(e2fsck_read_extents(ctx, extent_list));
+}
+
+/*
+ * Flush extents in replay state on disk. @ino is the inode that is going
+ * to be processed next. So, we hold back flushing of the extent list
+ * if the next inode that's going to be processed is same as the one with
+ * cached extents in our replay state. That allows us to gather multiple extents
+ * for the inode so that we can flush all of them at once and it also saves us
+ * from continuously growing and shrinking the extent tree.
+ */
+static void ext4_fc_flush_extents(e2fsck_t ctx, int ino)
+{
+ struct extent_list *extent_list = &ctx->fc_replay_state.fc_extent_list;
+
+ if (extent_list->ino == ino || extent_list->ino == 0)
+ return;
+ e2fsck_rewrite_extent_tree(ctx, extent_list);
+ ext2fs_free_mem(&extent_list->extents);
+ memset(extent_list, 0, sizeof(*extent_list));
+}
+
/* Helper struct for dentry replay routines */
struct dentry_info_args {
int parent_ino, dname_len, ino, inode_len;
@@ -431,6 +642,7 @@ static int ext4_fc_handle_unlink(e2fsck_t ctx, struct ext4_fc_tl *tl)
int ret;

tl_to_darg(&darg, tl);
+ ext4_fc_flush_extents(ctx, darg.ino);
ret = errcode_to_errno(
ext2fs_unlink(ctx->fs, darg.parent_ino,
darg.dname, darg.ino, 0));
@@ -447,6 +659,7 @@ static int ext4_fc_handle_link_and_create(e2fsck_t ctx, struct ext4_fc_tl *tl)
int ret, filetype, mode;

tl_to_darg(&darg, tl);
+ ext4_fc_flush_extents(ctx, 0);
ret = errcode_to_errno(ext2fs_read_inode(fs, darg.ino,
(struct ext2_inode *)&inode_large));
if (ret)
@@ -488,6 +701,132 @@ out:
return ret;

}
+
+/* This function fixes the i_blocks field in the replayed inode */
+static void ext4_fc_replay_fixup_iblocks(struct ext2_inode_large *ondisk_inode,
+ struct ext2_inode_large *fc_inode)
+{
+ if (le32_to_cpu(ondisk_inode->i_flags) & EXT4_EXTENTS_FL) {
+ struct ext3_extent_header *eh;
+
+ eh = (struct ext3_extent_header *)(&ondisk_inode->i_block[0]);
+ if (eh->eh_magic != EXT3_EXT_MAGIC) {
+ memset(eh, 0, sizeof(*eh));
+ eh->eh_magic = EXT3_EXT_MAGIC;
+ eh->eh_max = cpu_to_le16(
+ (sizeof(ondisk_inode->i_block) -
+ sizeof(struct ext3_extent_header)) /
+ sizeof(struct ext3_extent));
+ }
+ } else if (le32_to_cpu(ondisk_inode->i_flags) & EXT4_INLINE_DATA_FL) {
+ memcpy(ondisk_inode->i_block, fc_inode->i_block,
+ sizeof(fc_inode->i_block));
+ }
+}
+
+static int ext4_fc_handle_inode(e2fsck_t ctx, struct ext4_fc_tl *tl)
+{
+ struct e2fsck_fc_replay_state *state = &ctx->fc_replay_state;
+ int ino, inode_len = EXT2_GOOD_OLD_INODE_SIZE;
+ struct ext2_inode_large *inode = NULL;
+ struct ext4_fc_inode *fc_inode;
+ errcode_t err;
+ blk64_t blks;
+
+ fc_inode = (struct ext4_fc_inode *)ext4_fc_tag_val(tl);
+ ino = le32_to_cpu(fc_inode->fc_ino);
+
+ if (EXT2_INODE_SIZE(ctx->fs->super) > EXT2_GOOD_OLD_INODE_SIZE)
+ inode_len += ext2fs_le16_to_cpu(
+ ((struct ext2_inode_large *)fc_inode->fc_raw_inode)
+ ->i_extra_isize);
+ err = ext2fs_get_mem(inode_len, &inode);
+ if (err)
+ return errcode_to_errno(err);
+ ext4_fc_flush_extents(ctx, ino);
+
+ err = ext2fs_read_inode_full(ctx->fs, ino, (struct ext2_inode *)inode,
+ inode_len);
+ if (err)
+ goto out;
+ memcpy(inode, fc_inode->fc_raw_inode,
+ offsetof(struct ext2_inode_large, i_block));
+ memcpy(&inode->i_generation,
+ &((struct ext2_inode_large *)(fc_inode->fc_raw_inode))->i_generation,
+ inode_len - offsetof(struct ext2_inode_large, i_generation));
+ ext4_fc_replay_fixup_iblocks(inode,
+ (struct ext2_inode_large *)fc_inode->fc_raw_inode);
+ err = ext2fs_count_blocks(ctx->fs, ino, EXT2_INODE(inode), &blks);
+ if (err)
+ goto out;
+ ext2fs_iblk_set(ctx->fs, EXT2_INODE(inode), blks);
+ ext2fs_inode_csum_set(ctx->fs, ino, inode);
+
+ err = ext2fs_write_inode_full(ctx->fs, ino, (struct ext2_inode *)inode,
+ inode_len);
+ if (err)
+ goto out;
+ if (inode->i_links_count)
+ ext2fs_mark_inode_bitmap2(ctx->fs->inode_map, ino);
+ else
+ ext2fs_unmark_inode_bitmap2(ctx->fs->inode_map, ino);
+ ext2fs_mark_ib_dirty(ctx->fs);
+
+out:
+ ext2fs_free_mem(&inode);
+ return errcode_to_errno(err);
+}
+
+/*
+ * Handle add extent replay tag.
+ */
+static int ext4_fc_handle_add_extent(e2fsck_t ctx, struct ext4_fc_tl *tl)
+{
+ struct ext2fs_extent extent;
+ struct ext4_fc_add_range *add_range;
+ struct ext4_fc_del_range *del_range;
+ int ret = 0, ino;
+
+ add_range = (struct ext4_fc_add_range *)ext4_fc_tag_val(tl);
+ ino = le32_to_cpu(add_range->fc_ino);
+ ext4_fc_flush_extents(ctx, ino);
+
+ ret = ext4_fc_read_extents(ctx, ino);
+ if (ret)
+ return ret;
+ memset(&extent, 0, sizeof(extent));
+ ret = errcode_to_errno(ext2fs_decode_extent(
+ &extent, (void *)(add_range->fc_ex),
+ sizeof(add_range->fc_ex)));
+ if (ret)
+ return ret;
+ return ext4_add_extent_to_list(ctx,
+ &ctx->fc_replay_state.fc_extent_list, &extent);
+}
+
+/*
+ * Handle delete logical range replay tag.
+ */
+static int ext4_fc_handle_del_range(e2fsck_t ctx, struct ext4_fc_tl *tl)
+{
+ struct ext2fs_extent extent;
+ struct ext4_fc_del_range *del_range;
+ int ret, ino;
+
+ del_range = (struct ext4_fc_del_range *)ext4_fc_tag_val(tl);
+ ino = le32_to_cpu(del_range->fc_ino);
+ ext4_fc_flush_extents(ctx, ino);
+
+ memset(&extent, 0, sizeof(extent));
+ extent.e_lblk = ext2fs_le32_to_cpu(del_range->fc_lblk);
+ extent.e_len = ext2fs_le16_to_cpu(del_range->fc_len);
+ ret = ext4_fc_read_extents(ctx, ino);
+ if (ret)
+ return ret;
+ return ext4_del_extent_from_list(ctx,
+ &ctx->fc_replay_state.fc_extent_list, &extent);
+}
+
/*
* Main recovery path entry point. This function returns JBD2_FC_REPLAY_CONTINUE
* to indicate that it is expecting more fast commit blocks. It returns
@@ -515,7 +854,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
state->fc_current_pass = pass;
/* We will reset checksums */
ctx->fs->flags |= EXT2_FLAG_IGNORE_CSUM_ERRORS;
- ret = ext2fs_read_bitmaps(ctx->fs);
+ ret = errcode_to_errno(ext2fs_read_bitmaps(ctx->fs));
if (ret) {
jbd_debug(1, "Error %d while reading bitmaps\n", ret);
return ret;
@@ -551,9 +890,16 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
ret = ext4_fc_handle_unlink(ctx, tl);
break;
case EXT4_FC_TAG_ADD_RANGE:
+ ret = ext4_fc_handle_add_extent(ctx, tl);
+ break;
case EXT4_FC_TAG_DEL_RANGE:
+ ret = ext4_fc_handle_del_range(ctx, tl);
+ break;
case EXT4_FC_TAG_INODE:
+ ret = ext4_fc_handle_inode(ctx, tl);
+ break;
case EXT4_FC_TAG_TAIL:
+ ext4_fc_flush_extents(ctx, 0);
case EXT4_FC_TAG_PAD:
case EXT4_FC_TAG_HEAD:
break;
--
2.30.0.280.ga3ce27912f-goog
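
The merge step of ex_sort_and_merge() above joins neighbouring extents
that are contiguous both logically and physically. The sketch below
shows the same idea on an already sorted list, using a single linear
pass instead of the shift-down loop in the patch. It is a simplified
illustration with hypothetical sketch_* names; it ignores the
uninitialized flag, which the patch also compares before merging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_extent {
	uint64_t lblk;	/* first logical block */
	uint64_t pblk;	/* first physical block */
	uint32_t len;	/* number of blocks; 0 means "deleted" */
};

/*
 * Compact a list sorted by lblk in place: drop zero-length entries and
 * merge each extent into its predecessor when both the logical and the
 * physical ranges continue without a gap. Returns the new entry count.
 */
static size_t sketch_merge_extents(struct sketch_extent *ex, size_t count)
{
	size_t i, out = 0;

	assert(ex != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (ex[i].len == 0)
			continue;
		if (out > 0 &&
		    ex[out - 1].lblk + ex[out - 1].len == ex[i].lblk &&
		    ex[out - 1].pblk + ex[out - 1].len == ex[i].pblk)
			ex[out - 1].len += ex[i].len;
		else
			ex[out++] = ex[i];
	}
	return out;
}
```

Batching the list changes per inode and merging before the extent tree
is rewritten is what lets the replay path avoid repeatedly growing and
shrinking the tree, as the comment on ext4_fc_flush_extents() notes.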

2021-01-27 16:59:54

by Theodore Ts'o

Subject: Re: [PATCH v4 0/8] e2fsck: add fast commit replay path

On Thu, Jan 21, 2021 at 09:44:56PM -0800, Harshad Shirwadkar wrote:
> From: Harshad Shirwadkar <[email protected]>
>
> This patch series consists of modified e2fsck fast commit replay
> patches from the patch series "[PATCH v3 00/15] Fast commit changes
> for e2fsprogs" sent on Jan 20, 2021
> (https://patchwork.ozlabs.org/project/linux-ext4/list/?series=225577&state=*). All
> the patches except fast commit recovery path were merged upstream. So,
> this series contains only the fast commit replay patch changes.
>
> Verified that all the regression tests pass:
> 367 tests succeeded 0 tests failed
>
> New fast commit recovery test:
> j_recover_fast_commit: ok

I've applied this patch series, thanks!

- Ted