2012-12-13 15:32:23

by Forrest Liu

[permalink] [raw]
Subject: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

When depth of extent tree is greater than 1, logical start value
of interior node didn't updated correctly in ext4_ext_rm_idx.

Signed-off-by: Forrest Liu <[email protected]>
---
fs/ext4/extents.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d3dd618..496952e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2190,13 +2190,14 @@ errout:
* removes index from the index block.
*/
static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
- struct ext4_ext_path *path)
+ struct ext4_ext_path *path, int depth)
{
int err;
ext4_fsblk_t leaf;

/* free index block */
- path--;
+ depth--;
+ path = path + depth;
leaf = ext4_idx_pblock(path->p_idx);
if (unlikely(path->p_hdr->eh_entries == 0)) {
EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0");
@@ -2221,6 +2222,19 @@ static int ext4_ext_rm_idx(handle_t *handle,
struct inode *inode,

ext4_free_blocks(handle, inode, NULL, leaf, 1,
EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
+
+ while (--depth >= 0) {
+ if (path->p_idx != EXT_FIRST_INDEX(path->p_hdr))
+ break;
+ path--;
+ err = ext4_ext_get_access(handle, inode, path);
+ if (err)
+ break;
+ path->p_idx->ei_block = (path+1)->p_idx->ei_block;
+ err = ext4_ext_dirty(handle, inode, path);
+ if (err)
+ break;
+ }
return err;
}

@@ -2557,7 +2571,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
/* if this leaf is free, then we should
* remove it from index block above */
if (err == 0 && eh->eh_entries == 0 && path[depth].p_bh != NULL)
- err = ext4_ext_rm_idx(handle, inode, path + depth);
+ err = ext4_ext_rm_idx(handle, inode, path, depth);

out:
return err;
@@ -2760,7 +2774,7 @@ again:
/* index is empty, remove it;
* handle must be already prepared by the
* truncatei_leaf() */
- err = ext4_ext_rm_idx(handle, inode, path + i);
+ err = ext4_ext_rm_idx(handle, inode, path, i);
}
/* root level has p_bh == NULL, brelse() eats this */
brelse(path[i].p_bh);
--
1.7.9.5


2012-12-13 16:04:59

by Forrest Liu

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

@Ashish
I have modified the patch with your suggestion, could you help to review.

@Ted
I wrote a script to verify this problem.
This script needs tst_extents and fallocate executables from e2fsprogs, and
fallocate needs your patch with bug fix to do hole punch.

Steps to run the script
1) assign variable blkdev to path of block device that filesystem mount on
2) cd to mount point
3) run script

Thanks,
Forrest

# verify hole punch bug
#
blkdev=/dev/sda1
file=punch_test
cmdfile=cmds
filesize=`expr 1024 \* 1024 \* 1024`
blksize=`tst_extents -R "stats" $blkdev | grep "Block size" | cut -d
':' -f 2 | tr -d ' '`
maxlblks=`expr $filesize \/ $blksize`

rm $file
touch $file
fallocate -l $filesize $file
sync

inode=`tst_extents -R "stat ${file}" $blkdev | grep Inode | cut -d ' ' -f 2`
echo "Inode number: $inode"

# punch every even blocks
echo "Punch out every even blocks"
i=0
while [ "${i}" -lt "${maxlblks}" ]
do
offset=`expr $i \* $blksize`
fallocate -p -o $offset -l $blksize $file
i=$(($i+2))
done

# get lblk from root->ns->down
echo "inode <$inode>" > cmds
echo "root" >> cmds
echo "ns" >> cmds

a=`tst_extents -f $cmdfile $blkdev`
echo "down" >> cmds
b=`tst_extents -f $cmdfile $blkdev`

# get output form command "down"
c=${b:${#a}}
echo "Out put from command \"down\""
echo $c

a=`echo ${c#*lblk} | cut -d ',' -f 1`
lblk_low=`echo $a | cut -d '-' -f 1`
lblk_hi=`echo $a | cut -d '-' -f 3`

echo ""
echo "Punch out blocks $lblk_hi, ..., $lblk_low"

i=$lblk_hi
while [ $i -ge $lblk_low ]
do
offset=`expr $i \* $blksize`
fallocate -p -o $offset -l $blksize $file
i=$(($i-1))
done

# verify logical start value is correct or not
echo "inode <$inode>" > cmds
echo "root" >> cmds
a=`tst_extents -f $cmdfile $blkdev`
echo "ns" >> cmds
b=`tst_extents -f $cmdfile $blkdev`

c=${b:${#a}}
c=`echo ${c#*lblk} | cut -d ',' -f 1`
lblk_start0=`echo $c | cut -d '-' -f 1`

echo "down" >> cmds
c=`tst_extents -f $cmdfile $blkdev`
c=${c:${#b}}
c=`echo ${c#*lblk} | cut -d ',' -f 1`
lblk_start1=`echo $c | cut -d '-' -f 1`

if [ $lblk_start0 -eq $lblk_start1 ]
then
echo "logical start valie is consistency:$lblk_start0,$lblk_start1"
else
echo "logical start valie is not consistency:$lblk_start0,$lblk_start1"
fi



diff --git a/contrib/fallocate.c b/contrib/fallocate.c
index 0e8319f..c1c08e1 100644
--- a/contrib/fallocate.c
+++ b/contrib/fallocate.c
@@ -35,10 +35,11 @@

// #include <linux/falloc.h>
#define FALLOC_FL_KEEP_SIZE 0x01
+#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */

void usage(void)
{
- printf("Usage: fallocate [-nt] [-o offset] -l length filename\n");
+ printf("Usage: fallocate [-npt] [-o offset] -l length filename\n");
exit(EXIT_FAILURE);
}

@@ -56,6 +57,7 @@ cvtnum(char *s)
char *sp;
int c;

+
i = strtoll(s, &sp, 0);
if (i == 0 && sp == s)
return -1LL;
@@ -94,12 +96,18 @@ int main(int argc, char **argv)
int error;
int tflag = 0;

- while ((opt = getopt(argc, argv, "nl:ot")) != -1) {
+ while ((opt = getopt(argc, argv, "npl:o:t")) != -1) {
switch(opt) {
case 'n':
/* do not change filesize */
falloc_mode = FALLOC_FL_KEEP_SIZE;
break;
+ case 'p':
+ /* punch mode */
+ falloc_mode = (FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE);
+ break;
+
case 'l':
length = cvtnum(optarg);
break;

2012/12/13 Forrest Liu <[email protected]>:
> When depth of extent tree is greater than 1, logical start value
> of interior node didn't updated correctly in ext4_ext_rm_idx.
>
> Signed-off-by: Forrest Liu <[email protected]>
> ---
> fs/ext4/extents.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index d3dd618..496952e 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2190,13 +2190,14 @@ errout:
> * removes index from the index block.
> */
> static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
> - struct ext4_ext_path *path)
> + struct ext4_ext_path *path, int depth)
> {
> int err;
> ext4_fsblk_t leaf;
>
> /* free index block */
> - path--;
> + depth--;
> + path = path + depth;
> leaf = ext4_idx_pblock(path->p_idx);
> if (unlikely(path->p_hdr->eh_entries == 0)) {
> EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0");
> @@ -2221,6 +2222,19 @@ static int ext4_ext_rm_idx(handle_t *handle,
> struct inode *inode,
>
> ext4_free_blocks(handle, inode, NULL, leaf, 1,
> EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
> +
> + while (--depth >= 0) {
> + if (path->p_idx != EXT_FIRST_INDEX(path->p_hdr))
> + break;
> + path--;
> + err = ext4_ext_get_access(handle, inode, path);
> + if (err)
> + break;
> + path->p_idx->ei_block = (path+1)->p_idx->ei_block;
> + err = ext4_ext_dirty(handle, inode, path);
> + if (err)
> + break;
> + }
> return err;
> }
>
> @@ -2557,7 +2571,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> /* if this leaf is free, then we should
> * remove it from index block above */
> if (err == 0 && eh->eh_entries == 0 && path[depth].p_bh != NULL)
> - err = ext4_ext_rm_idx(handle, inode, path + depth);
> + err = ext4_ext_rm_idx(handle, inode, path, depth);
>
> out:
> return err;
> @@ -2760,7 +2774,7 @@ again:
> /* index is empty, remove it;
> * handle must be already prepared by the
> * truncatei_leaf() */
> - err = ext4_ext_rm_idx(handle, inode, path + i);
> + err = ext4_ext_rm_idx(handle, inode, path, i);
> }
> /* root level has p_bh == NULL, brelse() eats this */
> brelse(path[i].p_bh);
> --
> 1.7.9.5

2012-12-13 16:17:42

by Forrest Liu

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

Hi Ted,
Your patch for fallocate is no problem, my mistake. Sorry~

-Forrest

2012/12/14 Forrest Liu <[email protected]>:
> @Ashish
> I have modified the patch with your suggestion, could you help to review.
>
> @Ted
> I wrote a script to verify this problem.
> This script needs tst_extents and fallocate executables from e2fsprogs, and
> fallocate needs your patch with bug fix to do hole punch.
>
> Steps to run the script
> 1) assign variable blkdev to path of block device that filesystem mount on
> 2) cd to mount point
> 3) run script
>
> Thanks,
> Forrest
>
> # verify hole punch bug
> #
> blkdev=/dev/sda1
> file=punch_test
> cmdfile=cmds
> filesize=`expr 1024 \* 1024 \* 1024`
> blksize=`tst_extents -R "stats" $blkdev | grep "Block size" | cut -d
> ':' -f 2 | tr -d ' '`
> maxlblks=`expr $filesize \/ $blksize`
>
> rm $file
> touch $file
> fallocate -l $filesize $file
> sync
>
> inode=`tst_extents -R "stat ${file}" $blkdev | grep Inode | cut -d ' ' -f 2`
> echo "Inode number: $inode"
>
> # punch every even blocks
> echo "Punch out every even blocks"
> i=0
> while [ "${i}" -lt "${maxlblks}" ]
> do
> offset=`expr $i \* $blksize`
> fallocate -p -o $offset -l $blksize $file
> i=$(($i+2))
> done
>
> # get lblk from root->ns->down
> echo "inode <$inode>" > cmds
> echo "root" >> cmds
> echo "ns" >> cmds
>
> a=`tst_extents -f $cmdfile $blkdev`
> echo "down" >> cmds
> b=`tst_extents -f $cmdfile $blkdev`
>
> # get output form command "down"
> c=${b:${#a}}
> echo "Out put from command \"down\""
> echo $c
>
> a=`echo ${c#*lblk} | cut -d ',' -f 1`
> lblk_low=`echo $a | cut -d '-' -f 1`
> lblk_hi=`echo $a | cut -d '-' -f 3`
>
> echo ""
> echo "Punch out blocks $lblk_hi, ..., $lblk_low"
>
> i=$lblk_hi
> while [ $i -ge $lblk_low ]
> do
> offset=`expr $i \* $blksize`
> fallocate -p -o $offset -l $blksize $file
> i=$(($i-1))
> done
>
> # verify logical start value is correct or not
> echo "inode <$inode>" > cmds
> echo "root" >> cmds
> a=`tst_extents -f $cmdfile $blkdev`
> echo "ns" >> cmds
> b=`tst_extents -f $cmdfile $blkdev`
>
> c=${b:${#a}}
> c=`echo ${c#*lblk} | cut -d ',' -f 1`
> lblk_start0=`echo $c | cut -d '-' -f 1`
>
> echo "down" >> cmds
> c=`tst_extents -f $cmdfile $blkdev`
> c=${c:${#b}}
> c=`echo ${c#*lblk} | cut -d ',' -f 1`
> lblk_start1=`echo $c | cut -d '-' -f 1`
>
> if [ $lblk_start0 -eq $lblk_start1 ]
> then
> echo "logical start valie is consistency:$lblk_start0,$lblk_start1"
> else
> echo "logical start valie is not consistency:$lblk_start0,$lblk_start1"
> fi
>
>
>
> diff --git a/contrib/fallocate.c b/contrib/fallocate.c
> index 0e8319f..c1c08e1 100644
> --- a/contrib/fallocate.c
> +++ b/contrib/fallocate.c
> @@ -35,10 +35,11 @@
>
> // #include <linux/falloc.h>
> #define FALLOC_FL_KEEP_SIZE 0x01
> +#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
>
> void usage(void)
> {
> - printf("Usage: fallocate [-nt] [-o offset] -l length filename\n");
> + printf("Usage: fallocate [-npt] [-o offset] -l length filename\n");
> exit(EXIT_FAILURE);
> }
>
> @@ -56,6 +57,7 @@ cvtnum(char *s)
> char *sp;
> int c;
>
> +
> i = strtoll(s, &sp, 0);
> if (i == 0 && sp == s)
> return -1LL;
> @@ -94,12 +96,18 @@ int main(int argc, char **argv)
> int error;
> int tflag = 0;
>
> - while ((opt = getopt(argc, argv, "nl:ot")) != -1) {
> + while ((opt = getopt(argc, argv, "npl:o:t")) != -1) {
> switch(opt) {
> case 'n':
> /* do not change filesize */
> falloc_mode = FALLOC_FL_KEEP_SIZE;
> break;
> + case 'p':
> + /* punch mode */
> + falloc_mode = (FALLOC_FL_PUNCH_HOLE |
> + FALLOC_FL_KEEP_SIZE);
> + break;
> +
> case 'l':
> length = cvtnum(optarg);
> break;
>
> 2012/12/13 Forrest Liu <[email protected]>:
>> When depth of extent tree is greater than 1, logical start value
>> of interior node didn't updated correctly in ext4_ext_rm_idx.
>>
>> Signed-off-by: Forrest Liu <[email protected]>
>> ---
>> fs/ext4/extents.c | 22 ++++++++++++++++++----
>> 1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index d3dd618..496952e 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -2190,13 +2190,14 @@ errout:
>> * removes index from the index block.
>> */
>> static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
>> - struct ext4_ext_path *path)
>> + struct ext4_ext_path *path, int depth)
>> {
>> int err;
>> ext4_fsblk_t leaf;
>>
>> /* free index block */
>> - path--;
>> + depth--;
>> + path = path + depth;
>> leaf = ext4_idx_pblock(path->p_idx);
>> if (unlikely(path->p_hdr->eh_entries == 0)) {
>> EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0");
>> @@ -2221,6 +2222,19 @@ static int ext4_ext_rm_idx(handle_t *handle,
>> struct inode *inode,
>>
>> ext4_free_blocks(handle, inode, NULL, leaf, 1,
>> EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
>> +
>> + while (--depth >= 0) {
>> + if (path->p_idx != EXT_FIRST_INDEX(path->p_hdr))
>> + break;
>> + path--;
>> + err = ext4_ext_get_access(handle, inode, path);
>> + if (err)
>> + break;
>> + path->p_idx->ei_block = (path+1)->p_idx->ei_block;
>> + err = ext4_ext_dirty(handle, inode, path);
>> + if (err)
>> + break;
>> + }
>> return err;
>> }
>>
>> @@ -2557,7 +2571,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
>> /* if this leaf is free, then we should
>> * remove it from index block above */
>> if (err == 0 && eh->eh_entries == 0 && path[depth].p_bh != NULL)
>> - err = ext4_ext_rm_idx(handle, inode, path + depth);
>> + err = ext4_ext_rm_idx(handle, inode, path, depth);
>>
>> out:
>> return err;
>> @@ -2760,7 +2774,7 @@ again:
>> /* index is empty, remove it;
>> * handle must be already prepared by the
>> * truncatei_leaf() */
>> - err = ext4_ext_rm_idx(handle, inode, path + i);
>> + err = ext4_ext_rm_idx(handle, inode, path, i);
>> }
>> /* root level has p_bh == NULL, brelse() eats this */
>> brelse(path[i].p_bh);
>> --
>> 1.7.9.5

2012-12-14 17:18:37

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

On 12/13/12 10:04 AM, Forrest Liu wrote:
> @Ashish
> I have modified the patch with your suggestion, could you help to review.
>
> @Ted
> I wrote a script to verify this problem.
> This script needs tst_extents and fallocate executables from e2fsprogs, and
> fallocate needs your patch with bug fix to do hole punch.

Just FWIW, upstream fallocate in util-linux can punch holes already.

Usage:
fallocate [options] <filename>

Options:
-n, --keep-size don't modify the length of the file
-p, --punch-hole punch holes in the file
...

-Eric

> Steps to run the script
> 1) assign variable blkdev to path of block device that filesystem mount on
> 2) cd to mount point
> 3) run script
>
> Thanks,
> Forrest
>
> # verify hole punch bug
> #
> blkdev=/dev/sda1
> file=punch_test
> cmdfile=cmds
> filesize=`expr 1024 \* 1024 \* 1024`
> blksize=`tst_extents -R "stats" $blkdev | grep "Block size" | cut -d
> ':' -f 2 | tr -d ' '`
> maxlblks=`expr $filesize \/ $blksize`
>
> rm $file
> touch $file
> fallocate -l $filesize $file
> sync
>
> inode=`tst_extents -R "stat ${file}" $blkdev | grep Inode | cut -d ' ' -f 2`
> echo "Inode number: $inode"
>
> # punch every even blocks
> echo "Punch out every even blocks"
> i=0
> while [ "${i}" -lt "${maxlblks}" ]
> do
> offset=`expr $i \* $blksize`
> fallocate -p -o $offset -l $blksize $file
> i=$(($i+2))
> done
>
> # get lblk from root->ns->down
> echo "inode <$inode>" > cmds
> echo "root" >> cmds
> echo "ns" >> cmds
>
> a=`tst_extents -f $cmdfile $blkdev`
> echo "down" >> cmds
> b=`tst_extents -f $cmdfile $blkdev`
>
> # get output form command "down"
> c=${b:${#a}}
> echo "Out put from command \"down\""
> echo $c
>
> a=`echo ${c#*lblk} | cut -d ',' -f 1`
> lblk_low=`echo $a | cut -d '-' -f 1`
> lblk_hi=`echo $a | cut -d '-' -f 3`
>
> echo ""
> echo "Punch out blocks $lblk_hi, ..., $lblk_low"
>
> i=$lblk_hi
> while [ $i -ge $lblk_low ]
> do
> offset=`expr $i \* $blksize`
> fallocate -p -o $offset -l $blksize $file
> i=$(($i-1))
> done
>
> # verify logical start value is correct or not
> echo "inode <$inode>" > cmds
> echo "root" >> cmds
> a=`tst_extents -f $cmdfile $blkdev`
> echo "ns" >> cmds
> b=`tst_extents -f $cmdfile $blkdev`
>
> c=${b:${#a}}
> c=`echo ${c#*lblk} | cut -d ',' -f 1`
> lblk_start0=`echo $c | cut -d '-' -f 1`
>
> echo "down" >> cmds
> c=`tst_extents -f $cmdfile $blkdev`
> c=${c:${#b}}
> c=`echo ${c#*lblk} | cut -d ',' -f 1`
> lblk_start1=`echo $c | cut -d '-' -f 1`
>
> if [ $lblk_start0 -eq $lblk_start1 ]
> then
> echo "logical start valie is consistency:$lblk_start0,$lblk_start1"
> else
> echo "logical start valie is not consistency:$lblk_start0,$lblk_start1"
> fi
>
>
>
> diff --git a/contrib/fallocate.c b/contrib/fallocate.c
> index 0e8319f..c1c08e1 100644
> --- a/contrib/fallocate.c
> +++ b/contrib/fallocate.c
> @@ -35,10 +35,11 @@
>
> // #include <linux/falloc.h>
> #define FALLOC_FL_KEEP_SIZE 0x01
> +#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
>
> void usage(void)
> {
> - printf("Usage: fallocate [-nt] [-o offset] -l length filename\n");
> + printf("Usage: fallocate [-npt] [-o offset] -l length filename\n");
> exit(EXIT_FAILURE);
> }
>
> @@ -56,6 +57,7 @@ cvtnum(char *s)
> char *sp;
> int c;
>
> +
> i = strtoll(s, &sp, 0);
> if (i == 0 && sp == s)
> return -1LL;
> @@ -94,12 +96,18 @@ int main(int argc, char **argv)
> int error;
> int tflag = 0;
>
> - while ((opt = getopt(argc, argv, "nl:ot")) != -1) {
> + while ((opt = getopt(argc, argv, "npl:o:t")) != -1) {
> switch(opt) {
> case 'n':
> /* do not change filesize */
> falloc_mode = FALLOC_FL_KEEP_SIZE;
> break;
> + case 'p':
> + /* punch mode */
> + falloc_mode = (FALLOC_FL_PUNCH_HOLE |
> + FALLOC_FL_KEEP_SIZE);
> + break;
> +
> case 'l':
> length = cvtnum(optarg);
> break;
>
> 2012/12/13 Forrest Liu <[email protected]>:
>> When depth of extent tree is greater than 1, logical start value
>> of interior node didn't updated correctly in ext4_ext_rm_idx.
>>
>> Signed-off-by: Forrest Liu <[email protected]>
>> ---
>> fs/ext4/extents.c | 22 ++++++++++++++++++----
>> 1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index d3dd618..496952e 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -2190,13 +2190,14 @@ errout:
>> * removes index from the index block.
>> */
>> static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
>> - struct ext4_ext_path *path)
>> + struct ext4_ext_path *path, int depth)
>> {
>> int err;
>> ext4_fsblk_t leaf;
>>
>> /* free index block */
>> - path--;
>> + depth--;
>> + path = path + depth;
>> leaf = ext4_idx_pblock(path->p_idx);
>> if (unlikely(path->p_hdr->eh_entries == 0)) {
>> EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0");
>> @@ -2221,6 +2222,19 @@ static int ext4_ext_rm_idx(handle_t *handle,
>> struct inode *inode,
>>
>> ext4_free_blocks(handle, inode, NULL, leaf, 1,
>> EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
>> +
>> + while (--depth >= 0) {
>> + if (path->p_idx != EXT_FIRST_INDEX(path->p_hdr))
>> + break;
>> + path--;
>> + err = ext4_ext_get_access(handle, inode, path);
>> + if (err)
>> + break;
>> + path->p_idx->ei_block = (path+1)->p_idx->ei_block;
>> + err = ext4_ext_dirty(handle, inode, path);
>> + if (err)
>> + break;
>> + }
>> return err;
>> }
>>
>> @@ -2557,7 +2571,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
>> /* if this leaf is free, then we should
>> * remove it from index block above */
>> if (err == 0 && eh->eh_entries == 0 && path[depth].p_bh != NULL)
>> - err = ext4_ext_rm_idx(handle, inode, path + depth);
>> + err = ext4_ext_rm_idx(handle, inode, path, depth);
>>
>> out:
>> return err;
>> @@ -2760,7 +2774,7 @@ again:
>> /* index is empty, remove it;
>> * handle must be already prepared by the
>> * truncatei_leaf() */
>> - err = ext4_ext_rm_idx(handle, inode, path + i);
>> + err = ext4_ext_rm_idx(handle, inode, path, i);
>> }
>> /* root level has p_bh == NULL, brelse() eats this */
>> brelse(path[i].p_bh);
>> --
>> 1.7.9.5
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2012-12-17 04:25:43

by Ashish Sangwan

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

On Thu, Dec 13, 2012 at 9:02 PM, Forrest Liu <[email protected]> wrote:
> When depth of extent tree is greater than 1, logical start value
> of interior node didn't updated correctly in ext4_ext_rm_idx.
>
> Signed-off-by: Forrest Liu <[email protected]>
> ---


Patch seems ok to me.
Reviewed-by: Ashish Sangwan <[email protected]>

2012-12-20 05:39:22

by Theodore Y. Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

On Thu, Dec 13, 2012 at 11:32:22PM +0800, Forrest Liu wrote:
> When depth of extent tree is greater than 1, logical start value
> of interior node didn't updated correctly in ext4_ext_rm_idx.
>
> Signed-off-by: Forrest Liu <[email protected]>

Applied.

BTW, your reproduction case also results in a file system which isn't
noticed as being broken by e2fsck. Eric's patch "e2fsck: Fix
incorrect interior node logical start values" fixes e2fsck so it
handles this. Unfortunately applying his patch seems to uncover a bug
in e2fsck when clearing a bad extent node (f_extent_bad_node) which we
need to fix so the regression test suite is passing.

- Ted

2012-12-20 15:11:28

by Forrest Liu

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

2012/12/20 Theodore Ts'o <[email protected]>:
> On Thu, Dec 13, 2012 at 11:32:22PM +0800, Forrest Liu wrote:
>> When depth of extent tree is greater than 1, logical start value
>> of interior node didn't updated correctly in ext4_ext_rm_idx.
>>
>> Signed-off-by: Forrest Liu <[email protected]>
>
> Applied.
>
> BTW, your reproduction case also results in a file system which isn't
> noticed as being broken by e2fsck. Eric's patch "e2fsck: Fix
> incorrect interior node logical start values" fixes e2fsck so it
> handles this. Unfortunately applying his patch seems to uncover a bug
> in e2fsck when clearing a bad extent node (f_extent_bad_node) which we
> need to fix so the regression test suite is passing.
>
> - Ted

Hi Ted,
I will help to find out the problem in e2fsck.

Thanks,
Forrest

2012-12-20 23:42:35

by Theodore Y. Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix extent tree corruption that incurred by hole punch [V2]

On Thu, Dec 20, 2012 at 11:11:28PM +0800, Forrest Liu wrote:
> Hi Ted,
> I will help to find out the problem in e2fsck.

Thanks for the offer, but I think I've found the problem and have the
following set of patches (versus the maint branch) to fix the problem.

- Ted


2012-12-20 23:43:45

by Theodore Y. Ts'o

[permalink] [raw]
Subject: [PATCH 3/3] e2fsck: make sure the extent tree is consistent after bogus node in the tree

Commit 789bd401c3 ("e2fsck: fix incorrect interior node logical start
values") surfaced a bug where if e2fsck finds and removed an invalid
node in the extent tree, i.e.:

Inode 12 has an invalid extent node (blk 22, lblk 0)
Clear? yes

It was possible for starting logical blocks found in the interior
nodes of the extent tree. Commit 789bd401c3 added the ability for
e2fsck to discover this problem, which resulted in the test
f_extent_bad_node to fail when the second pass of e2fsck reported the
following complaint:

Interior extent node level 0 of inode 12:
Logical start 0 does not match logical start 3 at next level. Fix? yes

This patch fixes this by adding a call to ext2fs_extent_fix_parents()
after deleting the bogus node in the extent tree.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
e2fsck/pass1.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 2acbb53..a8231f4 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1809,6 +1809,7 @@ report_problem:
pctx->str = "ext2fs_extent_delete";
return;
}
+ ext2fs_extent_fix_parents(ehandle);
pctx->errcode = ext2fs_extent_get(ehandle,
EXT2_EXTENT_CURRENT,
&extent);
--
1.7.12.rc0.22.gcdd159b


2012-12-20 23:43:45

by Theodore Y. Ts'o

[permalink] [raw]
Subject: [PATCH 2/3] libext2fs: ext2fs_extents_fix_parents() should not modify the handle location

Previously, ext2fs_extent_fix_parents() would only avoid modifying the
cursor location associated with the extent handle the cursor was
pointed at a leaf node in the extent tree. This is because it saved
the starting logical block number of the current extent, but not the
"level" of the extent (where level 0 is the leaf node, level 1 is the
interior node which points at blocks containing leaf nodes, etc.)

Fix ext2fs_extent_fix_parents() so it is guaranteed to not change the
current extent in the handle even if the current extent is not at the
bottom of the tree.

Also add a fix_extent command to the tst_extents program to make it
easier to test this function.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
lib/ext2fs/extent.c | 25 ++++++++++++++++++++++++-
lib/ext2fs/extent_dbg.ct | 3 +++
2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 95a0a86..c6716bc 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -709,9 +709,11 @@ errcode_t ext2fs_extent_goto(ext2_extent_handle_t handle,
errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle)
{
int retval = 0;
+ int orig_height;
blk64_t start;
struct extent_path *path;
struct ext2fs_extent extent;
+ struct ext2_extent_info info;

EXT2_CHECK_MAGIC(handle, EXT2_ET_MAGIC_EXTENT_HANDLE);

@@ -732,6 +734,10 @@ errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle)
/* modified node's start block */
start = extent.e_lblk;

+ if ((retval = ext2fs_extent_get_info(handle, &info)))
+ return retval;
+ orig_height = info.max_depth - info.curr_level;
+
/* traverse up until index not first, or startblk matches, or top */
while (handle->level > 0 &&
(path->left == path->entries - 1)) {
@@ -750,7 +756,7 @@ errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle)
}

/* put handle back to where we started */
- retval = ext2fs_extent_goto(handle, start);
+ retval = extent_goto(handle, orig_height, start);
done:
return retval;
}
@@ -1943,6 +1949,23 @@ void do_print_all(int argc, char **argv)
}
}

+void do_fix_parents(int argc, char **argv)
+{
+ struct ext2fs_extent extent;
+ struct ext2_extent_info info;
+ errcode_t retval;
+
+ if (common_extent_args_process(argc, argv, 1, 1, "fix_parents", "",
+ CHECK_FS_RW))
+ return;
+
+ retval = ext2fs_extent_fix_parents(current_handle);
+ if (retval) {
+ com_err(argv[0], retval, 0);
+ return;
+ }
+}
+
void do_info(int argc, char **argv)
{
struct ext2fs_extent extent;
diff --git a/lib/ext2fs/extent_dbg.ct b/lib/ext2fs/extent_dbg.ct
index d0571f4..c1d8033 100644
--- a/lib/ext2fs/extent_dbg.ct
+++ b/lib/ext2fs/extent_dbg.ct
@@ -55,6 +55,9 @@ request do_insert_node, "Insert node",
request do_split_node, "Split node",
split_node, split;

+request do_fix_parents, "Fix parents",
+ fix_parents, fixp;
+
request do_set_bmap, "Set block mapping",
set_bmap;

--
1.7.12.rc0.22.gcdd159b


2012-12-20 23:43:47

by Theodore Y. Ts'o

[permalink] [raw]
Subject: [PATCH 1/3] e2fsck: fix incorrect interior node logical start values

From: Eric Sandeen <[email protected]>

An index node's logical start (ei_block) should
match the logical start of the first node (index
or leaf) below it. If we find a node whose start
does not match its parent, fix all of its parents
accordingly.

If it finds such a problem, we'll see:

Pass 1: Checking inodes, blocks, and sizes
Interior extent node level 0 of inode 274258:
Logical start 3666 does not match logical start 4093 at next level. Fix<y>?

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
e2fsck/pass1.c | 17 ++++++++++++++++-
e2fsck/problem.c | 8 ++++++++
e2fsck/problem.h | 4 +++-
lib/ext2fs/ext2fs.h | 1 +
lib/ext2fs/extent.c | 2 +-
5 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index cc00e0f..2acbb53 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1797,7 +1797,7 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
problem = PR_1_EXTENT_ENDS_BEYOND;

if (problem) {
- report_problem:
+report_problem:
pctx->blk = extent.e_pblk;
pctx->blk2 = extent.e_lblk;
pctx->num = extent.e_len;
@@ -1822,7 +1822,10 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
}

if (!is_leaf) {
+ blk64_t lblk;
+
blk = extent.e_pblk;
+ lblk = extent.e_lblk;
pctx->errcode = ext2fs_extent_get(ehandle,
EXT2_EXTENT_DOWN, &extent);
if (pctx->errcode) {
@@ -1832,6 +1835,18 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
goto report_problem;
return;
}
+ /* The next extent should match this index's logical start */
+ if (extent.e_lblk != lblk) {
+ struct ext2_extent_info info;
+
+ ext2fs_extent_get_info(ehandle, &info);
+ pctx->blk = lblk;
+ pctx->blk2 = extent.e_lblk;
+ pctx->num = info.curr_level - 1;
+ problem = PR_1_EXTENT_INDEX_START_INVALID;
+ if (fix_problem(ctx, problem, pctx))
+ ext2fs_extent_fix_parents(ehandle);
+ }
scan_extent_node(ctx, pctx, pb, extent.e_lblk, ehandle);
if (pctx->errcode)
return;
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 977a4c8..ab67cff 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -946,6 +946,14 @@ static struct e2fsck_problem problem_table[] = {
N_("@i %i has zero length extent\n\t(@n logical @b %c, physical @b %b)\n"),
PROMPT_CLEAR, 0 },

+ /*
+ * Interior extent node logical offset doesn't match first node below it
+ */
+ { PR_1_EXTENT_INDEX_START_INVALID,
+ N_("Interior @x node level %N of @i %i:\n"
+ "Logical start %b does not match logical start %c at next level. "),
+ PROMPT_FIX, 0 },
+
/* Pass 1b errors */

/* Pass 1B: Rescan for duplicate/bad blocks */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 1b5815b..aed524d 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -558,6 +558,9 @@ struct problem_context {
/* Extent has zero length */
#define PR_1_EXTENT_LENGTH_ZERO 0x010066

+/* Index start doesn't match start of next extent down */
+#define PR_1_EXTENT_INDEX_START_INVALID 0x01006D
+
/*
* Pass 1b errors
*/
@@ -586,7 +589,6 @@ struct problem_context {
/* Error adjusting EA refcount */
#define PR_1B_ADJ_EA_REFCOUNT 0x011007

-
/* Pass 1C: Scan directories for inodes with dup blocks. */
#define PR_1C_PASS_HEADER 0x012000

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 7a14f40..1b35ff7 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1082,6 +1082,7 @@ extern errcode_t ext2fs_extent_get_info(ext2_extent_handle_t handle,
struct ext2_extent_info *info);
extern errcode_t ext2fs_extent_goto(ext2_extent_handle_t handle,
blk64_t blk);
+extern errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle);

/* fileio.c */
extern errcode_t ext2fs_file_open2(ext2_filsys fs, ext2_ino_t ino,
diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 8828764..95a0a86 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -706,7 +706,7 @@ errcode_t ext2fs_extent_goto(ext2_extent_handle_t handle,
* Safe to call for any position in node; if not at the first entry,
* will simply return.
*/
-static errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle)
+errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle)
{
int retval = 0;
blk64_t start;
--
1.7.12.rc0.22.gcdd159b


2012-12-21 03:20:04

by Theodore Y. Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/3] e2fsck: make sure the extent tree is consistent after bogus node in the tree

And here is the test case....

BTW, #protip: You can use the split_node command in tst_extents
debugging program not only to perform node splits (which will make the
tree wider), but if you try splitting at the root node, it will
allocate a new extent tree block, and then move all of the extent tree
nodes at the top-level, in the inode, into the new exterior extent
tree block. In effect, this will make the tree deeper.

This should allow you to make fairly arbitrarily deep and complex
extent trees by hand, without having to resort to using fallocate and
punch hole commands, which tend to take a lot longer than using the
"insert_extent", "replace_extent", and "split_node" commands in
tst_extent when creating test cases.

This also makes it easier to create small test file system images so
we don't have to bloat the e2fsprogs source tree with huge test file
systems in our regression test suite (which also tend to very much
slow down running said regression test suite).

Regards,

- Ted


Attachments:
(No filename) (0.98 kB)
0001-tests-add-test-of-an-incorrect-interior-node-in-an-e.patch (3.21 kB)
Download all attachments

2012-12-21 11:02:45

by Forrest Liu

[permalink] [raw]
Subject: Re: [PATCH 3/3] e2fsck: make sure the extent tree is consistent after bogus node in the tree

I have test these patches, and they work fine.
Thanks for the tip
- Forrest

2012/12/21 Theodore Ts'o <[email protected]>:
> And here is the test case....
>
> BTW, #protip: You can use the split_node command in tst_extents
> debugging program not only to perform node splits (which will make the
> tree wider), but if you try splitting at the root node, it will
> allocate a new extent tree block, and then move all of the extent tree
> nodes at the top-level, in the inode, into the new exterior extent
> tree block. In effect, this will make the tree deeper.
>
> This should allow you to make fairly arbitrarily deep and complex
> extent trees by hand, without having to resort to using fallocate and
> punch hole commands, which tend to take a lot longer than using the
> "insert_extent", "replace_extent", and "split_node" commands in
> tst_extent when creating test cases.
>
> This also makes it easier to create small test file system images so
> we don't have to bloat the e2fsprogs source tree with huge test file
> systems in our regression test suite (which also tend to very much
> slow down running said regression test suite).
>
> Regards,
>
> - Ted
>

2012-12-21 15:34:32

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 3/3] e2fsck: make sure the extent tree is consistent after bogus node in the tree

On 12/20/12 5:43 PM, Theodore Ts'o wrote:
> Commit 789bd401c3 ("e2fsck: fix incorrect interior node logical start
> values") surfaced a bug where if e2fsck finds and removed an invalid
> node in the extent tree, i.e.:
>
> Inode 12 has an invalid extent node (blk 22, lblk 0)
> Clear? yes
>
> It was possible for starting logical blocks found in the interior
> nodes of the extent tree. Commit 789bd401c3 added the ability for
> e2fsck to discover this problem, which resulted in the test
> f_extent_bad_node to fail when the second pass of e2fsck reported the
> following complaint:
>
> Interior extent node level 0 of inode 12:
> Logical start 0 does not match logical start 3 at next level. Fix? yes
>
> This patch fixes this by adding a call to ext2fs_extent_fix_parents()
> after deleting the bogus node in the extent tree.
>
> Signed-off-by: "Theodore Ts'o" <[email protected].edu>


Reviewed-by: Eric Sandeen <[email protected]>

Thanks Ted, sorry I didn't catch this. And this gives me hope
that maybe the extent tree corruption report I had received
might be due to an e2fsck, not kernel runtime...

-Eric

> ---
> e2fsck/pass1.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> index 2acbb53..a8231f4 100644
> --- a/e2fsck/pass1.c
> +++ b/e2fsck/pass1.c
> @@ -1809,6 +1809,7 @@ report_problem:
> pctx->str = "ext2fs_extent_delete";
> return;
> }
> + ext2fs_extent_fix_parents(ehandle);
> pctx->errcode = ext2fs_extent_get(ehandle,
> EXT2_EXTENT_CURRENT,
> &extent);
>


2012-12-21 20:48:06

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 1/3] e2fsck: fix incorrect interior node logical start values

On 12/20/12 5:43 PM, Theodore Ts'o wrote:
> From: Eric Sandeen <[email protected]>
>
> An index node's logical start (ei_block) should
> match the logical start of the first node (index
> or leaf) below it. If we find a node whose start
> does not match its parent, fix all of its parents
> accordingly.
>
> If it finds such a problem, we'll see:
>
> Pass 1: Checking inodes, blocks, and sizes
> Interior extent node level 0 of inode 274258:
> Logical start 3666 does not match logical start 4093 at next level. Fix<y>?

Hm, this situation might still need more work in some cases.

Looking at a "bad" extent tree reported to me:

1/ 2 25/ 29 57524 - 59011 15538183 1488
2/ 2 1/ 2 57524 - 59011 15556788 - 15558275 1488
2/ 2 2/ 2 59012 - 65535 15558276 - 15564799 6524 Uninit <- what's this extent?

1/ 2 26/ 29 59012 - 60671 15538184 1660
2/ 2 1/ 2 59012 - 60671 25638 - 27297 1660
2/ 2 2/ 2 60672 - 60689 27298 - 27315 18 Uninit

1/ 2 27/ 29 60672 - 61023 15538185 352 <- bad logical start
2/ 2 1/ 19 60690 - 60690 27316 - 27316 1 Uninit


e2fsck with my patch finds & fixes the parent issues:

Pass 1: Checking inodes, blocks, and sizes
Interior extent node level 1 of inode 8126473:
Logical start 60672 does not match logical start 60690 at next level. Fix? yes

Interior extent node level 1 of inode 8126473:
Logical start 63157 does not match logical start 63159 at next level. Fix? yes

and after that the extents look like:

1/ 2 25/ 29 57524 - 59011 15538183 1488
2/ 2 1/ 2 57524 - 59011 15556788 - 15558275 1488
2/ 2 2/ 2 59012 - 65535 15558276 - 15564799 6524 Uninit <--- ???

1/ 2 26/ 29 59012 - 60689 15538184 1678
2/ 2 1/ 2 59012 - 60671 25638 - 27297 1660
2/ 2 2/ 2 60672 - 60689 27298 - 27315 18 Uninit

1/ 2 27/ 29 60690 - 61023 15538185 334 <-- only this got fixed
2/ 2 1/ 19 60690 - 60690 27316 - 27316 1 Uninit

but in this case, it seems that the length of the range covered by the previous interior nodes is still incorrect. :(

-Eric

2012-12-24 14:57:09

by Theodore Y. Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/3] e2fsck: fix incorrect interior node logical start values

On Fri, Dec 21, 2012 at 02:47:58PM -0600, Eric Sandeen wrote:
>
> Hm, this situation might still need more work in some cases.
>
> but in this case, it seems that the length of the range covered by
> the previous interior nodes is still incorrect. :(

Yep, we've got a problem which e2fsck is not detecting. Here's a
simple test case which shows the problem:

debugfs: extents <12>
Level Entries Logical Physical Length Flags
0/ 2 1/ 2 0 - 6 23 7
1/ 2 1/ 2 0 - 3 18 4
2/ 2 1/ 2 0 - 0 37 - 37 1 Uninit
2/ 2 2/ 2 2 - 21 50 - 69 20 Uninit <----- this conflicts
1/ 2 2/ 2 4 - 6 21 3 <----- with this
2/ 2 1/ 2 4 - 4 39 - 39 1 Uninit
2/ 2 2/ 2 6 - 6 40 - 40 1 Uninit
0/ 2 2/ 2 7 - 10 24 4
1/ 2 1/ 2 7 - 8 20 2
2/ 2 1/ 2 7 - 7 41 - 41 1 Uninit
2/ 2 2/ 2 8 - 8 42 - 42 1 Uninit
1/ 2 2/ 2 9 - 10 22 2
2/ 2 1/ 2 9 - 9 43 - 43 1 Uninit
2/ 2 2/ 2 10 - 10 44 - 44 1 Uninit

In this case it's not obvious what is the right fix, since we have two
physical blocks which claim to map to the same logical block. There
are some places already in e2fsck where we today just throw up our
hands and offer to zap the entire inode, instead of trying to let the
user decide which is the correct physical blocks to use, or do some
way of saving out the conflicting inodes. It may be that's the best
way to fix this, since at the very least should be detecting that
there is a problem, and fixing it somehow. We can always try to come
up with a better way of repairing the corruption later.

Attached is a test case file system image with the above extent tree.

- Ted


Attachments:
(No filename) (1.89 kB)
leaf-node-overlap.img.gz (682.00 B)
Download all attachments