2020-06-17 19:02:20

by Eric Sandeen

[permalink] [raw]
Subject: [PATCH 0/1] ext4: fix potential negative array index in do_split

We recently had a report of a panic in do_split; the filesystem in question
panicked a distribution kernel when trying to add a new directory entry;
the behavior/bug persists upstream.

The directory block in question had lots of unused and un-coalesced
entries, like this, printed from the loop in ext4_insert_dentry():

[32778.024654] reclen 44 for name len 36
[32778.028745] start: de ffff9f4cb5309800 top ffff9f4cb5309bd4
[32778.034971] offset 0 nlen 28 rlen 40, rlen-nlen 12, reclen 44 name <empty>
[32778.042744] offset 40 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.050521] offset 68 nlen 32 rlen 32, rlen-nlen 0, reclen 44 name <empty>
[32778.058294] offset 100 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.066166] offset 128 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.074035] offset 156 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.081907] offset 184 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.089779] offset 208 nlen 36 rlen 36, rlen-nlen 0, reclen 44 name <empty>
[32778.097648] offset 244 nlen 12 rlen 12, rlen-nlen 0, reclen 44 name REDACTED
[32778.105227] offset 256 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.113099] offset 280 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name REDACTED
[32778.122134] offset 304 nlen 20 rlen 20, rlen-nlen 0, reclen 44 name REDACTED
[32778.130780] offset 324 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name REDACTED
[32778.138746] offset 340 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.146616] offset 364 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
[32778.154487] offset 392 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
[32778.162362] offset 416 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name <empty>
...

the file we were trying to insert needed a record length of 44, and none of the
non-coalesced <empty> slots were big enough, so we failed and told do_split
to get to work.

However, the sum of the non-empty entries didn't exceed half the block size, so
the loop in do_split() iterated over all of the entries, ended at "count," and
told us to split at (count - move) which is zero, and eventually:

continued = hash2 == map[split - 1].hash;

exploded on the negative index.

It's an open question as to how this directory got into this format; I'm not
sure if this should ever happen or not. But at a minimum, I think we should
be defensive here, hence [PATCH 1/1] will do that as an expedient fix and
backportable patch for this situation. There may be some other underlying
probem which led to this directory structure if it's unexpected, and maybe that
can come as another patch if anyone can investigate.

Thanks,
-Eric


2020-06-17 19:20:10

by Eric Sandeen

[permalink] [raw]
Subject: [PATCH 1/1] ext4: fix potential negative array index in do_split()

If for any reason a directory passed to do_split() does not have enough
active entries to exceed half the size of the block, we can end up
iterating over all "count" entries without finding a split point.

In this case, count == move, and split will be zero, and we will
attempt a negative index into map[].

Guard against this by detecting this case, and falling back to
split-to-half-of-count instead; in this case we will still have
plenty of space (> half blocksize) in each split block.

Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
Signed-off-by: Eric Sandeen <[email protected]>
---

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a8aca4772aaa..8b60881f07ee 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
blocksize, hinfo, map);
map -= count;
dx_sort_map(map, count);
- /* Split the existing block in the middle, size-wise */
+ /* Ensure that neither split block is over half full */
size = 0;
move = 0;
for (i = count-1; i >= 0; i--) {
@@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
size += map[i].size;
move++;
}
- /* map index at which we will split */
- split = count - move;
+ /*
+ * map index at which we will split
+ *
+ * If the sum of active entries didn't exceed half the block size, just
+ * split it in half by count; each resulting block will have at least
+ * half the space free.
+ */
+ if (i > 0)
+ split = count - move;
+ else
+ split = count/2;
+
hash2 = map[split].hash;
continued = hash2 == map[split - 1].hash;
dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",


2020-06-19 01:58:29

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Jun 17, 2020, at 1:19 PM, Eric Sandeen <[email protected]> wrote:
>
> If for any reason a directory passed to do_split() does not have enough
> active entries to exceed half the size of the block, we can end up
> iterating over all "count" entries without finding a split point.
>
> In this case, count == move, and split will be zero, and we will
> attempt a negative index into map[].
>
> Guard against this by detecting this case, and falling back to
> split-to-half-of-count instead; in this case we will still have
> plenty of space (> half blocksize) in each split block.
>
> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
> Signed-off-by: Eric Sandeen <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> ---
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index a8aca4772aaa..8b60881f07ee 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> blocksize, hinfo, map);
> map -= count;
> dx_sort_map(map, count);
> - /* Split the existing block in the middle, size-wise */
> + /* Ensure that neither split block is over half full */
> size = 0;
> move = 0;
> for (i = count-1; i >= 0; i--) {
> @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> size += map[i].size;
> move++;
> }
> - /* map index at which we will split */
> - split = count - move;
> + /*
> + * map index at which we will split
> + *
> + * If the sum of active entries didn't exceed half the block size, just
> + * split it in half by count; each resulting block will have at least
> + * half the space free.
> + */
> + if (i > 0)
> + split = count - move;
> + else
> + split = count/2;
> +
> hash2 = map[split].hash;
> continued = hash2 == map[split - 1].hash;
> dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
>
>


Cheers, Andreas






Attachments:
signature.asc (890.00 B)
Message signed with OpenPGP

2020-06-19 02:38:34

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 0/1] ext4: fix potential negative array index in do_split

On Jun 17, 2020, at 1:01 PM, Eric Sandeen <[email protected]> wrote:
>
> We recently had a report of a panic in do_split; the filesystem in question
> panicked a distribution kernel when trying to add a new directory entry;
> the behavior/bug persists upstream.
>
> The directory block in question had lots of unused and un-coalesced
> entries, like this, printed from the loop in ext4_insert_dentry():
>
> [32778.024654] reclen 44 for name len 36
> [32778.028745] start: de ffff9f4cb5309800 top ffff9f4cb5309bd4
> [32778.034971] offset 0 nlen 28 rlen 40, rlen-nlen 12, reclen 44 name <empty>
> [32778.042744] offset 40 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
> [32778.050521] offset 68 nlen 32 rlen 32, rlen-nlen 0, reclen 44 name <empty>
> [32778.058294] offset 100 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
> [32778.066166] offset 128 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
> [32778.074035] offset 156 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
> [32778.081907] offset 184 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
> [32778.089779] offset 208 nlen 36 rlen 36, rlen-nlen 0, reclen 44 name <empty>
> [32778.097648] offset 244 nlen 12 rlen 12, rlen-nlen 0, reclen 44 name REDACTED
> [32778.105227] offset 256 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
> [32778.113099] offset 280 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name REDACTED
> [32778.122134] offset 304 nlen 20 rlen 20, rlen-nlen 0, reclen 44 name REDACTED
> [32778.130780] offset 324 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name REDACTED
> [32778.138746] offset 340 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
> [32778.146616] offset 364 nlen 28 rlen 28, rlen-nlen 0, reclen 44 name <empty>
> [32778.154487] offset 392 nlen 24 rlen 24, rlen-nlen 0, reclen 44 name <empty>
> [32778.162362] offset 416 nlen 16 rlen 16, rlen-nlen 0, reclen 44 name <empty>
> ...
>
> the file we were trying to insert needed a record length of 44, and none of the
> non-coalesced <empty> slots were big enough, so we failed and told do_split
> to get to work.
>
> However, the sum of the non-empty entries didn't exceed half the block size, so
> the loop in do_split() iterated over all of the entries, ended at "count," and
> told us to split at (count - move) which is zero, and eventually:
>
> continued = hash2 == map[split - 1].hash;
>
> exploded on the negative index.
>
> It's an open question as to how this directory got into this format; I'm not
> sure if this should ever happen or not. But at a minimum, I think we should
> be defensive here, hence [PATCH 1/1] will do that as an expedient fix and
> backportable patch for this situation. There may be some other underlying
> probem which led to this directory structure if it's unexpected, and maybe that
> can come as another patch if anyone can investigate.

I thought this might be a bit of a conundrum. There is *supposed* to be
merging of adjacent entries, but in some quick testing on RHEL7 (kernel
3.10.0-957.12.1.el7, same with Debian 4.14.79) shows this to be broken
if the files are deleted in dirent order (which would seem to be the most
common order):

# mkdir tmp; cd tmp
# touch file{1..100}
# rm file{33,36,37,39,41,42,43,46,47}
# debugfs -c -R "ls -ld tmp" /dev/sda1
366 100644 (1) 0 0 0 18-Jun-2020 18:43 file30
< 369> 0 (1) 0 0 <reclen= 16> <deleted> file33
< 372> 0 (1) 0 0 <reclen= 16> <deleted> file36
< 373> 0 (1) 0 0 <reclen= 16> <deleted> file37
< 375> 0 (1) 0 0 <reclen= 16> <deleted> file39
< 377> 0 (1) 0 0 <reclen= 16> <deleted> file41
< 378> 0 (1) 0 0 <reclen= 16> <deleted> file42
< 379> 0 (1) 0 0 <reclen= 16> <deleted> file43
< 382> 0 (1) 0 0 <reclen= 16> <deleted> file46
< 383> 0 (1) 0 0 <reclen= 16> <deleted> file47
386 100644 (1) 0 0 0 18-Jun-2020 18:43 file50

Above shows (with modified debugfs to show reclen for deleted files)
that the dirents are *not* combined. If the dirent *before* the
other entries is deleted, then they are merged:

# rm file30
< 366> 0 (1) 0 0 <reclen= 160> <deleted> file30
< 369> 0 (1) 0 0 <reclen= 16> <deleted> file33
< 372> 0 (1) 0 0 <reclen= 16> <deleted> file36
< 373> 0 (1) 0 0 <reclen= 16> <deleted> file37
< 375> 0 (1) 0 0 <reclen= 16> <deleted> file39
< 377> 0 (1) 0 0 <reclen= 16> <deleted> file41
< 378> 0 (1) 0 0 <reclen= 16> <deleted> file42
< 379> 0 (1) 0 0 <reclen= 16> <deleted> file43
< 382> 0 (1) 0 0 <reclen= 16> <deleted> file46
< 383> 0 (1) 0 0 <reclen= 16> <deleted> file47
386 100644 (1) 0 0 0 18-Jun-2020 18:43 file50

Cheers, Andreas






Attachments:
signature.asc (890.00 B)
Message signed with OpenPGP

2020-06-19 06:45:08

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
> If for any reason a directory passed to do_split() does not have enough
> active entries to exceed half the size of the block, we can end up
> iterating over all "count" entries without finding a split point.
>
> In this case, count == move, and split will be zero, and we will
> attempt a negative index into map[].
>
> Guard against this by detecting this case, and falling back to
> split-to-half-of-count instead; in this case we will still have
> plenty of space (> half blocksize) in each split block.
>
> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
> Signed-off-by: Eric Sandeen <[email protected]>
> ---
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index a8aca4772aaa..8b60881f07ee 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> blocksize, hinfo, map);
> map -= count;
> dx_sort_map(map, count);
> - /* Split the existing block in the middle, size-wise */
> + /* Ensure that neither split block is over half full */
> size = 0;
> move = 0;
> for (i = count-1; i >= 0; i--) {
> @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> size += map[i].size;
> move++;
> }
> - /* map index at which we will split */
> - split = count - move;
> + /*
> + * map index at which we will split
> + *
> + * If the sum of active entries didn't exceed half the block size, just
> + * split it in half by count; each resulting block will have at least
> + * half the space free.
> + */
> + if (i > 0)
> + split = count - move;
> + else
> + split = count/2;

Won't we have exactly the same problem as we did before your commit
ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much
space we actually moved we might have not made enough space for the new
entry ?

Also since we have the move == count when the problem appears then it's
clear that we never hit the condition

1865 → → /* is more than half of this entry in 2nd half of the block? */
1866 → → if (size + map[i].size/2 > blocksize/2)
1867 → → → break;

in the loop. This is surprising but it means the the entries must have
gaps between them that are small enough that we can't fit the entry
right in ? Should not we try to compact it before splitting, or is it
the case that this should have been done somewhere else ?

If we really want ot be fair and we want to split it right in the middle
of the entries size-wise then we need to keep track of of sum of the
entries and decide based on that, not blocksize/2. But maybe the problem
could be solved by compacting the entries together because the condition
seems to rely on that.

-Lukas

> +
> hash2 = map[split].hash;
> continued = hash2 == map[split - 1].hash;
> dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
>
>

2020-06-19 07:09:44

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote:
> On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
> > If for any reason a directory passed to do_split() does not have enough
> > active entries to exceed half the size of the block, we can end up
> > iterating over all "count" entries without finding a split point.
> >
> > In this case, count == move, and split will be zero, and we will
> > attempt a negative index into map[].
> >
> > Guard against this by detecting this case, and falling back to
> > split-to-half-of-count instead; in this case we will still have
> > plenty of space (> half blocksize) in each split block.
> >
> > Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
> > Signed-off-by: Eric Sandeen <[email protected]>
> > ---
> >
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index a8aca4772aaa..8b60881f07ee 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> > @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> > blocksize, hinfo, map);
> > map -= count;
> > dx_sort_map(map, count);
> > - /* Split the existing block in the middle, size-wise */
> > + /* Ensure that neither split block is over half full */
> > size = 0;
> > move = 0;
> > for (i = count-1; i >= 0; i--) {
> > @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> > size += map[i].size;
> > move++;
> > }
> > - /* map index at which we will split */
> > - split = count - move;
> > + /*
> > + * map index at which we will split
> > + *
> > + * If the sum of active entries didn't exceed half the block size, just
> > + * split it in half by count; each resulting block will have at least
> > + * half the space free.
> > + */
> > + if (i > 0)
> > + split = count - move;
> > + else
> > + split = count/2;
>
> Won't we have exactly the same problem as we did before your commit
> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much
> space we actually moved we might have not made enough space for the new
> entry ?
>
> Also since we have the move == count when the problem appears then it's
> clear that we never hit the condition
>
> 1865 → → /* is more than half of this entry in 2nd half of the block? */
> 1866 → → if (size + map[i].size/2 > blocksize/2)
> 1867 → → → break;
>
> in the loop. This is surprising but it means the the entries must have
> gaps between them that are small enough that we can't fit the entry
> right in ? Should not we try to compact it before splitting, or is it
> the case that this should have been done somewhere else ?

The other possibility is that map[i].size is not right and indeed there
seems to be a bug in dx_make_map()

map_tail->size = le16_to_cpu(de->rec_len);

should be

map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize));

right ? Otherwise with large enough records the size will be smaller
than it really is.

A quick look at fs/ext4/namei.c reveals couple of places there rec_len
is used without the conversion and we should check whether it needs
fixing.

-Lukas



>
> If we really want ot be fair and we want to split it right in the middle
> of the entries size-wise then we need to keep track of of sum of the
> entries and decide based on that, not blocksize/2. But maybe the problem
> could be solved by compacting the entries together because the condition
> seems to rely on that.
>
> -Lukas
>
> > +
> > hash2 = map[split].hash;
> > continued = hash2 == map[split - 1].hash;
> > dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
> >
> >
>

2020-06-19 12:38:24

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Fri, Jun 19, 2020 at 09:08:54AM +0200, Lukas Czerner wrote:
> On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote:
> > On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
> > > If for any reason a directory passed to do_split() does not have enough
> > > active entries to exceed half the size of the block, we can end up
> > > iterating over all "count" entries without finding a split point.
> > >
> > > In this case, count == move, and split will be zero, and we will
> > > attempt a negative index into map[].
> > >
> > > Guard against this by detecting this case, and falling back to
> > > split-to-half-of-count instead; in this case we will still have
> > > plenty of space (> half blocksize) in each split block.
> > >
> > > Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
> > > Signed-off-by: Eric Sandeen <[email protected]>
> > > ---
> > >
> > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > > index a8aca4772aaa..8b60881f07ee 100644
> > > --- a/fs/ext4/namei.c
> > > +++ b/fs/ext4/namei.c
> > > @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> > > blocksize, hinfo, map);
> > > map -= count;
> > > dx_sort_map(map, count);
> > > - /* Split the existing block in the middle, size-wise */
> > > + /* Ensure that neither split block is over half full */
> > > size = 0;
> > > move = 0;
> > > for (i = count-1; i >= 0; i--) {
> > > @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> > > size += map[i].size;
> > > move++;
> > > }
> > > - /* map index at which we will split */
> > > - split = count - move;
> > > + /*
> > > + * map index at which we will split
> > > + *
> > > + * If the sum of active entries didn't exceed half the block size, just
> > > + * split it in half by count; each resulting block will have at least
> > > + * half the space free.
> > > + */
> > > + if (i > 0)
> > > + split = count - move;
> > > + else
> > > + split = count/2;
> >
> > Won't we have exactly the same problem as we did before your commit
> > ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much
> > space we actually moved we might have not made enough space for the new
> > entry ?
> >
> > Also since we have the move == count when the problem appears then it's
> > clear that we never hit the condition
> >
> > 1865 → → /* is more than half of this entry in 2nd half of the block? */
> > 1866 → → if (size + map[i].size/2 > blocksize/2)
> > 1867 → → → break;
> >
> > in the loop. This is surprising but it means the the entries must have
> > gaps between them that are small enough that we can't fit the entry
> > right in ? Should not we try to compact it before splitting, or is it
> > the case that this should have been done somewhere else ?
>
> The other possibility is that map[i].size is not right and indeed there
> seems to be a bug in dx_make_map()
>
> map_tail->size = le16_to_cpu(de->rec_len);
>
> should be
>
> map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize));
>
> right ? Otherwise with large enough records the size will be smaller
> than it really is.
>
> A quick look at fs/ext4/namei.c reveals couple of places there rec_len
> is used without the conversion and we should check whether it needs
> fixing.
>
> -Lukas

And indeed the following patch seems to have fixed the issue we were
seeing. Eric I think that this might be a proper fix. But we still need
to check the other uses of rec_len to make sure it's ok as well.

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 94ec882..5509fdc 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1068,7 +1068,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
map_tail--;
map_tail->hash = h.hash;
map_tail->offs = ((char *) de - base)>>2;
- map_tail->size = le16_to_cpu(de->rec_len);
+ map_tail->size = ext4_rec_len_from_disk(le16_to_cpu(de->rec_len), blocksize);
count++;
cond_resched();
}


>
>
>
> >
> > If we really want ot be fair and we want to split it right in the middle
> > of the entries size-wise then we need to keep track of of sum of the
> > entries and decide based on that, not blocksize/2. But maybe the problem
> > could be solved by compacting the entries together because the condition
> > seems to rely on that.
> >
> > -Lukas
> >
> > > +
> > > hash2 = map[split].hash;
> > > continued = hash2 == map[split - 1].hash;
> > > dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
> > >
> > >
> >
>

2020-06-19 13:43:42

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On 6/19/20 1:41 AM, Lukas Czerner wrote:
> On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
>> If for any reason a directory passed to do_split() does not have enough
>> active entries to exceed half the size of the block, we can end up
>> iterating over all "count" entries without finding a split point.
>>
>> In this case, count == move, and split will be zero, and we will
>> attempt a negative index into map[].
>>
>> Guard against this by detecting this case, and falling back to
>> split-to-half-of-count instead; in this case we will still have
>> plenty of space (> half blocksize) in each split block.

...

>> + /*
>> + * map index at which we will split
>> + *
>> + * If the sum of active entries didn't exceed half the block size, just
>> + * split it in half by count; each resulting block will have at least
>> + * half the space free.
>> + */
>> + if (i > 0)
>> + split = count - move;
>> + else
>> + split = count/2;
>
> Won't we have exactly the same problem as we did before your commit
> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much
> space we actually moved we might have not made enough space for the new
> entry ?

I don't think so - while we don't have the original reproducer, I assume that
it was the case where the block was very full, and splitting by count left us
with one of the split blocks still over half full (because ensuring that we
split in half by size seemed to fix it)

In this case, the sum of the active entries was <= half the block size.
So if we split by count, we're guaranteed to have >= half the block size free
in each side of the split.

> Also since we have the move == count when the problem appears then it's
> clear that we never hit the condition
>
> 1865 → → /* is more than half of this entry in 2nd half of the block? */
> 1866 → → if (size + map[i].size/2 > blocksize/2)
> 1867 → → → break;
>
> in the loop. This is surprising but it means the the entries must have
> gaps between them that are small enough that we can't fit the entry
> right in ? Should not we try to compact it before splitting, or is it
> the case that this should have been done somewhere else ?

Yes, that's exactly what happened - see my 0/1 cover letter. Maybe that should
be in the patch description itself. ALso, yes compaction would help but I was
unclear as to whether that should be done here, is the side effect of some other
bug, etc. In general, we do seem to do compaction elsewhere and I don't know
how we got to this point.

> If we really want ot be fair and we want to split it right in the middle
> of the entries size-wise then we need to keep track of of sum of the
> entries and decide based on that, not blocksize/2. But maybe the problem
> could be solved by compacting the entries together because the condition
> seems to rely on that.

I thought about that as well, but it took a bit more code to do; we could make
make_map() return both count and total size, for example. But based on my
theory above that both sides of the split will have >= half block free, it
didn't seem necessary, particularly since this seems like an edge case?

-Eric

> -Lukas
>
>> +
>> hash2 = map[split].hash;
>> continued = hash2 == map[split - 1].hash;
>> dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
>>
>>
>

2020-06-19 13:45:20

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On 6/19/20 6:16 AM, Lukas Czerner wrote:

>> The other possibility is that map[i].size is not right and indeed there
>> seems to be a bug in dx_make_map()
>>
>> map_tail->size = le16_to_cpu(de->rec_len);
>>
>> should be
>>
>> map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize));
>>
>> right ? Otherwise with large enough records the size will be smaller
>> than it really is.
>>
>> A quick look at fs/ext4/namei.c reveals couple of places there rec_len
>> is used without the conversion and we should check whether it needs
>> fixing.
>>
>> -Lukas
>
> And indeed the following patch seems to have fixed the issue we were
> seeing. Eric I think that this might be a proper fix. But we still need
> to check the other uses of rec_len to make sure it's ok as well.
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 94ec882..5509fdc 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1068,7 +1068,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
> map_tail--;
> map_tail->hash = h.hash;
> map_tail->offs = ((char *) de - base)>>2;
> - map_tail->size = le16_to_cpu(de->rec_len);
> + map_tail->size = ext4_rec_len_from_disk(le16_to_cpu(de->rec_len), blocksize);

That isn't right, ext4_rec_len_from_disk /takes/ an __le16 :)

- map_tail->size = le16_to_cpu(de->rec_len);
+ map_tail->size = ext4_rec_len_from_disk(de->rec_len), blocksize);

would be more correct, but won't matter for PAGE_SIZE < 65536 right?

-Eric

2020-06-19 13:46:03

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On 6/19/20 2:08 AM, Lukas Czerner wrote:
> On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote:
>> On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
>>> If for any reason a directory passed to do_split() does not have enough
>>> active entries to exceed half the size of the block, we can end up
>>> iterating over all "count" entries without finding a split point.
>>>
>>> In this case, count == move, and split will be zero, and we will
>>> attempt a negative index into map[].
>>>
>>> Guard against this by detecting this case, and falling back to
>>> split-to-half-of-count instead; in this case we will still have
>>> plenty of space (> half blocksize) in each split block.
>>>
>>> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
>>> Signed-off-by: Eric Sandeen <[email protected]>
>>> ---
>>>
>>> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
>>> index a8aca4772aaa..8b60881f07ee 100644
>>> --- a/fs/ext4/namei.c
>>> +++ b/fs/ext4/namei.c
>>> @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
>>> blocksize, hinfo, map);
>>> map -= count;
>>> dx_sort_map(map, count);
>>> - /* Split the existing block in the middle, size-wise */
>>> + /* Ensure that neither split block is over half full */
>>> size = 0;
>>> move = 0;
>>> for (i = count-1; i >= 0; i--) {
>>> @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
>>> size += map[i].size;
>>> move++;
>>> }
>>> - /* map index at which we will split */
>>> - split = count - move;
>>> + /*
>>> + * map index at which we will split
>>> + *
>>> + * If the sum of active entries didn't exceed half the block size, just
>>> + * split it in half by count; each resulting block will have at least
>>> + * half the space free.
>>> + */
>>> + if (i > 0)
>>> + split = count - move;
>>> + else
>>> + split = count/2;
>>
>> Won't we have exactly the same problem as we did before your commit
>> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much
>> space we actually moved we might have not made enough space for the new
>> entry ?
>>
>> Also since we have the move == count when the problem appears then it's
>> clear that we never hit the condition
>>
>> 1865 → → /* is more than half of this entry in 2nd half of the block? */
>> 1866 → → if (size + map[i].size/2 > blocksize/2)
>> 1867 → → → break;
>>
>> in the loop. This is surprising but it means the the entries must have
>> gaps between them that are small enough that we can't fit the entry
>> right in ? Should not we try to compact it before splitting, or is it
>> the case that this should have been done somewhere else ?
>
> The other possibility is that map[i].size is not right and indeed there
> seems to be a bug in dx_make_map()
>
> map_tail->size = le16_to_cpu(de->rec_len);
>
> should be
>
> map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize));
>
> right ? Otherwise with large enough records the size will be smaller
> than it really is.

well, those are the same thing unless (PAGE_SIZE >= 65536) so I don't
think that's the issue here.

static inline unsigned int
ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize)
{
unsigned len = le16_to_cpu(dlen);

#if (PAGE_SIZE >= 65536)
...
#else
return len;
#endif
}

Should be fixed for consistency, but seems to not be a root cause here.

> A quick look at fs/ext4/namei.c reveals couple of places there rec_len
> is used without the conversion and we should check whether it needs
> fixing.

...

2020-06-19 13:50:47

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Fri, Jun 19, 2020 at 08:42:23AM -0500, Eric Sandeen wrote:
> On 6/19/20 2:08 AM, Lukas Czerner wrote:
> > On Fri, Jun 19, 2020 at 08:41:22AM +0200, Lukas Czerner wrote:
> >> On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
> >>> If for any reason a directory passed to do_split() does not have enough
> >>> active entries to exceed half the size of the block, we can end up
> >>> iterating over all "count" entries without finding a split point.
> >>>
> >>> In this case, count == move, and split will be zero, and we will
> >>> attempt a negative index into map[].
> >>>
> >>> Guard against this by detecting this case, and falling back to
> >>> split-to-half-of-count instead; in this case we will still have
> >>> plenty of space (> half blocksize) in each split block.
> >>>
> >>> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
> >>> Signed-off-by: Eric Sandeen <[email protected]>
> >>> ---
> >>>
> >>> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> >>> index a8aca4772aaa..8b60881f07ee 100644
> >>> --- a/fs/ext4/namei.c
> >>> +++ b/fs/ext4/namei.c
> >>> @@ -1858,7 +1858,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> >>> blocksize, hinfo, map);
> >>> map -= count;
> >>> dx_sort_map(map, count);
> >>> - /* Split the existing block in the middle, size-wise */
> >>> + /* Ensure that neither split block is over half full */
> >>> size = 0;
> >>> move = 0;
> >>> for (i = count-1; i >= 0; i--) {
> >>> @@ -1868,8 +1868,18 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
> >>> size += map[i].size;
> >>> move++;
> >>> }
> >>> - /* map index at which we will split */
> >>> - split = count - move;
> >>> + /*
> >>> + * map index at which we will split
> >>> + *
> >>> + * If the sum of active entries didn't exceed half the block size, just
> >>> + * split it in half by count; each resulting block will have at least
> >>> + * half the space free.
> >>> + */
> >>> + if (i > 0)
> >>> + split = count - move;
> >>> + else
> >>> + split = count/2;
> >>
> >> Won't we have exactly the same problem as we did before your commit
> >> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much
> >> space we actually moved we might have not made enough space for the new
> >> entry ?
> >>
> >> Also since we have the move == count when the problem appears then it's
> >> clear that we never hit the condition
> >>
> >> 1865 → → /* is more than half of this entry in 2nd half of the block? */
> >> 1866 → → if (size + map[i].size/2 > blocksize/2)
> >> 1867 → → → break;
> >>
> >> in the loop. This is surprising but it means the the entries must have
> >> gaps between them that are small enough that we can't fit the entry
> >> right in ? Should not we try to compact it before splitting, or is it
> >> the case that this should have been done somewhere else ?
> >
> > The other possibility is that map[i].size is not right and indeed there
> > seems to be a bug in dx_make_map()
> >
> > map_tail->size = le16_to_cpu(de->rec_len);
> >
> > should be
> >
> > map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize));
> >
> > right ? Otherwise with large enough records the size will be smaller
> > than it really is.
>
> well, those are the same thing unless (PAGE_SIZE >= 65536) so I don't
> think that's the issue here.
>
> static inline unsigned int
> ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize)
> {
> unsigned len = le16_to_cpu(dlen);
>
> #if (PAGE_SIZE >= 65536)
> ...
> #else
> return len;
> #endif
> }

Ah you're right. The reproducer for this is kind of unreliable as well
so that's why it looked to be fxied with this I guess.

>
> Should be fixed for consistency, but seems to not be a root cause here.

Agreed.

-Lukas

>
> > A quick look at fs/ext4/namei.c reveals couple of places there rec_len
> > is used without the conversion and we should check whether it needs
> > fixing.
>
> ...
>

2020-06-19 13:55:02

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Fri, Jun 19, 2020 at 08:44:19AM -0500, Eric Sandeen wrote:
> On 6/19/20 6:16 AM, Lukas Czerner wrote:
>
> >> The other possibility is that map[i].size is not right and indeed there
> >> seems to be a bug in dx_make_map()
> >>
> >> map_tail->size = le16_to_cpu(de->rec_len);
> >>
> >> should be
> >>
> >> map_tail->size = ext4_rec_len_from_disk(de->rec_len, blocksize));
> >>
> >> right ? Otherwise with large enough records the size will be smaller
> >> than it really is.
> >>
> >> A quick look at fs/ext4/namei.c reveals couple of places there rec_len
> >> is used without the conversion and we should check whether it needs
> >> fixing.
> >>
> >> -Lukas
> >
> > And indeed the following patch seems to have fixed the issue we were
> > seeing. Eric I think that this might be a proper fix. But we still need
> > to check the other uses of rec_len to make sure it's ok as well.
> >
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index 94ec882..5509fdc 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> > @@ -1068,7 +1068,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
> > map_tail--;
> > map_tail->hash = h.hash;
> > map_tail->offs = ((char *) de - base)>>2;
> > - map_tail->size = le16_to_cpu(de->rec_len);
> > + map_tail->size = ext4_rec_len_from_disk(le16_to_cpu(de->rec_len), blocksize);
>
> That isn't right, ext4_rec_len_from_disk /takes/ an __le16 :)
>
> - map_tail->size = le16_to_cpu(de->rec_len);
> + map_tail->size = ext4_rec_len_from_disk(de->rec_len), blocksize);

Yep, my bad.

>
> would be more correct, but won't matter for PAGE_SIZE < 65536 right?

True, it's not the problem we're seeing.

-Lukas

>
> -Eric
>

2020-07-08 16:10:45

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Fri 19-06-20 08:39:53, Eric Sandeen wrote:
> On 6/19/20 1:41 AM, Lukas Czerner wrote:
> > On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
> >> If for any reason a directory passed to do_split() does not have enough
> >> active entries to exceed half the size of the block, we can end up
> >> iterating over all "count" entries without finding a split point.
> >>
> >> In this case, count == move, and split will be zero, and we will
> >> attempt a negative index into map[].
> >>
> >> Guard against this by detecting this case, and falling back to
> >> split-to-half-of-count instead; in this case we will still have
> >> plenty of space (> half blocksize) in each split block.
>
> ...
>
> >> + /*
> >> + * map index at which we will split
> >> + *
> >> + * If the sum of active entries didn't exceed half the block size, just
> >> + * split it in half by count; each resulting block will have at least
> >> + * half the space free.
> >> + */
> >> + if (i > 0)
> >> + split = count - move;
> >> + else
> >> + split = count/2;
> >
> > Won't we have exactly the same problem as we did before your commit
> > ef2b02d3e617cb0400eedf2668f86215e1b0e6af ? Since we do not know how much
> > space we actually moved we might have not made enough space for the new
> > entry ?
>
> I don't think so - while we don't have the original reproducer, I assume that
> it was the case where the block was very full, and splitting by count left us
> with one of the split blocks still over half full (because ensuring that we
> split in half by size seemed to fix it)
>
> In this case, the sum of the active entries was <= half the block size.
> So if we split by count, we're guaranteed to have >= half the block size free
> in each side of the split.
>
> > Also since we have the move == count when the problem appears then it's
> > clear that we never hit the condition
> >
> > 1865 → → /* is more than half of this entry in 2nd half of the block? */
> > 1866 → → if (size + map[i].size/2 > blocksize/2)
> > 1867 → → → break;
> >
> > in the loop. This is surprising but it means the the entries must have
> > gaps between them that are small enough that we can't fit the entry
> > right in ? Should not we try to compact it before splitting, or is it
> > the case that this should have been done somewhere else ?
>
> Yes, that's exactly what happened - see my 0/1 cover letter. Maybe that should
> be in the patch description itself. ALso, yes compaction would help but I was
> unclear as to whether that should be done here, is the side effect of some other
> bug, etc. In general, we do seem to do compaction elsewhere and I don't know
> how we got to this point.
>
> > If we really want ot be fair and we want to split it right in the middle
> > of the entries size-wise then we need to keep track of of sum of the
> > entries and decide based on that, not blocksize/2. But maybe the problem
> > could be solved by compacting the entries together because the condition
> > seems to rely on that.
>
> I thought about that as well, but it took a bit more code to do; we could make
> make_map() return both count and total size, for example. But based on my
> theory above that both sides of the split will have >= half block free, it
> didn't seem necessary, particularly since this seems like an edge case?

This didn't seem to conclude in any way? The patch looks good to me FWIW so
feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Ted, can you please pick this patch up? Thanks!

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-07-30 01:49:55

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/1] ext4: fix potential negative array index in do_split()

On Wed, Jun 17, 2020 at 02:19:04PM -0500, Eric Sandeen wrote:
> If for any reason a directory passed to do_split() does not have enough
> active entries to exceed half the size of the block, we can end up
> iterating over all "count" entries without finding a split point.
>
> In this case, count == move, and split will be zero, and we will
> attempt a negative index into map[].
>
> Guard against this by detecting this case, and falling back to
> split-to-half-of-count instead; in this case we will still have
> plenty of space (> half blocksize) in each split block.
>
> Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
> Signed-off-by: Eric Sandeen <[email protected]>

Thanks, applied.

- Ted