In ext4_ext_next_allocated_block(), the path[depth] might
have a p_ext that is NULL -- see ext4_ext_binsearch(). In
such a case, dereferencing it will crash the machine.
This patch checks for p_ext == NULL in
ext4_ext_next_allocated_block() before dereferencinging it.
Tested using a hand-crafted an inode with eh_entries == 0 in
an extent block, verified that running FIEMAP on it crashes
without this patch, works fine with it.
Signed-off-by: Curt Wohlgemuth <[email protected]>
---
fs/ext4/extents.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 57cf568..063a5b8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
while (depth >= 0) {
if (depth == path->p_depth) {
/* leaf */
- if (path[depth].p_ext !=
+ /* p_ext can be NULL */
+ if (path[depth].p_ext &&
+ path[depth].p_ext !=
EXT_LAST_EXTENT(path[depth].p_hdr))
return le32_to_cpu(path[depth].p_ext[1].ee_block);
} else {
--
1.7.3.1
On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
> In ext4_ext_next_allocated_block(), the path[depth] might
> have a p_ext that is NULL -- see ext4_ext_binsearch(). In
> such a case, dereferencing it will crash the machine.
>
> This patch checks for p_ext == NULL in
> ext4_ext_next_allocated_block() before dereferencinging it.
>
> Tested using a hand-crafted an inode with eh_entries == 0 in
> an extent block, verified that running FIEMAP on it crashes
> without this patch, works fine with it.
Hi Curt,
It seems to me that even that the patch fixes the NULL dereference, it
is not a complete solution. Is it possible that in "normal" case p_ext
would be NULL ? I think that this is only possible in extent split/add
case (as noted in ext4_ext_binsearch()) which should be atomic to the
other operations (locked i_mutex?).
However this seems like an inode corruption so we should probably be
more verbose about it and print an appropriate EXT4_ERROR_INODE() or
even better check for the corrupted tree in the ext4_ext_find_extent()
(instead in ext4_ext_map_blocks()), however this will need to distinguish
between the normal and the tree modification case.
Thanks!
-Lukas
>
> Signed-off-by: Curt Wohlgemuth <[email protected]>
> ---
> fs/ext4/extents.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 57cf568..063a5b8 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> while (depth >= 0) {
> if (depth == path->p_depth) {
> /* leaf */
> - if (path[depth].p_ext !=
> + /* p_ext can be NULL */
> + if (path[depth].p_ext &&
> + path[depth].p_ext !=
> EXT_LAST_EXTENT(path[depth].p_hdr))
> return le32_to_cpu(path[depth].p_ext[1].ee_block);
> } else {
>
Hi Lukas:
On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <[email protected]> wrote:
> On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
>
>> In ext4_ext_next_allocated_block(), the path[depth] might
>> have a p_ext that is NULL -- see ext4_ext_binsearch(). ?In
>> such a case, dereferencing it will crash the machine.
>>
>> This patch checks for p_ext == NULL in
>> ext4_ext_next_allocated_block() before dereferencinging it.
>>
>> Tested using a hand-crafted an inode with eh_entries == 0 in
>> an extent block, verified that running FIEMAP on it crashes
>> without this patch, works fine with it.
>
> Hi Curt,
>
> It seems to me that even that the patch fixes the NULL dereference, it
> is not a complete solution. Is it possible that in "normal" case p_ext
> would be NULL ? I think that this is only possible in extent split/add
> case (as noted in ext4_ext_binsearch()) which should be atomic to the
> other operations (locked i_mutex?).
Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.
We've seen this problem during what appears to be a race between an
inode growth (or truncate?) and another task doing a FIEMAP ioctl.
The real problem is that FIEMAP handing in ext4 is just, well, buggy?
ext4_ext_walk_space() will get the i_data_sem, construct the path
array, then release the semaphore. But then it does a bazillion
accesses on the extent/header/index pointers in the path array, with
no protection against truncate, growth, or any other changes. As far
as I can tell, this is the only use of a path array retrieved from
ext4_ext_find_extent() that isn't completely covered by i_data_sem.
> However this seems like an inode corruption so we should probably be
> more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> even better check for the corrupted tree in the ext4_ext_find_extent()
> (instead in ext4_ext_map_blocks()), however this will need to distinguish
> between the normal and the tree modification case.
What we've observed many times is a crash during a FIEMAP call to
ext4_ext_next_allocated_block(), which appears to me to be during a
race with another thread that's splitting the extent tree. This
causes the machine to go down with the inode in a bad state. But of
course, fsck won't detect and fix this, so when the machine comes back
up, and a FIEMAP call is done on this same inode -- without any other
threads -- it'll crash again. Hence a nasty crash loop.
So you're right, in that this isn't the "real solution." But devising
a safe, non-racy design for FIEMAP is not so simple, unless of course
you want to just hold the i_data_sem during the entire loop body of
ext4_ext_walk_space(), which would be pretty ugly. Hence the
"band-aid" approach in my patch, which at least seems correct, if not
thorough.
Thanks,
Curt
>
> Thanks!
> -Lukas
>
>>
>> Signed-off-by: Curt Wohlgemuth <[email protected]>
>> ---
>> ?fs/ext4/extents.c | ? ?4 +++-
>> ?1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 57cf568..063a5b8 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>> ? ? ? while (depth >= 0) {
>> ? ? ? ? ? ? ? if (depth == path->p_depth) {
>> ? ? ? ? ? ? ? ? ? ? ? /* leaf */
>> - ? ? ? ? ? ? ? ? ? ? if (path[depth].p_ext !=
>> + ? ? ? ? ? ? ? ? ? ? /* p_ext can be NULL */
>> + ? ? ? ? ? ? ? ? ? ? if (path[depth].p_ext &&
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? path[depth].p_ext !=
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? EXT_LAST_EXTENT(path[depth].p_hdr))
>> ? ? ? ? ? ? ? ? ? ? ? ? return le32_to_cpu(path[depth].p_ext[1].ee_block);
>> ? ? ? ? ? ? ? } else {
>>
>
>
> > However this seems like an inode corruption so we should probably be
> > more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> > even better check for the corrupted tree in the ext4_ext_find_extent()
> > (instead in ext4_ext_map_blocks()), however this will need to distinguish
> > between the normal and the tree modification case.
>
> What we've observed many times is a crash during a FIEMAP call to
> ext4_ext_next_allocated_block(), which appears to me to be during a
> race with another thread that's splitting the extent tree. This
> causes the machine to go down with the inode in a bad state. But of
> course, fsck won't detect and fix this, so when the machine comes back
> up, and a FIEMAP call is done on this same inode -- without any other
> threads -- it'll crash again. Hence a nasty crash loop.
>
> So you're right, in that this isn't the "real solution." But devising
> a safe, non-racy design for FIEMAP is not so simple, unless of course
> you want to just hold the i_data_sem during the entire loop body of
> ext4_ext_walk_space(), which would be pretty ugly. Hence the
> "band-aid" approach in my patch, which at least seems correct, if not
> thorough.
>
> Thanks,
> Curt
>
> >
> > Thanks!
> > -Lukas
> >
> >>
> >> Signed-off-by: Curt Wohlgemuth <[email protected]>
> >> ---
> >> fs/ext4/extents.c | 4 +++-
> >> 1 files changed, 3 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index 57cf568..063a5b8 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> >> while (depth >= 0) {
> >> if (depth == path->p_depth) {
> >> /* leaf */
> >> - if (path[depth].p_ext !=
> >> + /* p_ext can be NULL */
> >> + if (path[depth].p_ext &&
> >> + path[depth].p_ext !=
> >> EXT_LAST_EXTENT(path[depth].p_hdr))
> >> return le32_to_cpu(path[depth].p_ext[1].ee_block);
> >> } else {
> >>
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 11, 2011 at 11:01:57AM +0400, Dmitry Monakhov wrote:
> > ext4_ext_walk_space() will get the i_data_sem, construct the path
> > array, then release the semaphore. But then it does a bazillion
> > accesses on the extent/header/index pointers in the path array, with
> > no protection against truncate, growth, or any other changes. As far
> > as I can tell, this is the only use of a path array retrieved from
> > ext4_ext_find_extent() that isn't completely covered by i_data_sem.
>
> In that case i_data sem protects us from nothing. Path collected can
> simply disappear under us. And in fact i don't understand the
> reason why we drop i_data_sem too soon. Are any reason to do that?
The one concern I have is that I don't want FIEMAP to slow down "real"
ext4 processing. So what I've hoping we'll be able to do is use a
seq_lock sort of design, where if the pointer changes out from under
us, FIEMAP is forced to redo its sampling. But if there is some crazy
userspace program which is calling FIEMAP all the time, I'd much
rather that it not block ext4_map_blocks() if possible (which is what
I using a seqlock to protect the FIEMAP routines would help).
- Ted
On Sat, Oct 08, 2011 at 06:01:14PM -0700, Curt Wohlgemuth wrote:
> In ext4_ext_next_allocated_block(), the path[depth] might
> have a p_ext that is NULL -- see ext4_ext_binsearch(). In
> such a case, dereferencing it will crash the machine.
>
> This patch checks for p_ext == NULL in
> ext4_ext_next_allocated_block() before dereferencinging it.
>
> Tested using a hand-crafted an inode with eh_entries == 0 in
> an extent block, verified that running FIEMAP on it crashes
> without this patch, works fine with it.
>
> Signed-off-by: Curt Wohlgemuth <[email protected]>
Thanks, applied.
- Ted